janispagel.de

Source code for https://janispagel.de

commit 528394f9cdfb4bc070d02714fe698ac8c08ba38f
parent 156b59d1779c4a219080d58c8c5ca18a1e7fd3f6
Author: Janis Pagel <janis.pagel@ims.uni-stuttgart.de>
Date:   Thu,  6 Dec 2018 12:41:46 +0100

Add abstracts

Diffstat:
A data/abstracts/dhar2016a.txt     |  1 +
A data/abstracts/krautter2018a.txt |  1 +
A data/abstracts/pagel2018a.txt    |  1 +
A data/abstracts/pagel2018b.txt    |  2 ++
A data/abstracts/pagel2018c.txt    |  1 +
A data/abstracts/reiter2018a.txt   |  1 +
M publications.html                | 14 ++++++++++----
7 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/data/abstracts/dhar2016a.txt b/data/abstracts/dhar2016a.txt
@@ -0,0 +1 @@
+We present a fully functional Information Retrieval system for 10,000 Amazon reviews. Two different types of systems were developed to evaluate the effectiveness of the retrieval systems: Vector space and probabilistic model systems. The effectiveness of the systems are evaluated for various metrics. While the probabilistic model systems performed better in the initial stages, the TF-IDF system from the vector space family of systems performed better as a whole. Future enhancements being considered are looked upon.
diff --git a/data/abstracts/krautter2018a.txt b/data/abstracts/krautter2018a.txt
@@ -0,0 +1 @@
+In literary studies, various perspectives on protagonists, heroes, or main characters of dramatic texts coexist, each assuming different definitions and strategies for identifying the characters. In principle, most of these definitions can be translated into a set of machine-readable character properties. Characters that match these features can then be classified, for instance, as protagonists of the play and distinguished from other characters (such as minor characters). Such a classification task is the central concern of this contribution. One subproblem is the detection of title characters, which is related to protagonist classification but comes with its own preconditions. We first approach both tasks theoretically and propose our own definition of the protagonist, which can be operationalized for the purpose of automatic classification but nonetheless takes up existing research in literary studies and connects to its definitions. A manual annotation experiment nevertheless shows that definitions of this kind can be made intersubjective only to a limited extent. Using various features such as the token count of characters, topic modeling, and network measures, we then train a random forest classifier that automatically divides characters into protagonists and non-protagonists, or title characters and non-title characters. The results show that protagonists and title characters, owing to their usually prominent position in the play, can indeed be detected very reliably with simple features. A concluding analysis of the classification of individual characters, using Die Verschwörung des Fiesko zu Genua, Maria Stuart, and Emilia Galotti as examples, ties back to the literary-studies perspective and makes clear that machine learning models offer interesting starting points for deeper reflection on protagonists and title characters.
diff --git a/data/abstracts/pagel2018a.txt b/data/abstracts/pagel2018a.txt
@@ -0,0 +1 @@
+Bridging resolution is the task of recognising bridging anaphors and linking them to their antecedents. While there is some work on bridging resolution for English, there is only little work for German. We present two datasets which contain bridging annotations, namely DIRNDL and GRAIN, and compare the performance of a rule-based system with a simple baseline approach on these two corpora. The performance for full bridging resolution ranges between an F1 score of 13.6% for DIRNDL and 11.8% for GRAIN. An analysis using oracle lists suggests that the system could, to a certain extent, benefit from ranking and re-ranking antecedent candidates. Furthermore, we investigate the importance of single features and show that the features used in our work seem promising for future bridging resolution approaches.
diff --git a/data/abstracts/pagel2018b.txt b/data/abstracts/pagel2018b.txt
@@ -0,0 +1,2 @@
+The phenomenon of bridging describes types of non-coreferential entities, which stand in a prototypical or inferable relationship to a previously introduced discourse entity. The machine-aided resolution of such bridging relations tries to detect bridging anaphors and automatically link these anaphors to their antecedents. Research on automatic bridging resolution is rare and resources for training algorithms on the problem of bridging resolution are as well. This thesis therefore introduces new data for bridging resolution in German, the GRAIN corpus, and evaluates the data with regard to the goodness of annotation quality and occurring types of bridging. To ensure the generalizability of the approach, the established corpus DIRNDL is additionally used. In order to determine the difficulty of the task for the present data, an informed baseline is implemented and evaluated. Furthermore, a rule-based system based on Hou et al. (2014) is created in order to perform bridging resolution. To determine the possibilities of using learning-based models for resolving bridging relations, a gradient boosting model is trained on the same data as the rule-based system. The rule-based system performs better than the baseline and achieves an F1-Score of 5.3% for DIRNDL and 4.0% for GRAIN. An analysis with oracle lists for the rule-based system shows that many rules do not have any access to the correct antecedent. The gradient boosting model is able to outperform the rule-based system for DIRNDL (F1 = 11.3%), but is not able to generalize on GRAIN. The differences can be explained by looking at the different structure of the corpora and their topic distribution. Furthermore, the results of the gradient boosting model suggest that more training data would greatly improve learning-based approaches for bridging resolution.
+
diff --git a/data/abstracts/pagel2018c.txt b/data/abstracts/pagel2018c.txt
@@ -0,0 +1 @@
+In computational linguistics (CL), annotation is used with the goal of compiling data as the basis for machine learning approaches and automation. At the same time, in the Humanities scholars use annotation in the form of note-taking while reading texts. We claim that with the development of Digital Humanities (DH), annotation has become a method that can be utilized as a means to support interpretation and develop theories. In this paper, we show how these different annotation goals can be modeled in a unified workflow. We reflect on the components of this workflow and give examples for how annotation can contribute additional value in the context of DH projects.
diff --git a/data/abstracts/reiter2018a.txt b/data/abstracts/reiter2018a.txt
@@ -0,0 +1 @@
+In this paper, we aim at identifying protagonists in plays automatically. To this end, we train a classifier using various features and investigate the importance of each feature. A challenging aspect here is that the number of spoken words for a character is a very strong baseline. We can show, however, that a) the stage presence of characters and b) topics used in their speech can help to detect protagonists even above the baseline.
diff --git a/publications.html b/publications.html
@@ -34,13 +34,14 @@
  <h2>2018</h2>
- <p class="tab" id="Reiter2018b"><span class="author">Benjamin Krautter, Janis Pagel, Nils Reiter, Marcus Willand</span>.
+ <p class="tab" id="Krautter2018a"><span class="author">Benjamin Krautter, Janis Pagel, Nils Reiter, Marcus Willand</span>.
  <strong><span class="title">Titelhelden und Protagonisten - Interpretierbare Figurenklassifikation in deutschsprachigen Dramen</span></strong>.
  <i><span class="journal">LitLab Pamphlets</span></i>,
  <span class="volume">vol. 7</span>,
  <span class="month">November</span>
  <span class="year">2018</span>.<br />
  <a title="Titelhelden und Protagonisten - Interpretierbare Figurenklassifikation in deutschsprachigen Dramen" href="https://www.digitalhumanitiescooperation.de/wp-content/uploads/2018/12/p07_krautter_et_al.pdf">[paper]</a>
+ <a title="Abstract" href="data/abstracts/krautter2018a.txt">[abstract]</a>
  </p>
  <p class="tab" id="Pagel2018c"><span class="author">Janis Pagel, Nils Reiter, Ina Rösiger, Sarah Schulz</span>.
@@ -51,7 +52,8 @@
  pp. <span class="pages">31-36</span>,
  <span class="month">August</span>
  <span class="year">2018</span>.<br />
- <a title="A Unified Text Annotation Workflow for Diverse Goals" href="http://ceur-ws.org/Vol-2155/pagel.pdf">[abstract]</a>
+ <a title="A Unified Text Annotation Workflow for Diverse Goals" href="http://ceur-ws.org/Vol-2155/pagel.pdf">[paper]</a>
+ <a title="Abstract" href="data/abstracts/pagel2018c.txt">[abstract]</a>
  </p>
  <p class="tab" id="Pagel2018a"><span class="author">Janis Pagel, Ina Rösiger</span>.
@@ -63,6 +65,7 @@
  <span class="month">June</span>
  <span class="year">2018</span>.<br />
  <a title="Towards Bridging Resolution in German: Data Analysis and Rule-based Experiments" href="http://aclweb.org/anthology/W18-0706">[paper]</a>
+ <a title="Abstract" href="data/abstracts/pagel2018a.txt">[abstract]</a>
  <a title="Supplementary Material" href="http://www.ims.uni-stuttgart.de/institut/mitarbeiter/roesigia/bridging-resolution-german-supplementary.pdf">[supp]</a>
  <a title="DIRNDL" href="http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/dirndl.en.html">[data1]</a>
  <a title="GRAIN" href="http://hdl.handle.net/11022/1007-0000-0007-C632-1">[data2]</a>
@@ -77,17 +80,19 @@
  <span class="month">December</span>
  <span class="year">2018</span>.<br />
  <a title="Detecting Protagonists in German Plays around 1800 as a Classification Task" href="https://eadh2018.exordo.com/files/papers/65/final_draft/article.pdf">[paper]</a>
+ <a title="Abstract" href="data/abstracts/reiter2018a.txt">[abstract]</a>
  </p>
  <h2>2016</h2>
- <p class="tab" id="Pagel2016a"><span class="author">Prajit Dhar, Janis Pagel</span>.
+ <p class="tab" id="Dhar2016a"><span class="author">Prajit Dhar, Janis Pagel</span>.
  <strong><span class="title">An Information Retrieval System for Product Reviews</span></strong>.
  In <span class="booktitle">Proceedings of the 60th StuTS</span>,
  <span class="address">Heidelberg, Germany</span>,
  <span class="year">in preparation</span>.<br />
  <a title="An Information Retrieval System for Product Reviews" href="/data/papers/information-retrieval-system.pdf">[preprint]</a>
+ <a title="Abstract" href="data/abstracts/dhar2016a.txt">[abstract]</a>
  <a title="Source Code" href="https://github.com/pagelj/airs">[code]</a>
  </p>
@@ -99,6 +104,7 @@
  <span class="year">2018</span>,
  <span class="notes">unpublished</span>.<br/>
  <a title="Rule-based and Learning-based Approaches for Automatic Bridging Detection and Resolution in German" href="/data/papers/masters-thesis.pdf">[thesis]</a>
+ <a title="Abstract" href="data/abstracts/pagel2018b.txt">[abstract]</a>
  <a title="DIRNDL" href="http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/dirndl.en.html">[data1]</a>
  <a title="GRAIN" href="http://hdl.handle.net/11022/1007-0000-0007-C632-1">[data2]</a>
  <a title="Source Code" href="https://github.com/pagelj/pub-2018-bridging-resolution-german">[code]</a>
@@ -120,7 +126,7 @@
 <div id="footer">
 <span class="right">
-<div id="date">Last update: 2018-11-15</div>
+<div id="date">Last update: 2018-12-06</div>
 </span>
 </div>
 </body>