gerdracor-coref

German Drama Corpus for Coreference
git clone git://git.janispagel.de/gerdracor-coref.git

commit 5acd002be92fa0fe7e89dc37c61c7e2269a1cf70
parent 0670114e09f882a198e3fc3abe1c1a4fcf2f33b8
Author: Janis Pagel <janis.pagel@ims.uni-stuttgart.de>
Date:   Mon,  7 Dec 2020 22:58:08 +0100

Fix typos

Also put every sentence on a separate line

Diffstat:
M README.md | 30 ++++++++++++++++++++++--------
1 file changed, 22 insertions(+), 8 deletions(-)

diff --git a/README.md b/README.md
@@ -6,19 +6,27 @@
 
 ## General Information
 
-The GerDraCor-Coref (German Drama Corpus for Coreference) is a fork of the [GerDraCor](https://github.com/dracor-org/gerdracor) and contains coreference annotations for a subset of the GerDraCor texts. The texts are all German dramatic texts, written between 1730 and 1920. Annotated are all noun phrases, singletons were removed. Additionally, generic entities, abstract anaphora and amiguous mentions are also marked explicitely. In case of the latter two, only a part of the corpus has been annotated.
+The GerDraCor-Coref (German Drama Corpus for Coreference) is a fork of the [GerDraCor](https://github.com/dracor-org/gerdracor) and contains coreference annotations for a subset of the GerDraCor texts.
+The texts are all German dramatic texts, written between 1730 and 1920.
+Annotated are all noun phrases; singletons were removed.
+Additionally, generic entities, abstract anaphora and ambiguous mentions are also marked explicitly.
+In case of the latter two, only a part of the corpus has been annotated.
 
 ### File Naming
 
-The names of the files are composed of a short form of the title of the play and an appropiate file ending indicating the format, e.g. `Rosenkavalier.xmi`, `Rosenkavalier.xml`, `Rosenkavalier.conll` for "Der Rosenkavalier" by Hugo von Hofmannsthal. A full list of file names and their corresponding play is given in `plays.csv`.
+The names of the files are composed of a short form of the title of the play and an appropriate file ending indicating the format, e.g. `Rosenkavalier.xmi`, `Rosenkavalier.xml`, `Rosenkavalier.conll` for "Der Rosenkavalier" by Hugo von Hofmannsthal.
+A full list of file names and their corresponding play is given in `plays.csv`.
 
 ### Partial Annotations
 
-Some texts have not been fully annotated, but only one or more acts. The act(s) annotated are indicated in the filename, e.g. `Manuscript_Act5.xmi`. If the full text was annotated, no special marker is applied, e.g. `Sara.xmi`.
+Some texts have not been fully annotated, but only one or more acts.
+The act(s) annotated are indicated in the filename, e.g. `Manuscript_Act5.xmi`.
+If the full text was annotated, no special marker is applied, e.g. `Sara.xmi`.
 
 ### Parallel Annotations
 
-In order to make Inter-Annotator agreement studies possible, we carried out parallel annotations of single acts, annotated by distinct annotators. These annotations are located in separate branches and the annotator and act is additionally indicated in the filename, e.g. `Sara_AS_Act1`. `gold` annotations are not specially marked in the filename. (ToDo)
+In order to make Inter-Annotator agreement studies possible, we carried out parallel annotations of single acts, annotated by distinct annotators.
+These annotations are located in separate branches and the annotator and act is additionally indicated in the filename, e.g. `Sara_AS_Act1`. `gold` annotations are not specially marked in the filename. (ToDo)
 
 ### Encoding
 
@@ -33,11 +41,14 @@ We provide several formats to represent the coreference annotations:
 - CoNLL 2012
 - DIRNDL
 
-For the texts that have not been fully annotated, we additionally provide TEI output only for the parts that have been annotated. The CoNLL output always only contains the annotated parts. The XMI output always contains the full text.
+For the texts that have not been fully annotated, we additionally provide TEI output only for the parts that have been annotated.
+The CoNLL output always only contains the annotated parts.
+The XMI output always contains the full text.
 
 ### XMI
 
-As the XMI files can become quite large, they have been compressed using `gzip`. Uncompress them by entering a command line and run
+As the XMI files can become quite large, they have been compressed using `gzip`.
+Uncompress them by entering a command line and run
 
 ```sh
 $ gzip -d <FILENAME>.xmi.gz
@@ -50,7 +61,9 @@ DIRNDL is a file format based on the CoNLL format, but additionally also contain
 ## Organization
 
 The annotations are sorted into folders according to the different output formats.
-Parallel annotations by different annotators are organized into branches in the git tree. The main annotations are located in the `gold` branch. Partial annotations are sorted under the main folder in a subfolder called `part`.
+Parallel annotations by different annotators are organized into branches in the git tree. (ToDo)
+The main annotations are located in the `gold` branch.
+Partial annotations are sorted under the main folder in a subfolder called `part`.
 
 ### Folder structure
 
@@ -96,4 +109,5 @@ Like [GerDraCor](https://github.com/dracor-org/gerdracor), GerDraCor-Coref is re
 
 ## Contribution
 
-We appreciate contributions regarding extensions, bug fixes and the like. Please feel free to create issues or pull requests.
+We appreciate contributions regarding extensions, bug fixes and the like.
+Please feel free to create issues or pull requests.