The Environments-EOL project is nearing its
main stages (corpus creation, tagger benchmarking, EOL annotation and taxa
characterization, all to take place in Summer 2013). To this end, a range of
preparatory tasks has been, or is being, conducted.
May 2013 saw a “dry-run” curation being
set up: a small set of documents is being used for a trial curation (ongoing).
Because corpus generation is manual and lengthy, tests must take place
before the main body of work commences. Through this “dry run”, curators are
becoming familiar with the Environment Ontology (EnvO) as well as with the relevant
browsing and searching tools. Additionally, questions are being raised and
discussions held on the exact context of the terms to be annotated by the Environments-EOL
project.
In parallel, early tests showed that manually
adding synonyms to the dictionary (see “Dictionary Generation” in the previous
post) could improve tagger performance. To facilitate this task,
specialized EOL sections (e.g. Habitat)
have been analyzed by counting word frequencies in text segments the tagger did not tag. From this, a priority list of terms to be considered was
derived. After manual inspection, environment-related words have been mapped to
EnvO terms and can now be added to the dictionary. The EOL records involved in
this training step have been excluded from corpus generation (and
consequently from the software evaluation).
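The post does not describe the exact counting procedure. As a minimal sketch, assuming each EOL section is available as plain text together with the character spans the tagger matched (a hypothetical input format), the word-frequency step over non-tagged segments could look like the Python below. The function name `untagged_word_frequencies`, the lowercase-letters tokenizer and the stopword list are all illustrative assumptions, not the project's actual tooling.

```python
import re
from collections import Counter


def untagged_word_frequencies(records, stopwords=frozenset()):
    """Count words that fall outside tagger matches.

    Hypothetical input: records is an iterable of (text, spans) pairs,
    where spans are (start, end) character offsets matched by the tagger.
    """
    counts = Counter()
    for text, spans in records:
        # Blank out every tagged span so only non-tagged text remains;
        # overlapping spans are handled naturally by blanking.
        chars = list(text)
        for start, end in spans:
            for i in range(start, end):
                chars[i] = " "
        untagged = "".join(chars)
        # Simplistic tokenizer (an assumption): lowercase alphabetic runs.
        for word in re.findall(r"[a-z]+", untagged.lower()):
            if word not in stopwords:
                counts[word] += 1
    return counts


# Illustrative records: "tropical forest" and "forest" were tagged.
records = [
    ("Found in tropical forest and along rocky streams.", [(9, 24)]),
    ("Lives in rocky streams and forest clearings.", [(27, 33)]),
]
counts = untagged_word_frequencies(records, stopwords={"in", "and", "along"})
print(counts.most_common(3))
```

Frequent untagged words (here "rocky" and "streams") would then be candidates for the manual inspection and EnvO mapping described above, rather than being added to the dictionary automatically.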
Last but not least, Environments-EOL is a project
tightly bound to the Environment Ontology community resource. Highlighting both this
connection and the project’s dynamic nature, we thank the EnvO team for their
prompt response in updating the "terrestrial biome"
hierarchy, which now comprises more compact and fine-grained terms (see EnvO News Post).