Friday, March 7, 2014

November 2013 - February 2014: EOL/TraitBank Integration, On-going Benchmark, Exploratory Visualizations

Despite the silence of this blog, progress has been made on all fronts of the ENVIRONMENTS-EOL project, and many of its sub-parts are now either finished or nearing completion.

Have you seen ENVIRONMENTS-EOL predictions in the Encyclopedia of Life?

Early in 2014 the Encyclopedia of Life (EOL) released its new version, along with TraitBank, its novel data search facility. Among other innovative biodiversity research features, Jennifer Hammock, Patrick Leary, Katja Schulz, Cyndy Parr (and anyone else I may be omitting) have incorporated ENVIRONMENTS-EOL predictions into the EOL system.

As shown on the left, Environment Ontology (EnvO) descriptive terms associated with a taxon (in this example Hexanchus griseus, the Bluntnose Six-gill Shark) can be seen both in the Overview, Quick Facts (short list, upper-right part of the figure) and under the Data Tab (extended list, lower-right part of the figure).

Such features make the ENVIRONMENTS-EOL predictions accessible to all EOL users, regardless of their Information Technology skills.

Due to natural language intricacies, such as the multiple meanings a word may have, erroneous predictions will occur. As described in previous posts (April, July 2013), ENVIRONMENTS-EOL has been developed in iterative cycles aiming to identify and handle the most prominent of such errors.

An improved version of the ENVIRONMENTS tagger is now ready and the release of a new ENVIRONMENTS-EOL annotation dataset is in preparation.

Named Entity Recognition (NER) Evaluation
From a text-mining point of view, the work on the ENVIRONMENTS-600 corpus has been concluded; the Inter-Annotator Agreement assessment was the last step of the process.

The tagger evaluation in terms of precision and recall is nearly complete. Points of particular interest were:
  • the handling of cases where multiple EnvO identifiers have been mapped to a term by the curators and/or are returned by the tagger
  • the hierarchical relationship of curated/predicted terms according to EnvO
  • the NER evaluation for distinct EnvO sub-graphs only (e.g. only for environmental features, or habitats, or environmental materials)
  • the NER evaluation for the different EOL Species page sections (e.g. only for "Habitat", or "Distribution", or "Taxon Biology")
The analysis of the NER performance is on-going.
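
As an illustration of the first two points in the list above, here is a minimal Python sketch of how precision and recall could be computed when a mention carries several identifiers, and when a parent/child match is optionally counted as a hit. The tiny hierarchy, the "TOY:" identifiers, the span keys and the matching rule are all assumptions made up for this example; they are not real EnvO identifiers and not the actual evaluation protocol.

```python
# Hypothetical toy hierarchy (child -> parent); not real EnvO identifiers.
PARENT = {
    "TOY:coral_reef": "TOY:reef",
    "TOY:reef": "TOY:marine_feature",
    "TOY:sea_floor": "TOY:marine_feature",
}


def ancestors(term):
    """Return the set of all ancestors of a term in the toy hierarchy."""
    found = set()
    while term in PARENT:
        term = PARENT[term]
        found.add(term)
    return found


def related(a, b):
    """True if the terms are identical or one is an ancestor of the other."""
    return a == b or a in ancestors(b) or b in ancestors(a)


def evaluate(curated, predicted, hierarchical=False):
    """Return (precision, recall) over mentions keyed by text span.

    Both arguments map a span, e.g. (start, end), to a set of identifiers.
    A predicted mention counts as correct if any of its identifiers
    matches any curated identifier for the same span.
    """
    match = related if hierarchical else (lambda a, b: a == b)
    hits = sum(
        1
        for span, ids in predicted.items()
        if any(match(p, c) for p in ids for c in curated.get(span, ()))
    )
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(curated) if curated else 0.0
    return precision, recall


curated = {
    (0, 10): {"TOY:coral_reef"},
    (20, 29): {"TOY:sea_floor", "TOY:marine_feature"},
    (40, 45): {"TOY:reef"},
}
predicted = {
    (0, 10): {"TOY:reef"},        # parent of the curated term
    (20, 29): {"TOY:sea_floor"},  # exact match on one of two curated ids
    (60, 66): {"TOY:reef"},       # spurious mention
}

strict = evaluate(curated, predicted)
relaxed = evaluate(curated, predicted, hierarchical=True)
print("strict :", strict)
print("relaxed:", relaxed)
```

In this toy case the relaxed, hierarchy-aware matching credits the tagger for returning a parent of the curated term, which the strict comparison treats as a miss. Restricting `curated` and `predicted` to one sub-graph or one page section before calling `evaluate` would give the per-subset figures mentioned in the last two points.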

North and South America bird habitat associations and visualisations
Early in February 2014, the NESCent EOL-BHL Research Sprint (Durham, North Carolina) event gave ENVIRONMENTS-EOL a unique opportunity to explore concrete biological questions based on its machinery.

Interdisciplinary collaboration was at the very center of the event; biologists were teamed with Information Technology researchers to tackle open biodiversity research questions based on EOL/TraitBank and Biodiversity Heritage Library (BHL) data.

NoPlaceLikeHome, a project initiated and driven by Prof. Rob Stevenson, U Mass, Boston, aimed at exploring species - habitat associations. A significant contribution came from the local collaborator Dr. Carl Nordman, NatureServe.

In this context, ENVIRONMENTS was used to annotate in-house information on North and South America birds (Aves), such as ecology, habitat and migration descriptions, among others.

Heatmaps and tagclouds (see below) were generated to visualise the text mining results: species - EnvO term associations based on simple term counts.
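
The counting behind such figures can be sketched in a few lines of Python. The species names, habitat labels and the `annotations` list below are made up for illustration, and this is only an assumption about how simple term counts might be tabulated; the actual visualisation scripts are not reproduced here.

```python
from collections import Counter

# Hypothetical tagger output: one (species, habitat term) pair per tagged mention.
annotations = [
    ("Species A", "forest"),
    ("Species A", "forest"),
    ("Species A", "wetland"),
    ("Species B", "wetland"),
    ("Species B", "grassland"),
    ("Species B", "wetland"),
    ("Species C", "forest"),
]

# Simple term counts per (species, term) pair.
counts = Counter(annotations)
species = sorted({s for s, _ in annotations})
terms = sorted({t for _, t in annotations})

# Heatmap matrix: rows are habitat terms (Y axis), columns are species (X axis).
matrix = [[counts[(s, t)] for s in species] for t in terms]

# Overall term totals, which could drive the font size in a tag cloud.
term_totals = Counter(t for _, t in annotations)

# Number of distinct terms per species, a crude proxy for habitat breadth.
breadth = {s: sum(1 for t in terms if counts[(s, t)] > 0) for s in species}

# Print the matrix as plain text, one habitat term per row.
for term, row in zip(terms, matrix):
    print(f"{term:<10}", *row)
```

The resulting `matrix` could then be handed to any plotting library to colour cells by count.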

The visualisation scripts have been developed as part of the SEQenv sister project, which characterizes microbial sample sequences according to the environment from which they derive.

Employed in a higher eukaryote context, the same tools can still support knowledge exploration, e.g. by highlighting rare or frequent habitats, species habitat breadth, and intra-taxon differences in environment associations.

An even more user-friendly, interactive version of the graphics is under way in collaboration with Dr. Umer Ijaz, Uni. of Glasgow.

The image shown below is a compilation of the project outputs by Cyndy Parr, as found in the EOL Blog (among other reports from the NESCent EOL-BHL Research Sprint and EOL News). Quoting Cyndy's to-the-point legend: "Species are on the X axis and the Environment Ontology (EnvO) habitat term associations on the Y axis, with the redness (or size in the inset Wordle) based on simple term counts."  
