Friday, October 3, 2014

September 22-25, 2014: "SEQenvIII: From Signals to Environmentally Tagged Sequences III" Hackathon, HCMR, Crete, Greece

Participants (from left to right): Christina Pavloudi, Anastasis Oulas, Lex Overmars, Conor Meehan, Lars Juhl Jensen, Tomas Flouri, Tomas Vetrovsky, Chris Quince, Lucas Sinclair. Not shown: Umer Ijaz (via Skype), Evangelos Pafilis.

SEQenv is a sister project of ENVIRONMENTS-EOL. It addresses the microbial realm and brings together the worlds of text mining, sequence analysis, statistics and visualization through the prism of microbial ecology.

In particular, SEQenv is a pipeline that annotates 16S rRNA and metagenomic microbial sequences with environment-descriptive terms.

To characterize novel microbial sequences, sequence similarity searches against public databases are combined with the recognition of terms such as “glacier, pelagic, forest, lagoon” (i.e. Environment Ontology terms identified by the ENVIRONMENTS tagger) within GenBank records (e.g. the “isolation source” field) and/or in the relevant literature (PubMed abstracts). Subsequently, a range of visualizations, such as tag clouds and heatmaps, is generated to describe OTUs and samples.
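To make the idea concrete, here is a minimal Python sketch of the term-recognition step. It is not SEQenv code: the function names, the four-term dictionary (taken from the examples above) and the "isolation source" strings are all made up for illustration, and plain whole-word matching stands in for the ENVIRONMENTS tagger.

```python
from collections import Counter
import re

# Hypothetical mini-dictionary: the four example terms mentioned in the text.
# The real pipeline relies on Environment Ontology terms found by the
# ENVIRONMENTS tagger, not on this simple word list.
ENV_TERMS = ("glacier", "pelagic", "forest", "lagoon")


def tag_text(text, terms=ENV_TERMS):
    """Return the dictionary terms that occur as whole words in `text`."""
    lowered = text.lower()
    return [t for t in terms if re.search(r"\b" + re.escape(t) + r"\b", lowered)]


def annotate(hits_by_otu, terms=ENV_TERMS):
    """Map each OTU to relative term frequencies over the texts of its hits."""
    annotations = {}
    for otu, texts in hits_by_otu.items():
        counts = Counter()
        for text in texts:
            counts.update(tag_text(text, terms))
        total = sum(counts.values())
        # An OTU whose hits mention no known term gets an empty annotation.
        annotations[otu] = {t: n / total for t, n in counts.items()} if total else {}
    return annotations


# Made-up "isolation source" strings standing in for the database hits of two OTUs.
example_hits = {
    "OTU_1": ["glacier ice core", "Glacier forefield soil", "forest soil"],
    "OTU_2": ["coastal lagoon sediment", "seawater"],
}

if __name__ == "__main__":
    for otu, terms in annotate(example_hits).items():
        print(otu, terms)
```

A per-OTU table of term frequencies like this is the kind of input a tag cloud or heatmap could be drawn from.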

SEQenv has been built incrementally over three hackathons since September 2012, gaining several features along the way. For example, having started from 16S rRNA sequences, the pipeline can now be invoked on either (a) DNA sequences or (b) protein sequences.

This year, it was time to clean up the code and package it in a language that would be easy to distribute. Speed was also improved and novel forms of visualization were explored. The core modules were rewritten in Python; speed-ups were implemented, e.g. by optimizing the sequence similarity searches; and interactive HTML/JavaScript visualizations started replacing R-generated static diagrams. In addition, the integration of SEQenv-derived annotations with phylogenetic information and extensions to the text-mining module were investigated.

The effort and devotion of all participants (see image above) were the driving force in overcoming the long hours and any technical difficulties that arose. A big thank you from the organizers (without forgetting the SEQenvI and SEQenvII participants).

All three SEQenv hackathons have been funded by the EU COST ES1103 Action on "Microbial ecology & the earth system: collaborating for insight and success with the new generation of sequencing tools".

Besides the EU COST ES1103 Action, the organizers would like to thank the LifeWatchGreece project for additional local support.
Any software produced during the hackathon will be made available as open source.