Genometa - Rapid analysis of metagenomic short reads


Genometa is a Java-based bioinformatics program that runs locally and allows rapid analysis of metagenomic short read datasets. Millions of short reads can be accurately analysed within minutes and visualised in the browser component. A large database of diverse bacterial and archaeal genomes has been constructed to serve as the reference.

Our approach is based upon the established open source visualisation tool IGB and supported by the rapid alignment program bowtie. We also make use of the excellent open source Picard toolset for SAM files. Genometa is thus also an open source project with full source code available from the Downloads page.

The new elements added by our team at the Hannover University of Applied Sciences include:
  • integration of the bowtie aligner to allow analysis of metagenomic short read datasets
  • a large collection of well-formatted reference sequences from five sequencing initiatives
  • SAM/BAM conversion functionality
  • extension of tooltips
  • a dynamic clickable bar graph summary function
  • attractive default visualisation
  • text export of metagenomics output

See the visualisation tutorial (Tutorial 1) and alignment tutorial (Tutorial 2) for further details of the new features.

A variety of reference datasets against which reads can be aligned are available in the Downloads section. They comprise 2550 genomes (full dataset), 1551 genomes (one strain selected per species) or 748 genomes (one strain selected per genus), drawn from the:
  • 2012 NCBI RefSeq collection
  • European MetaHIT project
  • Human Microbiome Project
  • Moore microbial genome sequencing project
  • Genomic Encyclopedia of Bacteria and Archaea (GEBA) project



Motivation

While other in silico frameworks are available for the analysis of metagenomic datasets, few can cope with the millions of reads generated by Illumina and SOLiD sequencers. Sequence information is now ubiquitous, yet server farms are certainly not available to all groups. Assignment speed is thus a challenge which we aim to solve, particularly as rapidly generated short read data may one day be used in routine monitoring and diagnosis of bacterial infections. In addition, few current programs provide high assignment accuracy, so many metagenomic studies to date have reported relative abundance only at phylum, family or genus level. This limitation reduces the utility of sequence information, since highly pathogenic and harmless species may be present within a single genus. Our approach, Genometa, combines the high throughput of next generation alignment programs with an extensively curated reference dataset, various metadata and standardised output formats to rapidly report microbial assignments at species level.


Contact

Dr. Colin Davenport, davenport (dot) colin (at) mh-hannover (dot) de
























