Visualisation Tutorial


This page is a gentle introduction to visualising the SAM and BAM files produced by next-generation aligners such as BWA or Bowtie, with particular emphasis on the new features introduced in Genometa. Genometa is based on IGB, and we encourage you to look at the very good screencasts and documentation available for IGB. However, we have invested about four person-years of development time in adapting IGB to a metagenomic scenario and in establishing its reliability on both artificial and published datasets.

A presentation of the concepts behind Genometa also exists.

These tutorials will also showcase the new elements added by the team at Hannover University of Applied Sciences, namely:
  • integration of the Bowtie aligner to allow analysis of metagenomic short-read datasets
  • a large collection of well-formatted reference sequences from more than five sequencing initiatives
  • SAM/BAM conversion functionality
  • extension of tooltips
  • a dynamic clickable bar graph summary function
  • more pleasant default visualisation
  • text export of metagenomics output

Starting Genometa

Go to the directory into which you unzipped Genometa.

Open Genometa by clicking on run_igb.bat (Windows) or run_igb.sh (Linux).

The various versions (1GB, 2GB, 3GB, 5GB) start Genometa with the corresponding amount of memory. We recommend using as much memory as your computer has available.
For example, on 32-bit Windows try 1GB, on 32-bit Linux try 2GB, and on a 64-bit system with more RAM try the 3GB or 5GB version.

If Genometa does not start, Java may not have been able to create a virtual machine of that size with the memory available; try a smaller version.


Loading a file:

After starting Genometa, the locations of the "Metatie-Fastalines" file and the "Lineage Mappings" file should be set in the preferences.
Select File > Preferences.
Open the tab "Data Sources".
Set the locations of the two files. They can be found in the "data" subdirectory of the directory into which Genometa was unzipped.



After setting the preferences, the file "GlacierIceMetagenome_SRX000607_filt-bwa.sam" can be opened with the "Open file..." dialogue in the "File" menu.
Select the file "GlacierIceMetagenome_SRX000607_filt-bwa.sam", which is located in the "example" subdirectory of the folder into which Genometa was unzipped. When you click the "Open" button, Genometa converts the SAM file into a BAM file, creates an index for it and opens it.

This dataset was published in 2009 by Simon et al. in the paper "Phylogenetic diversity and metabolic potential revealed in a glacier ice metagenome", Appl Environ Microbiol. 75, 7519-7526.


Refresh

Although the data has now been loaded, it is not yet displayed. This is because sequence alignments can be very large, so IGB and Genometa only load the reads that lie within the window currently being viewed. Zoom out by moving the slider at the top of the screen to the left, then click the "Refresh Data" button at the top right to visualise the data. The result should look like the following image.



If the left-hand panel remains blank, try loading a GFF feature file; one is provided in the "example" subdirectory. This should fill the display with metagenomic data.

Metagenomics:

On the left side there is a "Metagenomics" tab that shows data on the reads matching bacteria in the reference sequences. To read the names and numbers in full, widen this pane by dragging its right-hand border.

These data can also be exported as a CSV file via Tools > Export Data as CSV. The resulting CSV file can then be imported into other tools such as R, OpenOffice or Excel to visualise the data.
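If you prefer a script to a spreadsheet, the exported file can also be read with Python's standard csv module. The sketch below is a minimal example that assumes the export has a header row containing a taxon-name column and a read-count column. The column names "species" and "reads" and the comma delimiter are hypothetical placeholders, so check the header row of your own export and adjust them.

```python
import csv
import io
from typing import List, TextIO, Tuple

# HYPOTHETICAL column names and delimiter: replace them with the ones
# found in the header row of your own Genometa CSV export.
NAME_COLUMN = "species"
COUNT_COLUMN = "reads"
DELIMITER = ","


def top_taxa(handle: TextIO, n: int = 10) -> List[Tuple[str, int]]:
    """Return the n rows with the highest read counts as (name, count) pairs."""
    reader = csv.DictReader(handle, delimiter=DELIMITER)
    rows = [(row[NAME_COLUMN], int(row[COUNT_COLUMN])) for row in reader]
    rows.sort(key=lambda item: item[1], reverse=True)  # largest counts first
    return rows[:n]


if __name__ == "__main__":
    # Made-up demonstration data, not real Genometa output.
    demo = io.StringIO("species,reads\nTaxon A,12\nTaxon B,340\nTaxon C,57\n")
    for name, count in top_taxa(demo, n=2):
        print(f"{name}\t{count}")
```

For a real export, open the file with `open(path, newline="")`, as the csv module documentation recommends, and pass the handle to `top_taxa`.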



Overview Histogram - Summary histogram of all Species

This feature gives the user of Genometa an overview of the number of reads attributed to each bacterial taxon. The data is presented as a bar chart in which each bar represents a particular species, and the height of each bar reflects its number of reads. The bars also carry further information on each species, such as genus, species, strain, read count and species ID. The figure below shows the appearance of the bar chart.
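To make the idea behind the bar heights concrete, the sketch below counts reads per reference sequence directly from a SAM file, using the FLAG and RNAME fields defined in the SAM format specification. It is an approximation under stated assumptions, not Genometa's own code: it counts each read once (skipping unmapped, secondary and supplementary alignments) and reports reference sequence names, whereas Genometa additionally maps references to species via the "Lineage Mappings" file, which this sketch does not do.

```python
from collections import Counter
from typing import Iterable

# FLAG bits defined by the SAM format specification.
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800
SKIP_FLAGS = FLAG_UNMAPPED | FLAG_SECONDARY | FLAG_SUPPLEMENTARY


def reads_per_reference(sam_lines: Iterable[str]) -> Counter:
    """Count each mapped read once, keyed by the reference name (RNAME) it hit."""
    counts: Counter = Counter()
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # header lines and blank lines carry no alignments
        fields = line.rstrip("\n").split("\t")
        flag, rname = int(fields[1]), fields[2]
        if flag & SKIP_FLAGS or rname == "*":
            continue  # unmapped read, or an extra alignment of a read already counted
        counts[rname] += 1
    return counts
```

For example, passing an open handle on the example SAM file to `reads_per_reference` and calling `.most_common(10)` on the result lists the ten reference sequences with the most reads.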




To display data in the chart, load a BAM or SAM file. Once loading has finished, the bar chart prepares and displays the data. It is also possible to load another file; Genometa then merges the data and the bar chart updates immediately. If the user switches the data source, the bar chart loads the new data immediately and keeps the previous data in a cache, so that it can be reloaded quickly.

The sliders above and to the left of the chart are zoom controls: the upper one zooms horizontally and the left one vertically. In the figure, the upper slider has been moved to the left, which is why the bars are expanded and the text inside them is shown. The scroll bars become usable once you have zoomed in along the corresponding axis. Vertical zooming is always centred on the bottom of the bars, whereas horizontal zooming is centred on the point where the user has clicked on the chart (outside the bars). A vertical hairline appears at that point to show the zoom centre; by default, the hairline lies on the vertical axis.

The chart is interactive: clicking a bar marks it, and Genometa jumps to the genome view of that species in the Sequence View Map. Species truly present in the metagenome are likely to have large portions of their genome covered by reads, whereas contaminating reads or contaminated reference sequences may lead to reads being attributed only to very restricted genomic loci. For example, we have observed reads mapping to a heat-stable ligase from Thermus thermophilus, which we believe reflects laboratory contamination of that particular metagenome. A species can also be selected in the table on the left, which marks the corresponding bar. Marked bars are highlighted in dark blue.
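The coverage argument above can also be checked outside Genometa. The sketch below estimates, for each reference declared in the SAM header (@SQ lines), the fraction of its bases covered by at least one aligned read base, using the POS and CIGAR fields from the SAM specification. A species that is genuinely present would be expected to show a relatively high fraction, whereas reads piled up at a single locus (as in the Thermus thermophilus ligase example) give a small fraction even when the read count is high. This is a sketch under assumptions, not a Genometa feature: it ignores mapping quality, does not count deletions (D) or skipped regions (N) as covered, expects the @SQ lines to precede the alignments (as in standard SAM files), and leaves any threshold for calling a taxon present to your own judgement.

```python
import re
from typing import Dict, Iterable

# One CIGAR operation: a length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
FLAG_UNMAPPED = 0x4


def coverage_breadth(sam_lines: Iterable[str]) -> Dict[str, float]:
    """Fraction of each @SQ reference covered by at least one aligned read base."""
    masks: Dict[str, bytearray] = {}  # one byte per reference position, 1 = covered
    for line in sam_lines:
        if line.startswith("@SQ"):
            # Header line such as "@SQ\tSN:refA\tLN:4000000"
            tags = dict(f.split(":", 1) for f in line.rstrip("\n").split("\t")[1:])
            masks[tags["SN"]] = bytearray(int(tags["LN"]))
            continue
        if line.startswith("@") or not line.strip():
            continue  # other header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        flag, rname, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
        mask = masks.get(rname)
        if flag & FLAG_UNMAPPED or cigar == "*" or mask is None:
            continue  # unmapped, no alignment, or reference missing from the header
        start = pos - 1  # SAM POS is 1-based; convert to a 0-based index
        for length_str, op in CIGAR_OP.findall(cigar):
            n = int(length_str)
            if op in "M=X":
                end = min(start + n, len(mask))  # clamp to the reference length
                if end > start:
                    mask[start:end] = b"\x01" * (end - start)
                start += n
            elif op in "DN":
                start += n  # consumes the reference without an aligned read base
            # I, S, H and P do not consume the reference
    return {name: mask.count(1) / len(mask) for name, mask in masks.items() if mask}
```

A bytearray per reference keeps memory at roughly one byte per reference base, which is manageable for bacterial genomes; references with a declared length of zero are left out of the result.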



Tooltips:

When the mouse pointer hovers over a read glyph, Genometa displays information about the read (or about the annotation, if a feature file has been loaded) as a tooltip overlay (see figure 1). The content of this tooltip can be changed with the tooltip editor for the corresponding glyph type (BAM for read glyphs, GFF for annotation glyphs): open the Preferences window (File > Preferences) and switch to the corresponding tab (see element 1 in the figure below). GFF feature files can be obtained from the NCBI genome website or the NCBI RefSeq FTP site and loaded using the File > Open dialogue.


The following example uses the BAM tab to illustrate the various controls; the GFF editor differs only in the available item tags.

Editing Tooltips

The tooltip preview (see element 2) shows the current content of the tooltip. Every item tag listed in this field is displayed in the tooltip, in the same order in which it appears here. Item tags can be added to the list by choosing an available item tag from the combo box and clicking "Add item:" (see element 3). A blank line can also be added to the tooltip by clicking "Add Blank Line", which separates groups of tags for easier reading.

To change the order of the tags, select a tag in the tooltip preview and click either "Move up" or "Move down" (see element 4). Clicking "Remove item" removes the selected item or blank line from the tooltip.

"Reset to defaults" discards all changes made to the tooltip and restores the initial state of the editor (see element 5).



Global Settings

The global tooltip settings panel contains settings that affect all tooltips regardless of glyph type. The "Max. Length" slider (see element 6) sets the width, in characters, of the generated tooltip window; by default, all item tag values longer than this are truncated. Increase this value to accommodate very long tooltip strings.

"Enable tooltips" (see element 7) globally enables or disables the display of tooltips, while "Show all available tags" (see element 8) generates tooltips showing all available item tags, regardless of the tags chosen in the corresponding editor preview.










