Genome curation on emerging model species

About

The project brings together software engineers and biologists to design and implement essential software for answering questions in evolutionary genomics, and to offer researchers the training material they need to conduct these studies efficiently with both existing and newly developed tools. We proposed to work toward two final products:

  1. novel or enhanced software that culminates in a single platform for genome analysis
  2. a genomics curation curriculum for researchers at any career stage.

Genome Train

The Genome Train seeks to educate and empower researchers with an understanding of the process of genome sequencing. We present the automated annotation and manual curation of a newly sequenced genome, both as they would run under ideal conditions and as they are practiced today.

We outline best practices and describe some of the tools, the methods, and the players working behind the scenes to convert data into testable hypotheses (and not, as is often assumed, facts).

Throughout this process, we encourage biologists to take on the aspect of quality control that only they, as domain experts who understand the biology of the species, can perform.

The analogy of a train journey presents best-practice guidelines and checklists as station stops that force us to evaluate our position before investing further effort. As your research community undertakes its own long journey, we hope that this journey planner will help you reach your destination: a high-quality community resource that ensures no evolutionary question is beyond the reach of a long-term research program.

The work described here is part of the publication (reference) that resulted from the NESCent-funded working group on “Building non-model species genome curation communities”.

  1. Experimental Design
  2. Assembly
  3. Structural Annotation
  4. Feature Curation
  5. Functional Assignment
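
The station-stop idea can be sketched in a few lines of Python. This is only an illustration, assuming the five stages listed above are the stops and that each stop carries a checklist that must be fully ticked before moving on. The names `Station`, `build_journey`, and `next_stop` are hypothetical, and the checklist entries are placeholders rather than the project's actual guidelines.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# The five stages, in the order given in the list above.
STAGES = [
    "Experimental Design",
    "Assembly",
    "Structural Annotation",
    "Feature Curation",
    "Functional Assignment",
]


@dataclass
class Station:
    """One stop on the journey, with a checklist of yes/no items."""
    name: str
    checklist: Dict[str, bool] = field(default_factory=dict)

    def cleared(self) -> bool:
        # A stop is cleared once every item is ticked.
        # Note: an empty checklist counts as cleared.
        return all(self.checklist.values())


def build_journey() -> List[Station]:
    # Placeholder item only; real checklists would come from the curriculum.
    return [Station(name, {"placeholder check (hypothetical)": False})
            for name in STAGES]


def next_stop(journey: List[Station]) -> Optional[Station]:
    """Return the first stop whose checklist is incomplete, or None at the destination."""
    for station in journey:
        if not station.cleared():
            return station
    return None
```

Calling `next_stop(build_journey())` returns the Experimental Design stop until its checklist is complete, which mirrors the point of the analogy: evaluate where you are before investing effort in the next stage.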

The following tools were recently created by our group:

  • Just Annotate My Genome (JAMg): genome structural feature annotation tool for identifying genes.
  • Just Annotate My Proteins (JAMp): genome feature annotation tool for inferring GO annotations.
  • GenSAS: integrates gene prediction and genome visualization tools into a single visual interface.
  • GeneValidator: automated or visual identification of problematic gene predictions.
  • Afra: crowdsourcing curation of gene predictions in the classroom.