Find help in our video tutorials
Begin with these gene links:
Version 6.7 -- GenBank
(as of 11/24/2011)
Process your own Functional Gene data using our new FunGene Pipeline
Access archived gene files here
Access FunGene 1.0 here
FunGene News
02/10/2012 MSU Network Slowdown
12/01/2011 RDP featured in Science Watch, Dec. 2011
11/29/2011 Brief Service Interruption
06/16/2011 RDP User Jobs Slow Turnaround Update
05/27/2011 RDP Poster at BAGECO11
05/18/2011 RDP Staff attending ASM New Orleans, May 22-25
05/04/2011 RDP aids its users in making SRA submissions
05/04/2011 RDP's MIMARKS (former MIENS) GoogleSheets Ready!
04/07/2011 Latest MIMARKS (formerly MIENS) standards released
04/04/2011 RDP MIMARKS GoogleSheet
Antibiotic resistance (gene: contributor)
- cprA: Tamara Tsoi Cole
- cprB: Tamara Tsoi Cole
- intI: Carlos Rodriguez-Minguela
- tetM: Carlos Rodriguez-Minguela
- tetQ: Carlos Rodriguez-Minguela
- tetW: Carlos Rodriguez-Minguela
Biodegradation (gene: contributor)
- alkb: Gerben Zylstra/Elyse Rodgers-Vieira
- benA: Stephan Gantner
- bph: Gerben Zylstra
- bphA1: Stephan Gantner
- bphA2: Stephan Gantner
- carA: Shoko Iwai
- dbfA1: Shoko Iwai
- dxnA: Shoko Iwai
- dxnA-dbfA1: Tim Johnson
- glx: Qichao Tu
- lip: Qichao Tu
- mmoX: Qichao Tu
- mnp: Qichao Tu
- npah: Gerben Zylstra
- p450: Gerben Zylstra/Elyse Rodgers-Vieira
- ppah: Gerben Zylstra
- ppo: Qichao Tu
- xylA: Qichao Tu
Biogeochemical cycles (gene: contributor)
- amoA: RDP
- buk: RDP
- but: RDP
- cooS: Fan Yang
- cydA: Rachel Morris
- dsrA: Alexander Loy/Michael Wagner
- dsrB: Alexander Loy/Michael Wagner
- fixN: Rachel Morris
- hydA: Fan Yang
- ligE: Ryan Penton
- mcrA: Blaz Stres
- napA: Laurent Philippot
- narG: Laurent Philippot
- nifD: RDP
- nifH: RDP
- nifH_tit: RDP
- nirA: RDP
- nirB: RDP
- nirK: Gesche Braker
- nirS: Veronica Gruntzig
- norB: Gesche Braker
- nosZ: Blaz Stres
- nrfA: Joel Klappenbach
- pmoA: Tracy K. Teal
- scd2: RDP
- ureA: RDP
Phylogenetic markers (gene: contributor)
- EF-Tu: James Kremer
- fusA: Scott Santos/Howard Ochman
- gyrB: Zarraz May-Ping Lee
- ileS: Scott Santos/Howard Ochman
- lepA: Scott Santos/Howard Ochman
- leuS: Scott Santos/Howard Ochman
- pyrG: Scott Santos/Howard Ochman
- recA: Scott Santos/Howard Ochman
- recG: Scott Santos/Howard Ochman
- rplB: Scott Santos/Howard Ochman
- rpoB: Scott Santos/Howard Ochman
Plant pathogenicity (gene: contributor)
- txtA: RDP
- txtB: RDP
What is the Functional Gene Pipeline/Repository (FGPR)? and other FAQs
- An interactive display of sequence search results for those interested in a particular gene family.
- A tool to aid functional genomics studies, especially of the environment; updated monthly.
Where does the search result data come from?
- FGPR searches are based on a protein model built from a set of diverse, well-characterized "training sequences" submitted by experts.
- The NCBI non-redundant protein database is searched using the models and the HMMER Hidden Markov Model (HMM) search program. This is the same program used to create the PFAM database of protein motifs.
- Searches can be repeated using the same models when the protein database is updated.
- Each gene is searched for common protein motifs using the PFAM database. Scores for these conserved motifs are included in the FGPR output. This can help separate unrelated "hits" that just happen to share a common protein motif with the gene of interest from related but highly diverged sequences.
- For each "hit" the corresponding protein and nucleic acid records are retrieved. The protein "hits" are aligned using the HMM. Nucleic acid records are aligned by back-translating from the protein alignment. Source organism, reference information, etc. extracted from the records are linked into the FGPR output.
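The back-translation step described above can be sketched as follows. This is a minimal illustration, not the FGPR implementation: the function name `back_translate_alignment` and the example sequences are hypothetical, and the sketch assumes the coding sequence starts at the first codon of the protein and contains no frameshifts.

```python
def back_translate_alignment(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Thread an unaligned coding sequence onto one gapped protein alignment row.

    Each residue in the protein row consumes the next codon from the coding
    sequence; each protein gap becomes three nucleotide gap characters.
    """
    residue_count = sum(1 for symbol in aligned_protein if symbol != gap)
    if len(cds) < 3 * residue_count:
        raise ValueError("coding sequence is too short for the aligned protein")

    codons = []
    position = 0  # index of the next unused nucleotide in the coding sequence
    for symbol in aligned_protein:
        if symbol == gap:
            codons.append(gap * 3)
        else:
            codons.append(cds[position:position + 3])
            position += 3
    return "".join(codons)


# Hypothetical two-sequence protein alignment and the matching coding sequences.
protein_rows = {"seqA": "MK-LV", "seqB": "MKTLV"}
coding_seqs = {"seqA": "ATGAAACTGGTT", "seqB": "ATGAAAACCCTGGTT"}

nucleotide_rows = {
    name: back_translate_alignment(row, coding_seqs[name])
    for name, row in protein_rows.items()
}
for name, row in nucleotide_rows.items():
    print(name, row)
```

Because every protein column maps to exactly three nucleotide columns, the nucleic acid rows stay in register with the protein alignment and all have the same length.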
How do HMM searches compare to BLAST?
- Since HMMs are based on a set of training sequences, they contain much more information than is conveyed by the single query sequence in BLAST. The training set helps define which regions are more conserved and what changes are most common.
- It's been shown mathematically that the statistical test used in BLAST is essentially equivalent to a type of HMM search with a single training sequence.
- BLAST is much faster than HMM searches because it uses a heuristic to filter out sequences unlikely to match.
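To illustrate why a set of training sequences carries more information than a single query, here is a deliberately simplified sketch. It builds position-specific log-odds scores (in bits) from an ungapped toy alignment. It is not a full profile HMM (there are no insert or delete states) and it does not use HMMER; the uniform background frequency, the pseudocount, and the training sequences are all assumptions made for the example.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(ALPHABET)  # assumption: uniform background frequencies


def column_scores(training, pseudocount=1.0):
    """Return one dict per alignment column mapping residue -> log-odds bits."""
    length = len(training[0])
    if any(len(seq) != length for seq in training):
        raise ValueError("training sequences must be aligned to equal length")

    total = len(training) + pseudocount * len(ALPHABET)
    scores = []
    for i in range(length):
        counts = Counter(seq[i] for seq in training)
        scores.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / BACKGROUND)
            for aa in ALPHABET
        })
    return scores


def score_sequence(seq, scores):
    """Sum the per-column scores for an ungapped sequence of model length."""
    if len(seq) != len(scores):
        raise ValueError("sequence length must equal model length")
    return sum(column[aa] for aa, column in zip(seq, scores))


# Hypothetical training set: column 1 is fully conserved, column 2 varies (K/R).
training = ["MKLV", "MRLV", "MKIV", "MKLV"]
model = column_scores(training)

for candidate in ("MKLV", "MALV", "AKLV"):
    print(candidate, round(score_sequence(candidate, model), 2))
```

In this toy model, a substitution in the fully conserved first column lowers the score more than a substitution in the variable second column, which is the kind of position-specific information a single query sequence cannot supply.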
How do I use the FGPR? (try our video tutorials)
- For each search, you're initially presented with a list of "hits" ordered by score. Starting "training sequences" are presented in color.
- Jump to the bottom of the list to change the ordering or filter the results based on score, size, or source (environmental clone vs. isolated organisms). Hint: After you've set the filters and ordering to your preference, you can save the page as a "bookmark" in your browser.
- The score filter is preset to exclude less meaningful results for searches where the total number of results is large. The excluded results can be displayed by changing the filter value.
- You can choose to display only non-redundant protein hits, or to include redundant entries. (For example, NCBI sometimes considers a well-known training sequence to be a redundant entry if there's an identical protein sequence available.)
- Protein or nucleic acid alignments can be downloaded for any subset of hits.
- Analysis tools are being added. Current tools include a neighbor-joining phylogenetic tree builder and a primer/probe tester.
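The filtering and ordering options described above can be mimicked offline on a downloaded hit list. The sketch below is an illustration only: the `Hit` record, its field names, and the `filter_hits` function are hypothetical and do not reflect the FGPR's internal data model or download format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    pid: str                 # protein identifier
    score: float             # HMM bit score
    size_aa: int             # protein length in amino acids
    environmental: bool      # True for environmental clones, False for isolates
    redundant: bool = False  # True if flagged as a redundant entry


def filter_hits(hits, min_score=0.0, min_size=0, source="any",
                include_redundant=False):
    """Keep hits passing the score, size, source, and redundancy filters,
    then return them ordered by descending score."""
    if source not in {"any", "environmental", "isolate"}:
        raise ValueError("source must be 'any', 'environmental', or 'isolate'")

    kept = []
    for hit in hits:
        if hit.score < min_score or hit.size_aa < min_size:
            continue
        if source == "environmental" and not hit.environmental:
            continue
        if source == "isolate" and hit.environmental:
            continue
        if hit.redundant and not include_redundant:
            continue
        kept.append(hit)
    return sorted(kept, key=lambda hit: hit.score, reverse=True)


# Hypothetical hit list.
hits = [
    Hit("P1", 310.5, 412, environmental=False),
    Hit("P2", 95.2, 130, environmental=True),
    Hit("P3", 280.0, 405, environmental=True),
    Hit("P4", 305.0, 410, environmental=False, redundant=True),
]

strong_environmental = filter_hits(hits, min_score=100.0, source="environmental")
print([hit.pid for hit in strong_environmental])
```

Raising `min_score` plays the role of the preset score filter, and setting `include_redundant=True` corresponds to showing redundant entries alongside the non-redundant hits.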
What are the columns in the FGPR display?
- Select: A checkbox to select the "hit" for download or further analysis.
- Score: The bit score ("bits saved") from the HMM search. Directly analogous to the bit score in BLAST.
- PID, NID: Protein and nucleic acid identifiers with links. NID links are only to the gene coding portion of the nucleic acid record. Some protein hits were not translated from the nucleic acid and do not have a corresponding NID.
- Definition: From the NCBI protein record.
- Organism: From the NCBI protein record.
- Occ.: Occurrence, the number of HMM matches found in the protein. Should normally be 1. Any other number may indicate a false hit.
- % of HMM Coverage: Percentage of the HMM model that matches the hit protein sequence.
- % of HMM Identity: Percent identity between the hit protein sequence and the HMM consensus sequence.
- Size(aa): The length of the protein.
- Reference: The first reference listed in the NCBI protein record. For those references abstracted by PubMed, a link is provided.
- Motif(n): Hits are scored against PFAM-A HMMs for common protein motifs present in the gene of interest. Links to the corresponding PFAM records are given at the top of the table.
- Notes and View/Edit: A place for members to add short notes about a particular "hit."
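The coverage and identity columns can be made concrete with a small worked example. The source does not give the exact formulas, so the definitions below are assumptions: coverage is taken as the share of model positions aligned to a residue, and identity as the share of those aligned positions that equal the consensus residue. Insert columns are ignored, and the function name and sequences are hypothetical.

```python
def hmm_coverage_and_identity(consensus: str, aligned_hit: str, gap: str = "-"):
    """Return (coverage %, identity %) for a hit aligned to a model consensus.

    Both strings must have one character per model position; a gap in
    aligned_hit means that model position is not matched by the hit.
    """
    if not consensus or len(consensus) != len(aligned_hit):
        raise ValueError("consensus and hit must be non-empty and equal length")

    matched = 0    # model positions aligned to a residue
    identical = 0  # matched positions where the residue equals the consensus
    for model_residue, hit_residue in zip(consensus, aligned_hit):
        if hit_residue == gap:
            continue
        matched += 1
        if hit_residue.upper() == model_residue.upper():
            identical += 1

    coverage = 100.0 * matched / len(consensus)
    identity = 100.0 * identical / matched if matched else 0.0
    return coverage, identity


# Hypothetical 8-position model: the hit covers 6 positions and agrees at 5.
coverage, identity = hmm_coverage_and_identity("MKLVAGTS", "MRLV--TS")
print(f"coverage={coverage:.1f}% identity={identity:.1f}%")
```

Under these assumed definitions, a short fragment can show high identity but low coverage, which is why the two columns are worth reading together.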
References and Support
1. R. Durbin, S. Eddy, A. Krogh, G. Mitchison. (1998) The theory behind profile HMMs. In: R. Durbin, S. Eddy, A. Krogh, and G. Mitchison, Biological sequence analysis: probabilistic models of proteins and nucleic acids, Cambridge University Press.
2. A. Bateman, L. Coin, R. Durbin, R.D. Finn, V. Hollich, S. Griffiths-Jones, A. Khanna, M. Marshall, S. Moxon, E.L.L. Sonnhammer, D.J. Studholme, C. Yeats, S.R. Eddy. The Pfam Protein Families Database. Nucleic Acids Res. (2004) Database Issue 32:D138-D141.
3. D.A. Benson, I. Karsch-Mizrachi, D.J. Lipman, J. Ostell, D.L. Wheeler. GenBank: update. Nucleic Acids Res. (2004) Database issue 1:D23-6.

