Functional Gene Pipeline and Repository


If you use RDP's FunGene, please cite our most recent article.


Begin with these gene links:
Version 8.6 -- GenBank 214 (as of 6/22/2016)
Process your own Functional Gene data using our new FunGene Pipeline

Antibiotic resistances
gene | contributor
ACT | Syed Hashsham
BEL | Syed Hashsham
beta_IS6 | Robert Stedtfeld
beta_tnpA | Robert Stedtfeld
beta_tnpA2 | Robert Stedtfeld
bet_blaSHV | Robert Stedtfeld
bet_tnpA | Robert Stedtfeld
CARB | Syed Hashsham
cefa_qacEdelta | Robert Stedtfeld
chl_cmlA | Robert Stedtfeld
CMY | Syed Hashsham
cprA | Tamara Tsoi Cole
cprB | Tamara Tsoi Cole
CTX-M | Syed Hashsham
dfra1 | Syed Hashsham
dfra12 | Syed Hashsham
FOX | Syed Hashsham
gapA | Tim Johnson
GES | Syed Hashsham
IMI | Syed Hashsham
IMP | Syed Hashsham
IncW_trwA | Tim Johnson
IncW_trwB | Tim Johnson
IND | Syed Hashsham
intI | Carlos Rodriguez-Minguela
intI1_sub1 | Tim Johnson
intI2 | Tim Johnson
intI3 | Tim Johnson
KPC | Syed Hashsham
mdh_sub1 | Tim Johnson
mdh_sub2 | Tim Johnson
MIR | Syed Hashsham
MOX | Syed Hashsham
NDM | Syed Hashsham
OXA | Syed Hashsham
pec_aad2 | Robert Stedtfeld
PER | Syed Hashsham
repA | Tim Johnson
Resfam_16S_Ribosomal_RNA_Methyltransferase | Resfam
Resfam_AAC3 | Resfam
Resfam_AAC3-Ia | Resfam
Resfam_AAC6-Ia | Resfam
Resfam_AAC6-Ib | Resfam
Resfam_AAC6-II | Resfam
Resfam_ABCAntibioticEffluxPump | Resfam
Resfam_adeA-adeI | Resfam
Resfam_adeB | Resfam
Resfam_adeC-adeK-oprM | Resfam
Resfam_adeR | Resfam
Resfam_adeS | Resfam
Resfam_ANT | Resfam
Resfam_ANT3 | Resfam
Resfam_ANT4 | Resfam
Resfam_ANT6 | Resfam
Resfam_ANT9 | Resfam
Resfam_APH3 | Resfam
Resfam_APH3_double_prime | Resfam
Resfam_APH3_prime | Resfam
Resfam_APH6 | Resfam
Resfam_ArmA | Resfam
Resfam_baeR | Resfam
Resfam_baeS | Resfam
Resfam_BCII | Resfam
Resfam_BlaB | Resfam
Resfam_blaI | Resfam
Resfam_blaR1 | Resfam
Resfam_CARB-PSE | Resfam
Resfam_CepA | Resfam
Resfam_Cfr23RibosomalRNAMethyltransferase | Resfam
Resfam_CfxA | Resfam
Resfam_Chloramphenicol_Acetyltransferase_CAT | Resfam
Resfam_Chloramphenicol_Efflux_Pump | Resfam
Resfam_Chloramphenicol_Phosphotransferase_CPT | Resfam
Resfam_ClassA | Resfam
Resfam_ClassB | Resfam
Resfam_ClassC-AmpC | Resfam
Resfam_ClassD | Resfam
Resfam_CMY-LAT-MOX-ACT-MIR-FOX | Resfam
Resfam_CTXM | Resfam
Resfam_DHA | Resfam
Resfam_DIM-GIM-SIM | Resfam
Resfam_emrB | Resfam
Resfam_emrE | Resfam
Resfam_Erm23SRibosomalRNAMethyltransferase | Resfam
Resfam_Erm38 | Resfam
Resfam_ErmA | Resfam
Resfam_ErmB | Resfam
Resfam_ErmC | Resfam
Resfam_FluoroquinoloneResistantDNATopoisomerase | Resfam
Resfam_GES | Resfam
Resfam_GOB | Resfam
Resfam_IMP | Resfam
Resfam_IND | Resfam
Resfam_KHM | Resfam
Resfam_KPC | Resfam
Resfam_L1 | Resfam
Resfam_LRA | Resfam
Resfam_macA | Resfam
Resfam_macB | Resfam
Resfam_MacrolideGlycosyltransfer | Resfam
Resfam_marA | Resfam
Resfam_mecR1 | Resfam
Resfam_MexA | Resfam
Resfam_MexC | Resfam
Resfam_MexE | Resfam
Resfam_MexH | Resfam
Resfam_MexW-MexI | Resfam
Resfam_MexX | Resfam
Resfam_MFSAntibioticEffluxPump | Resfam
Resfam_mprF | Resfam
Resfam_msbA | Resfam
Resfam_NDM-CcrA | Resfam
Resfam_norA | Resfam
Resfam_PC1 | Resfam
Resfam_phoQ | Resfam
Resfam_QuinoloneResistanceProteinQnr | Resfam
Resfam_ramA | Resfam
Resfam_RNDAntibioticEffluxPump | Resfam
Resfam_robA | Resfam
Resfam_romA | Resfam
Resfam_Sfh | Resfam
Resfam_SHV-LEN | Resfam
Resfam_SME | Resfam
Resfam_soxR | Resfam
Resfam_SPM | Resfam
Resfam_SubclassB1 | Resfam
Resfam_SubclassB2 | Resfam
Resfam_SubclassB3 | Resfam
Resfam_TE_inactivation | Resfam
Resfam_TEM | Resfam
Resfam_TetA | Resfam
Resfam_TetA-B | Resfam
Resfam_TetA-G | Resfam
Resfam_TetD | Resfam
Resfam_TetE | Resfam
Resfam_TetH-TetJ | Resfam
Resfam_TetM-TetW-TetO-TetS | Resfam
Resfam_Tetracycline_Resistance_MFS_Efflux_Pump | Resfam
Resfam_Tetracycline_Resistance_Ribosomal_Protection_Protein | Resfam
Resfam_TetX | Resfam
Resfam_TetY | Resfam
Resfam_tolC | Resfam
Resfam_vanA | Resfam
Resfam_vanB | Resfam
Resfam_vanC | Resfam
Resfam_vanD | Resfam
Resfam_vanH | Resfam
Resfam_vanR | Resfam
Resfam_vanS | Resfam
Resfam_vanT | Resfam
Resfam_vanW | Resfam
Resfam_vanX | Resfam
Resfam_vanY | Resfam
Resfam_vanZ | Resfam
Resfam_VEB-PER | Resfam
Resfam_VIM | Resfam
SHV | Syed Hashsham
SME | Syed Hashsham
spec_aad1 | Robert Stedtfeld
strA | Robert Stedtfeld
strB | Robert Stedtfeld
strept_aad | Robert Stedtfeld
TEM | Syed Hashsham
tet1 | Robert Stedtfeld
tet2 | Robert Stedtfeld
tet3 | Robert Stedtfeld
tet31 | Syed Hashsham
tet4 | Robert Stedtfeld
tetM | Carlos Rodriguez-Minguela
tetQ | Carlos Rodriguez-Minguela
tet_sul2 | Robert Stedtfeld
tetW | Carlos Rodriguez-Minguela
vanc_unname | Robert Stedtfeld
VEB | Syed Hashsham
VIM | Syed Hashsham
Plant Pathogenicity
gene | contributor
avrE | James Kremer
txtA | RDP
txtB | RDP
Biogeochemical cycles
gene | contributor
amoA_AOA | Feifei Liu
amoA_AOB | RDP
buk | RDP
but | RDP
cbh1 | Cheryl Kuske
chb | Fan Yang
cooS | Fan Yang
cydA | Rachel Morris
dsrA | Alexander Loy/Michael Wagner
dsrB | Alexander Loy/Michael Wagner
exc1 | Fan Yang
fixN | Rachel Morris
glx | Qichao Tu
hydA | Fan Yang
lcc_ascomycetes | Chris Wright
lcc_basidiomycetes | Chris Wright
ligE | Ryan Penton
lip | Qichao Tu
mcrA | Blaz Stres
mmoX | Qichao Tu
mnp | Qichao Tu
nag3 | Fan Yang
napA | Laurent Philippot
narG | Laurent Philippot
nifD | RDP
nifH | RDP
nirA | RDP
nirB | RDP
nirK | Tracy Teal
nirS | Veronica Gruntzig
norB | Gesche Braker
nosZ | Blaz Stres
nosZ_atypical_1 | Robert Sanford
nosZ_atypical_2 | Robert Sanford
nrfA | Joel Klappenbach
nrfA_Welsh | Allana Welsh
nxrB | RDP
phnX | Zarraz Lee
pmoA | Tracy K. Teal
ppo | Qichao Tu
scd2 | RDP
soxB | Gupta Vadakattu
ureA | RDP
vp1 | Chris Wright
xylA | Qichao Tu
Phylogenetic markers
gene | contributor
EF-Tu | James Kremer
fusA | Scott Santos/Howard Ochman
gyrB | Zarraz May-Ping Lee
ileS | Scott Santos/Howard Ochman
lepA | Scott Santos/Howard Ochman
leuS | Scott Santos/Howard Ochman
pyrG | Scott Santos/Howard Ochman
recA | Scott Santos/Howard Ochman
recG | Scott Santos/Howard Ochman
rplB | Scott Santos/Howard Ochman
rpoB | Scott Santos/Howard Ochman
Biodegradation
gene | contributor
alkb | Gerben Zylstra/Elyse Rodgers-Vieira
benA | Stephan Gantner
bph | Gerben Zylstra
bphA1 | Stephan Gantner
bphA2 | Stephan Gantner
BSH | Robert Stedtfeld
carA | Shoko Iwai
cntA | Robert Stedtfeld
cutC | Robert Stedtfeld
dbfA1 | Shoko Iwai
dxnA | Shoko Iwai
dxnA-dbfA1 | Tim Johnson
HSDH | Robert Stedtfeld
npah | Gerben Zylstra
p450 | Gerben Zylstra/Elyse Rodgers-Vieira
ppah | Gerben Zylstra
PSA | Robert Stedtfeld
Metal Cycling
gene | contributor
arsA | PFAM
arsB | PFAM
arsC | PFAM
arsD | PFAM
Other
gene | contributor
acdS | RDP
baiCD | RDP Staff
cagA | Syed Hashsham
hcnA | Thierry Janssens
KS_alpha_PKSII | Patrick Hill
phlD | Thierry Janssens
phoD | Elizabeth Bent
phzA | Thierry Janssens
pltA | Thierry Janssens
pqsA | Thierry Janssens
prnD | Thierry Janssens
Spo0A | Jackson Sorenson
vacA | Syed Hashsham
FunGene News

06/03/2016  RDP staff on the road!
Teaching in China, Genomic Standards Consortium meeting in Crete, special ASM Microbe events in Boston

10/07/2015  Xander assembler article is published.
Xander: Employing a Novel Method for Efficient Gene-Targeted Metagenomic Assembly

10/07/2015  Warcup Fungal ITS article is accepted!
Fungal identification using a Bayesian Classifier and the 'Warcup' training set of Internal Transcribed Spacer sequences.

07/08/2015  *** Pyro Job Submission up ***
The hardware issues that caused the pyro problems are now fixed.

05/28/2015  RDP Staff attending ASM Meeting in New Orleans
RDP staff will be attending the ASM General Meeting in New Orleans in the coming week. Two RDP posters will be presented: first on Tuesday morning:...

05/26/2015  RDP Release 11.4 available
Updated 16S rRNA hierarchy model to training set No. 14.

03/27/2015  New FrameBot option "Add de novo to references" available
Unique abundant query sequences will be added to the starting reference set if qualifications are met.

02/23/2015  WARNING -- RDP unavailable Sat., March 7th
Building network infrastructure upgrades planned 8 A.M. through 6 P.M.

02/16/2015  Introducing Xander assembler
RDP's new gene-targeted metagenomic assembler, Xander, has been released.

10/21/2014  Classifier provides gene copy number adjustment
RDP Classifier provides gene copy number adjustment for 16S gene sequences.

What is the Functional Gene Pipeline/Repository (FGPR)? and other FAQs

  • An interactive display of sequence search results for those interested in a particular gene family.
  • A tool to aid functional genomics studies, especially of the environment; updated monthly.

Where does the search result data come from?

  • FGPR searches are based on a protein model built from a set of diverse, well-characterized "training sequences" submitted by experts.
  • The NCBI non-redundant protein database is searched using the models and the HMMER Hidden Markov Model (HMM) search program. This is the same program used to create the PFAM database of protein motifs.
  • Searches can be repeated using the same models when the protein database is updated.
  • Each gene is searched for common protein motifs using the PFAM database. Scores for these conserved motifs are included in the FGPR output. This can help separate unrelated "hits" that just happen to share a common protein motif with the gene of interest from related but highly diverged sequences.
  • For each "hit" the corresponding protein and nucleic acid records are retrieved. The protein "hits" are aligned using the HMM. Nucleic acid records are aligned by back-translating from the protein alignment. Source organism, reference information, etc. extracted from the records are linked into the FGPR output.
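The back-translation step in the last item can be sketched as follows. This is a minimal illustration, not the FGPR implementation: the function name `back_translate_alignment`, the use of "-" as the gap character, and the assumption that the coding sequence starts in frame at the first aligned residue are all hypothetical choices made for the example.

```python
def back_translate_alignment(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Thread the codons of an unaligned coding sequence onto one gapped
    row of a protein alignment.

    Each residue consumes the next three nucleotides of `cds`; each gap
    column in the protein row becomes three gap characters.
    """
    residues = sum(1 for ch in aligned_protein if ch != gap)
    if len(cds) < 3 * residues:
        raise ValueError("coding sequence is too short for the aligned protein")

    codons = []
    pos = 0  # index of the next unused nucleotide in cds
    for ch in aligned_protein:
        if ch == gap:
            codons.append(gap * 3)
        else:
            codons.append(cds[pos:pos + 3])
            pos += 3
    return "".join(codons)


# Protein row "M-KV" with its coding sequence ATG AAA GTT
aligned_nucleotides = back_translate_alignment("M-KV", "ATGAAAGTT")
```

Because the nucleotide alignment is derived from the protein alignment, gaps always fall on codon boundaries and the reading frame is preserved.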

How do HMM searches compare to BLAST?

  • Because HMMs are built from a set of training sequences, they contain much more information than the single query sequence used in BLAST. The training set helps define which regions are more conserved and what changes are most common.
  • It's been shown mathematically that the statistical test used in BLAST is essentially equivalent to a type of HMM search with a single training sequence.
  • BLAST is much faster than HMM searches because it uses a heuristic to filter out sequences unlikely to match.
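To illustrate how a set of training sequences carries more information than a single query, the sketch below builds a simple position-specific scoring profile from a few aligned sequences. It is a deliberately reduced stand-in for a profile HMM: it has no insert or delete states, assumes gap-free training sequences of equal length, and uses a uniform background and a pseudocount of 1, all of which are assumptions made for this example. It is not HMMER's algorithm, and the toy sequences are invented.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(training, pseudocount=1.0):
    """Return one dict per alignment column mapping residue -> log2-odds score."""
    length = len(training[0])
    if any(len(seq) != length for seq in training):
        raise ValueError("training sequences must be aligned to the same length")

    background = 1.0 / len(AMINO_ACIDS)  # uniform background (assumption)
    total = len(training) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in training)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def score(profile, query):
    """Sum the per-column log-odds scores (in bits) for an ungapped query."""
    if len(query) != len(profile):
        raise ValueError("query must have the same length as the profile")
    return sum(column[aa] for column, aa in zip(profile, query))


# Toy training set: column 1 is fully conserved, column 2 tolerates K or R.
training = ["MKVL", "MKIL", "MRVL", "MKVL"]
profile = build_profile(training)
```

In this toy profile the fully conserved first column rewards a matching M more than the variable second column rewards a K, and an R in the second column still scores above a residue never seen in training. That is the kind of position-specific information a single query sequence cannot supply.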

How do I use the FGPR? (try our new procedural tutorials or our early basic video tutorials*)

  • For each search, you're initially presented with a list of "hits" ordered by score. Starting "training sequences" are presented in color.
  • Jump to the bottom of the list to change the ordering or filter the results based on score, size, or source (environmental clone vs. isolated organisms). Hint: After you've set the filters and ordering to your preference, you can save the page as a "bookmark" in your browser.
  • The score filter is preset to exclude less meaningful results for searches where the total number of results is large. The excluded results can be displayed by changing the filter value.
  • You can choose to display only non-redundant protein hits, or to include redundant entries. (For example, NCBI sometimes considers a well-known training sequence to be a redundant entry if there's an identical protein sequence available.)
  • Protein or nucleic acid alignments can be downloaded for any subset of hits.
  • Analysis tools are being added. Current tools include a neighbor-joining phylogenetic tree builder and a primer/probe tester.
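The filtering and ordering described above can be mimicked on downloaded results with a few lines of Python. This is only a sketch: the `Hit` record, its field names, and the example values are hypothetical and do not reflect the FGPR download format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    label: str           # hypothetical identifier for the hit
    score: float         # HMM bit score
    size_aa: int         # protein length in amino acids
    environmental: bool  # True for environmental clones, False for isolates


def filter_hits(hits, min_score=0.0, min_size=0, source="all"):
    """Keep hits passing the score, size, and source filters, best score first."""
    if source not in {"all", "environmental", "isolate"}:
        raise ValueError("source must be 'all', 'environmental', or 'isolate'")

    kept = []
    for hit in hits:
        if hit.score < min_score or hit.size_aa < min_size:
            continue
        if source == "environmental" and not hit.environmental:
            continue
        if source == "isolate" and hit.environmental:
            continue
        kept.append(hit)
    return sorted(kept, key=lambda hit: hit.score, reverse=True)


# Invented example data
hits = [
    Hit("hit_a", 310.5, 420, False),
    Hit("hit_b", 95.0, 130, True),
    Hit("hit_c", 250.2, 410, True),
]
environmental_only = filter_hits(hits, min_score=100.0, source="environmental")
```

Here `environmental_only` contains just `hit_c`: `hit_a` comes from an isolate and `hit_b` falls below the score cutoff.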

What are the columns in the FGPR display?

  • Select: A checkbox to select the "hit" for download or further analysis.
  • Score: (Bits saved) Score from the HMM search. Directly analogous to the (bits) Score in BLAST.
  • PID, NID: Protein and nucleic acid identifiers with links. NID links are only to the gene coding portion of the nucleic acid record. Some protein hits were not translated from the nucleic acid and do not have a corresponding NID.
  • Definition: From the NCBI protein record.
  • Organism: From the NCBI protein record.
  • Occ.: Occurrence, the number of HMM matches found in the protein. Should normally be 1. Any other number may indicate a false hit.
  • % of HMM Coverage: Percentage of the HMM that matches the hit protein sequence.
  • % of HMM Identity: Percent identity between the matching portion of the protein sequence and the HMM consensus sequence.
  • Size(aa): The length of the protein.
  • Reference: The first reference listed in the NCBI protein record. For those references abstracted by PubMed, a link is provided.
  • Motif(n): Hits are scored against PFAM-A HMMs for common protein motifs present in the gene of interest. Links to the corresponding PFAM records are given at the top of the table.
  • Notes and View/Edit: A place for members to add short notes about a particular "hit."
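The two percentage columns can be read as follows. The exact formulas FGPR uses are not stated here, so this sketch shows one plausible interpretation: coverage is the share of model positions aligned to a residue, and identity is the share of those aligned positions where the residue equals the consensus. The function name, the "-" gap character, and the example rows are assumptions made for illustration.

```python
def hmm_coverage_and_identity(consensus_row, hit_row, model_length, gap="-"):
    """Return (% coverage, % identity) for a hit aligned to an HMM consensus.

    consensus_row and hit_row are the two rows of a pairwise alignment and
    must have equal length. model_length is the total number of match
    positions in the model, which may exceed the aligned region.
    """
    if len(consensus_row) != len(hit_row):
        raise ValueError("alignment rows must have the same length")
    if model_length <= 0:
        raise ValueError("model_length must be positive")

    aligned = 0    # model positions paired with a residue in the hit
    identical = 0  # aligned positions where the residue equals the consensus
    for cons, res in zip(consensus_row, hit_row):
        if cons == gap or res == gap:
            continue
        aligned += 1
        if cons.upper() == res.upper():
            identical += 1

    coverage = 100.0 * aligned / model_length
    identity = 100.0 * identical / aligned if aligned else 0.0
    return coverage, identity


# Invented rows: five aligned positions, four identical, model of ten positions
coverage, identity = hmm_coverage_and_identity("MKVLA-G", "MRVL-TG", model_length=10)
```

With these rows, `coverage` is 50.0 and `identity` is 80.0. A hit with high identity but low coverage matches only part of the model, which is worth checking alongside the Occ. column.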

DOE | SRP | NIH Human Microbiome Project


References and Support


J.A. Fish, B. Chai, Q. Wang, Y. Sun, C.T. Brown, J.M. Tiedje, J.R. Cole. (2013). FunGene: the Functional Gene Pipeline and Repository. Front. Microbiol. 4: 291.

A. Bateman, L. Coin, R. Durbin, R.D. Finn, V. Hollich, S. Griffiths-Jones, A. Khanna, M. Marshall, S. Moxon, E.L.L. Sonnhammer, D.J. Studholme, C. Yeats, S.R. Eddy. (2004). The Pfam Protein Families Database. Nucleic Acids Res. Database Issue 32: D138-D141.

D.A. Benson, I. Karsch-Mizrachi, D.J. Lipman, J. Ostell, D.L. Wheeler. (2004). GenBank: update. Nucleic Acids Res. Database Issue 32: D23-D26.

R. Durbin, S. Eddy, A. Krogh, G. Mitchison. (1998). The theory behind profile HMMs. In: R. Durbin, S. Eddy, A. Krogh, and G. Mitchison, Biological sequence analysis: probabilistic models of proteins and nucleic acids, Cambridge University Press.