Functional Gene Pipeline and Repository


FrameBot Help


Insertions and deletions in a DNA read cause frameshifts when the read is translated to a protein sequence. RDP FrameBot detects and corrects these frameshift errors. Given a query DNA read and a set of known protein sequences, FrameBot compares each protein target sequence to the query DNA sequence in both the forward and reverse directions, and produces frameshift-corrected protein and DNA sequences together with an optimal global-local protein pairwise alignment.
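To illustrate why a single-base deletion matters, here is a minimal Python sketch. It is not FrameBot's alignment algorithm; it only shows how a deletion scrambles the downstream translation and how the correct residues reappear in a different reading frame. The sequence and the helper names (translate, reverse_complement) are made up for this example.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate DNA in frame 0; unknown codons become 'X', a partial trailing codon is dropped."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def reverse_complement(dna):
    """Return the reverse complement, used to examine the opposite strand."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Hypothetical error-free read and the same read with one base deleted.
original = "ATGGCTAAAGGTGAACTG"
read = original[:4] + original[5:]  # deletion of the base at index 4

print(translate(original))   # error-free translation
print(translate(read))       # translation is wrong downstream of the deletion
print(translate(read[5:]))   # the original downstream residues, now in a shifted frame
print(translate(reverse_complement(original)))  # opposite strand, frame 0
```

A frameshift-aware aligner has to recover the switch between frames on its own, guided by the protein reference, rather than being told where the deletion is as in this toy example.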

Citation: Wang, Q., J. F. Quensen III, J. A. Fish, T.-K. Lee, Y. Sun, J. M. Tiedje, J. R. Cole. 2013. Ecological patterns of nifH genes in four terrestrial climatic zones explored with targeted metagenomics using FrameBot, a new informatics tool. mBio 4:e00592-13; doi: 10.1128/mBio.00592-13

You can adjust the length cutoff (after alignment) and the percent identity cutoff to filter out non-target reads. FrameBot has been tested and pre-configured for several important functional genes including nitrogenase reductase (nifH), butyryl-CoA transferase (but) and butyrate kinase (buk), dioxin/dibenzofuran dioxygenase (dxnA/dbfA1), dibenzofuran dioxygenase (dbfA2), carbazole dioxygenase (carA), cytochrome P-450 (p450), alkane hydroxylase B (alkb) and biphenyl dioxygenase (bphA).
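As a rough sketch of how two such cutoffs can act as a filter, the Python below scores an already-aligned protein pair. The identity definition used here (matches divided by columns where both sequences have a residue) and the function names are assumptions made for illustration; FrameBot's own calculation may differ.

```python
def percent_identity(aligned_query, aligned_target):
    """Percent identity over columns where neither sequence has a gap ('-').

    This definition is an assumption for the example, not FrameBot's documented formula.
    """
    if len(aligned_query) != len(aligned_target):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (q, t)
        for q, t in zip(aligned_query, aligned_target)
        if q != "-" and t != "-"
    ]
    if not pairs:
        return 0.0
    matches = sum(q == t for q, t in pairs)
    return 100.0 * matches / len(pairs)


def aligned_length(aligned_query):
    """Number of query residues in the alignment (gaps excluded)."""
    return sum(ch != "-" for ch in aligned_query)


def passes_cutoffs(aligned_query, aligned_target, min_length, min_identity):
    """Keep a read only if it meets both the length and the identity cutoff."""
    return (
        aligned_length(aligned_query) >= min_length
        and percent_identity(aligned_query, aligned_target) >= min_identity
    )


# Hypothetical aligned pair: 5 of 6 residues match.
print(passes_cutoffs("MAKGEL", "MAKGDL", min_length=5, min_identity=80.0))
print(passes_cutoffs("MAKGEL", "MAKGDL", min_length=5, min_identity=90.0))
```

Raising either cutoff removes more non-target reads at the cost of discarding divergent but genuine ones.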

We have provided a new option, "Add de novo References", that may help with genes that are highly diverse or that lack closely related sequences in the reference set (such as biphenyl dioxygenase). The de novo mode strategy was designed by Michal Strejcek from Dr. Ondrej Uhlík's group at the Institute of Chemical Technology Prague. It is based on the assumption that abundant sequences are more likely to be correct. The experimental sequences are first dereplicated and sorted by abundance in descending order. Each query is then tested against the reference set. If a query has no close reference with more than 70% amino acid identity, the corresponding protein sequence of the query will be added to the reference set if the following criteria are met:

This dereplication and sorting step is included as part of the online FrameBot processing steps. If you run FrameBot on your own server, use this command to get the sorted sequences: java -jar /path/to/Clustering.jar derep with the "--sorted" option.
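For readers who want to see what dereplication and abundance sorting amount to, here is a minimal Python sketch. It does not replace the Clustering.jar command above and does not reproduce its output format; the function name and the exact-match, case-insensitive definition of a duplicate are assumptions made for this example.

```python
from collections import Counter


def dereplicate_sorted(reads):
    """Collapse identical reads and return (sequence, count) pairs, most abundant first.

    Duplicates are defined here as exact matches after upper-casing (an assumption).
    Ties in abundance are broken alphabetically so the order is deterministic.
    """
    counts = Counter(seq.upper() for seq in reads)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical reads: one sequence seen three times, one twice, one once.
reads = ["ACGT", "acgt", "TTGA", "ACGT", "GGCC", "TTGA"]
for sequence, abundance in dereplicate_sorted(reads):
    print(sequence, abundance)
```

Processing queries in this order means the most abundant sequences are considered first, which matches the assumption that abundant sequences are more likely to be correct.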

If your gene is not in the drop-down list, you need to provide your own set of protein target sequences. FrameBot is computationally intensive: it performs an all-against-all comparison between the query DNA sequences and the target protein sequences, so we recommend limiting the number of protein target sequences to 200.
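A quick back-of-the-envelope calculation shows why the size of the target set matters. The sketch below simply counts pairwise comparisons, assuming one comparison per query, per target, per direction; the read count is a made-up figure, and the cost of each individual alignment is not modeled.

```python
def pairwise_comparisons(n_queries, n_targets, directions=2):
    """Number of query-versus-target alignments, counting forward and reverse separately."""
    return n_queries * n_targets * directions


# Hypothetical run: 100,000 reads against the recommended maximum of 200 targets.
print(pairwise_comparisons(100_000, 200))
# Doubling the target set doubles the work.
print(pairwise_comparisons(100_000, 400))
```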

[Figure: frameshift correction]

Example of the protein pair-wise alignment output with frameshift correction:

[Figure: protein pairwise alignment output]