FrameBot Help
Insertions and deletions in DNA reads cause frameshifts when the reads are translated to protein sequences. RDP FrameBot detects and corrects these frameshift errors. Given a query DNA read and a set of known protein sequences, FrameBot compares each protein target sequence to the query DNA sequence in both the forward and reverse directions. It produces frameshift-corrected protein and DNA sequences, along with an optimal global-local protein pairwise alignment.
Citation: Wang, Q., J. F. Quensen III, J. A. Fish, T.-K. Lee, Y. Sun, J. M. Tiedje, J. R. Cole. 2013. Ecological patterns of nifH genes in four terrestrial climatic zones explored with targeted metagenomics using FrameBot, a new informatics tool. mBio 4:e00592-13; doi: 10.1128/mBio.00592-13
- Extends a dynamic programming algorithm proposed by Guan et al., 1996. Alignments of DNA and protein sequences containing frameshift errors. Comput. Appl. Biosci. 12:31-40
- Requires a set of reference protein sequences
- Checks both forward and reverse directions of the query DNA
- Produces an optimal alignment between the query DNA and the target protein sequences in the presence of frameshifts
- Returns the frameshift-corrected protein and DNA query sequences
- Reports the protein pairwise alignment with the best score
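To illustrate why a single insertion or deletion matters, and what checking "both forward and reverse directions" involves, here is a minimal Python sketch. It is not FrameBot's code: the function names `translate` and `reverse_complement` and the toy read are assumptions made for illustration, and the sketch uses the standard genetic code only.

```python
# Illustrative sketch only; not part of FrameBot.
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 0; unknown codons become 'X', a trailing partial codon is dropped."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def reverse_complement(dna: str) -> str:
    """Return the reverse complement, i.e. the read as seen from the opposite strand."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


read = "ATGGCTAAAGGTGAACTG"          # hypothetical error-free read
damaged = read[:4] + read[5:]        # the same read with one base deleted

print(translate(read))                       # MAKGEL
print(translate(damaged))                    # MVKVN: every residue after the deletion changes
print(translate(reverse_complement(read)))   # QFTFSH: the reverse direction gives a different protein
```

A naive translation like this cannot recover from the deletion. FrameBot instead aligns the DNA against the reference proteins so that the frameshift can be located and corrected.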
You can adjust the length cutoff (applied after alignment) and the percent identity cutoff to filter out non-target reads. FrameBot has been tested and pre-configured for several important functional genes, including nitrogenase reductase (nifH), butyryl-CoA transferase (but), butyrate kinase (buk), dioxin/dibenzofuran dioxygenase (dxnA/dbfA1), dibenzofuran dioxygenase (dbfA2), carbazole dioxygenase (carA), cytochrome P-450 (p450), alkane hydroxylase B (alkB), and biphenyl dioxygenase (bphA).
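The two cutoffs can be pictured with a short Python sketch. This is an assumption-laden illustration, not FrameBot's implementation: the helper names `percent_identity` and `passes_cutoffs` are hypothetical, identity is assumed to be counted over ungapped alignment columns, length is assumed to be the number of aligned query residues, and both comparisons are assumed to be inclusive.

```python
# Illustrative sketch only; the exact definitions FrameBot uses may differ.

def percent_identity(aligned_query: str, aligned_target: str) -> float:
    """Percent of ungapped alignment columns where the two residues match."""
    if len(aligned_query) != len(aligned_target):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (q, t) for q, t in zip(aligned_query, aligned_target)
        if q != "-" and t != "-"
    ]
    if not columns:
        return 0.0
    matches = sum(q == t for q, t in columns)
    return 100.0 * matches / len(columns)


def passes_cutoffs(aligned_query: str, aligned_target: str,
                   length_cutoff: int, identity_cutoff: float) -> bool:
    """Keep a read only if it is long enough and similar enough to its best target."""
    query_length = sum(1 for q in aligned_query if q != "-")
    identity = percent_identity(aligned_query, aligned_target)
    return query_length >= length_cutoff and identity >= identity_cutoff


# Toy alignment: one gap in the query, one mismatch (E vs D).
print(passes_cutoffs("MAK-GEL", "MAKQGDL", length_cutoff=5, identity_cutoff=80.0))
print(passes_cutoffs("MAK-GEL", "MAKQGDL", length_cutoff=5, identity_cutoff=90.0))
```

Raising either cutoff removes more non-target reads, at the risk of discarding genuine but divergent or short reads.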
We have provided a new option, "Add de novo References", that may help with genes that are highly diverse or that lack closely related sequences in the reference set (such as biphenyl dioxygenase). The de novo mode strategy was designed by Michal Strejcek from Dr. Ondrej Uhlík's group at the Institute of Chemical Technology Prague. It is based on the assumption that abundant sequences are more likely to be correct. The experimental sequences are first dereplicated and sorted by abundance in descending order. Each query is then tested against the reference set. If a query has no close reference (above 70% amino acid identity), the protein sequence corresponding to the query is added to the reference set, provided that all of the following criteria are met:
- The query passes the length cutoff and the identity cutoff.
- The abundance of the query is above a certain cutoff (the default is 10).
- No frameshifts or stop codons are present.
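The de novo strategy described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not FrameBot's actual code: `Hit`, `best_hit`, and `add_de_novo_references` are hypothetical names, the caller supplies the aligner as a function, and the comparisons for the length and identity cutoffs are assumed to be inclusive.

```python
# Illustrative sketch only; names and data structures are hypothetical.
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Hit:
    protein: str         # protein sequence derived from the query
    identity: float      # percent aa identity to the best reference
    aligned_length: int  # alignment length in amino acids
    frameshifts: int     # number of frameshifts found in the query


def add_de_novo_references(
    reads: Sequence[str],
    references: Sequence[str],
    best_hit: Callable[[str, List[str]], Hit],  # hypothetical aligner supplied by the caller
    length_cutoff: int,
    identity_cutoff: float,
    abundance_cutoff: int = 10,
    close_identity: float = 70.0,
) -> List[str]:
    """Return the reference set extended with abundant, clean, distant queries."""
    refs = list(references)
    # Dereplicate; most_common() yields unique reads sorted by abundance, descending.
    for seq, abundance in Counter(reads).most_common():
        hit = best_hit(seq, refs)
        if hit.identity > close_identity:
            continue  # a close reference already exists
        if (
            hit.aligned_length >= length_cutoff
            and hit.identity >= identity_cutoff
            and abundance > abundance_cutoff
            and hit.frameshifts == 0
            and "*" not in hit.protein  # no stop codon
        ):
            refs.append(hit.protein)  # later queries are tested against the grown set
    return refs
```

Because abundant sequences are processed first, a sequence added early can serve as a reference for rarer, possibly error-containing variants that are processed later.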
If your gene is not in the drop-down list, you need to provide your own set of protein target sequences. FrameBot is computationally intensive: it performs an all-against-all comparison between the query DNA sequences and the protein target sequences. We therefore recommend limiting the number of protein target sequences to 200.
Example of the protein pairwise alignment output with frameshift correction: