Defined Community Analysis
This tool compares input nucleotide reads to the set of known reference sequences for the amplification targets in the sequenced DNA, and determines the number and types of errors present in the reads. It can also help determine appropriate quality filters for datasets from the same sequencing run. The defined community reads and the reference sequences should cover the same region of the gene. If they do not, trim the reference sequences to the amplicon region by running Initial Processing with the corresponding forward and reverse primers.
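The comparison of a read against its closest reference can be illustrated with a minimal Python sketch of how aligned columns fall into error types. It assumes the read and reference are already aligned as two equal-length strings with `-` marking gaps, and that positions are 1-based. `classify_errors` and the `AlignmentError` fields are hypothetical names, not part of the tool. The sketch also leaves out the homopolymer lengths and Q scores that the tool reports.

```python
from typing import List, NamedTuple, Optional


class AlignmentError(NamedTuple):
    kind: str                 # "mismatch", "insertion" or "deletion"
    aln_pos: int              # 1-based column in the pairwise alignment
    expected: str             # reference base ("-" for an insertion)
    observed: str             # read base ("-" for a deletion)
    read_pos: Optional[int]   # 1-based position in the read (None for a deletion)
    ref_pos: Optional[int]    # 1-based position in the reference (None for an insertion)


def classify_errors(read_aln: str, ref_aln: str, gap: str = "-") -> List[AlignmentError]:
    """Hypothetical helper: list mismatches and indels in one aligned read/reference pair."""
    if len(read_aln) != len(ref_aln):
        raise ValueError("aligned sequences must have equal length")
    errors: List[AlignmentError] = []
    read_pos = ref_pos = 0
    for col, (obs, exp) in enumerate(zip(read_aln.upper(), ref_aln.upper()), start=1):
        if obs == gap and exp == gap:
            continue  # column gapped in both sequences: not an error
        # Advance the ungapped coordinates of whichever sequences have a base here.
        if obs != gap:
            read_pos += 1
        if exp != gap:
            ref_pos += 1
        if obs == gap:
            errors.append(AlignmentError("deletion", col, exp, obs, None, ref_pos))
        elif exp == gap:
            errors.append(AlignmentError("insertion", col, exp, obs, read_pos, None))
        elif obs != exp:
            errors.append(AlignmentError("mismatch", col, exp, obs, read_pos, ref_pos))
    return errors
```

For example, `classify_errors("ACG-TTA", "ACGATCA")` reports a deletion of `A` at alignment column 4 (reference position 4) and a `C`→`T` mismatch at column 6 (read position 5, reference position 6).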
The output contains five result files:
- a text file (*_pairwise.aln) containing the pairwise alignment between each read and its closest reference sequence.
- a tab-delimited file (*_mismatch.txt) containing each mismatch error in the following format: read ID, closest reference sequence ID, mismatch position in the alignment, expected base, observed base, position in the read, position in the reference, and Q score of the base.
- a tab-delimited file (*_indel.txt) containing each insertion and deletion error in the following format: read ID, closest reference sequence ID, indel position in the alignment, expected homopolymer length, observed homopolymer length, indel base, indel position in the read, indel position in the reference, and Q score of the base if it is an insertion.
- a tab-delimited file (*_qual.txt) containing the Q scores for each read, if quality scores are provided.
- a summary file (*_error_summary.txt) including the total numbers of mismatches and indels, the number of reads per target reference, Q scores, and errors summarized by type, reference and Q score. The summary file can be imported into an Excel spreadsheet to calculate error rates and make plots.
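As an alternative to a spreadsheet, the mismatch file can be tallied with a short Python sketch. It makes three assumptions:

- Columns follow the order listed above for *_mismatch.txt.
- Any line whose alignment-position column is not an integer is a header or blank line and is skipped.
- The Q score column may be missing when no quality scores were provided, so it is counted only when present.

`tally_mismatches` is a hypothetical helper, not part of the tool. Turning these counts into error rates would also require the total number of bases observed at each Q score, which this sketch does not compute.

```python
import csv
from collections import Counter
from typing import Iterable, Tuple


def tally_mismatches(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Hypothetical helper: count mismatches by substitution type and by Q score.

    Assumed columns: read ID, reference ID, alignment position, expected base,
    observed base, read position, reference position, Q score.
    """
    by_type: Counter = Counter()  # (expected, observed) -> count
    by_q: Counter = Counter()     # Q score -> count
    for fields in csv.reader(lines, delimiter="\t"):
        if len(fields) < 5 or not fields[2].strip().isdigit():
            continue  # skip blank, truncated, or header lines
        expected, observed = fields[3].strip(), fields[4].strip()
        by_type[(expected, observed)] += 1
        # Only tally Q scores when the eighth column is present and numeric.
        if len(fields) >= 8 and fields[7].strip().isdigit():
            by_q[int(fields[7])] += 1
    return by_type, by_q
```

To use it on a real output file, open the file (for example a hypothetical `sample_mismatch.txt`) with `newline=""` and pass the file object to `tally_mismatches`.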