mcClust Complete Linkage Clustering
This tool performs complete linkage clustering of one or more aligned sequence files. Distance is calculated as straight percent identity (ambiguity codes are not taken into account), ignoring positions where either or both sequences have a gap. Sequences must overlap by at least 25 bases or else the distance calculation will fail. The tool first dereplicates the sequences, then calculates the pairwise sequence distances, and finally performs complete linkage clustering at the given step size up to the provided cutoff. By default, each separate uploaded file is treated as a different sample. If you would like your samples to be arranged differently, you may upload a sample mapping file (see the dereplicator section for more information on the sample mapping file format).
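The steps described above can be illustrated with a minimal Python sketch. This is not the mcClust implementation. The function names, the gap characters ("-" and "."), and the naive merge loop are assumptions made for illustration, and the step size is omitted for brevity (the sketch clusters at a single cutoff).

```python
from itertools import combinations

MIN_OVERLAP = 25   # minimum number of compared positions, as stated above
GAP_CHARS = "-."   # assumption: characters treated as gaps


def identity_distance(a, b, min_overlap=MIN_OVERLAP):
    """Return 1 - (fraction of identical positions), skipping gapped positions."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in GAP_CHARS or y in GAP_CHARS:
            continue  # ignore positions where either sequence has a gap
        compared += 1
        if x == y:  # plain character match; ambiguity codes get no special handling
            matches += 1
    if compared < min_overlap:
        raise ValueError("sequences overlap by fewer than %d bases" % min_overlap)
    return 1.0 - matches / compared


def complete_linkage(seqs, cutoff, min_overlap=MIN_OVERLAP):
    """Cluster a {name: aligned_sequence} dict; return a list of name lists."""
    # Dereplicate: identical aligned sequences share one representative.
    unique = {}
    for name, seq in seqs.items():
        unique.setdefault(seq.upper(), []).append(name)
    reps = list(unique)

    # Pairwise distances between representatives, keyed by (i, j) with i < j.
    dist = {}
    for i, j in combinations(range(len(reps)), 2):
        dist[i, j] = identity_distance(reps[i], reps[j], min_overlap)

    def linkage(c1, c2):
        # Complete linkage: cluster distance is the largest member distance.
        return max(dist[min(i, j), max(i, j)] for i in c1 for j in c2)

    clusters = [[i] for i in range(len(reps))]
    while len(clusters) > 1:
        d, p, q = min(
            (linkage(clusters[p], clusters[q]), p, q)
            for p, q in combinations(range(len(clusters)), 2)
        )
        if d > cutoff:
            break  # no remaining pair of clusters is within the cutoff
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # q > p, so index p is unaffected

    return [sorted(n for i in c for n in unique[reps[i]]) for c in clusters]
```

Because complete linkage uses the largest distance between members, every pair of sequences within a resulting cluster is within the cutoff of each other. The loop above recomputes all cluster distances on every pass, which is only practical for small inputs.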
The uploaded sequence file is expected to have a sequence #=GC_RF that specifies which positions are to be compared. Sequences obtained from the aligner will include this sequence automatically.
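As a rough illustration of how such a mask sequence could restrict the comparison, the Python sketch below keeps only the alignment columns marked in #=GC_RF. It rests on several assumptions that the text above does not confirm: that the input is FASTA, and that any mask character other than ".", "-" or "~" marks a position to compare. The function names are hypothetical.

```python
MASK_ID = "#=GC_RF"  # name of the mask sequence, as given in the text above


def read_fasta(text):
    """Parse FASTA text into a {name: sequence} dict (assumes FASTA input)."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # first word of the header is the name
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def apply_mask(records, mask_id=MASK_ID, skip_chars=".-~"):
    """Return the sequences reduced to the columns marked in the mask.

    Assumption: a mask character outside skip_chars marks a compared column.
    """
    if mask_id not in records:
        raise KeyError("no %s mask sequence found" % mask_id)
    mask = records[mask_id]
    keep = [i for i, ch in enumerate(mask) if ch not in skip_chars]
    masked = {}
    for name, seq in records.items():
        if name == mask_id:
            continue  # the mask itself is not a sequence to cluster
        if len(seq) != len(mask):
            raise ValueError("%s does not match the mask length" % name)
        masked[name] = "".join(seq[i] for i in keep)
    return masked
```

For example, with a mask of `xx..xx`, the aligned sequences `ACggGT` and `AC--GA` would be reduced to `ACGT` and `ACGA` before any distances are calculated.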
An open-source, command-line version of this tool is available for local installation as part of our RDP Tools package from github.com/rdpstaff. Usage help is available in the tool README files.