Publication Type: | Journal Article |
Year of Publication: | 2004 |
Authors: | R. C. Edgar |
Journal: | Nucleic acids research |
Volume: | 32 |
Pagination: | 1792–7 |
ISSN: | 1362-4962 |
Keywords: | Algorithms, Amino Acid Motifs, Amino Acid Sequence, Internet, Molecular Sequence Data, Protein, Protein: methods, Reproducibility of Results, Sequence Alignment, Sequence Alignment: methods, Sequence Analysis, Software, Time Factors |
Abstract: | We describe MUSCLE, a new computer program for creating multiple alignments of protein sequences. Elements of the algorithm include fast distance estimation using kmer counting, progressive alignment using a new profile function we call the log-expectation score, and refinement using tree-dependent restricted partitioning. The speed and accuracy of MUSCLE are compared with T-Coffee, MAFFT and CLUSTALW on four test sets of reference alignments: BAliBASE, SABmark, SMART and a new benchmark, PREFAB. MUSCLE achieves the highest, or joint highest, rank in accuracy on each of these sets. Without refinement, MUSCLE achieves average accuracy statistically indistinguishable from T-Coffee and MAFFT, and is the fastest of the tested methods for large numbers of sequences, aligning 5000 sequences of average length 350 in 7 min on a current desktop computer. The MUSCLE program, source code and PREFAB test data are freely available at http://www.drive5.com/muscle. |
URL: | http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=390337&tool=pmcentrez&rendertype=abstract |
DOI: | 10.1093/nar/gkh340 |
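
The abstract mentions fast distance estimation using kmer counting. As a rough illustration only, the sketch below computes one common alignment-free distance: one minus the fraction of k-mers two sequences share. The function names, the default `k=3`, and the normalization by the shorter sequence are assumptions made for this example; the abstract does not specify MUSCLE's exact formula, and the program's own implementation may differ (for instance in the alphabet it uses).

```python
from collections import Counter


def kmer_counts(seq, k=3):
    """Count every overlapping k-mer in seq (hypothetical helper)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a, b, k=3):
    """Return 1 - (shared k-mers / k-mers in the shorter sequence).

    Illustrative assumption, not taken from the abstract: shared k-mers
    are counted with multiplicity (minimum of the two counts).
    """
    shorter = min(len(a), len(b))
    if shorter < k:
        raise ValueError("both sequences must be at least k residues long")
    counts_a, counts_b = kmer_counts(a, k), kmer_counts(b, k)
    # Counter intersection keeps the minimum count of each shared k-mer.
    shared = sum((counts_a & counts_b).values())
    return 1.0 - shared / (shorter - k + 1)
```

Because no alignment is computed, each pairwise distance costs time roughly linear in the sequence lengths, which is what makes this kind of estimate attractive for building a guide tree over thousands of sequences.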