Molecular Taxonomy

Todd Disotell and Andrew Burrell


Taxonomies have always been fluid and contentious, in part because of a paucity of data, and in part because of the inherently subjective nature of classification decisions.   Many biologists, including ourselves, firmly believe that taxonomy should be based upon evolutionary history (phylogeny) whenever possible.  This unfortunately may leave taxonomies relatively unstable until robust, well-corroborated, and generally accepted phylogenies exist for all primate groups.  However, biological reality is often quite complicated, and therefore taxonomy will sometimes be imperfect or inconsistent, even after a consensus has been formed about primate evolutionary history.


Over the last few decades molecular analyses have come to the forefront of phylogenetic research, and have been used to place populations into discrete categories as well as to create and revise taxonomies.  For example, traditional taxonomies placed chimpanzees, gorillas, and orangutans into a ‘great ape’ family of Pongidae, and humans and our close relatives in a sister family, Hominidae.  However, this scheme was based on overall physical and behavioral resemblance, not evolutionary history.  Molecular analyses were among the first to strongly show that chimpanzees were, in fact, more closely related to humans than to other apes.  As a consequence, taxonomies have been modified and all great apes and humans are now included within the family Hominidae, whilst orangutans are now placed in the subfamily Ponginae, and Homo sapiens and our extinct relatives in the tribe Hominini.


The revision of the great ape taxonomy is only one example.  Some other significant changes to primate taxonomy facilitated in part by molecular data include the placement of tarsiers with monkeys and apes (anthropoids) rather than with lemurs and lorises (strepsirrhines); the inclusion of the enigmatic aye-aye and microlemurs with the lemurs of Madagascar rather than with galagos and lorises; the recognition that lorises form geographic clusters (Africa and Asia) instead of morphological ones (slender and large); and the composition of the three major New World monkey groups, the pitheciids (pitheciines and Callicebus), the cebids (callitrichines, Aotus, Cebus, and Saimiri), and the atelids.  Molecular studies have also confirmed that Asian odd-nosed colobines form a cluster distinct from African colobines; that mangabeys should be split into two distinct lineages, one allied with mandrills, the other with baboons and geladas; and that terrestrial guenons form a group separate from the arboreal guenons.  All of these new discoveries or confirmations of older hypotheses necessitate taxonomic revisions.


Inferring the evolutionary relationships among primates using genetic data ideally requires the use of multiple genes from multiple individuals in multiple populations.  Ultimately, we want to know the relationships among primate populations, known as the ‘species tree’.  To infer this, we can only use the histories of individual genes, or ‘gene trees’.  However, any individual gene may occasionally have a branching history different from that of the species.  This ‘gene-tree versus species-tree’ phenomenon results from the differential extinction of polymorphic alleles, incomplete lineage sorting, hybridization, sex-biased dispersal, and other essentially random factors.  Fortunately, a majority of genes will have histories that coincide with that of their species, so sampling many genes in multiple individuals will allow scientists to identify the likely ‘species tree’.
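
The idea that most gene trees agree with the species tree can be illustrated with a simple tally. The sketch below is only a toy: the gene-tree list is invented for illustration, the function name is hypothetical, and each gene tree is reduced to the pair of taxa it groups as closest relatives among three. Real species-tree inference relies on far more sophisticated statistical methods than counting.

```python
from collections import Counter


def most_supported_topology(gene_trees):
    """Return the most frequent sister pairing and its share of all gene trees."""
    # frozenset makes ("human", "chimpanzee") and ("chimpanzee", "human") equal.
    counts = Counter(frozenset(pair) for pair in gene_trees)
    topology, n = counts.most_common(1)[0]
    return set(topology), n / len(gene_trees)


# Hypothetical data: each entry names the two taxa that one gene groups as sisters.
gene_trees = [
    ("human", "chimpanzee"),
    ("chimpanzee", "human"),
    ("human", "chimpanzee"),
    ("chimpanzee", "gorilla"),  # a discordant gene tree
    ("human", "gorilla"),       # another discordant gene tree
    ("human", "chimpanzee"),
]

topology, share = most_supported_topology(gene_trees)
print(sorted(topology), round(share, 2))
```

In this invented sample, four of the six genes support the same pairing, so that pairing would be taken as the likely species-level relationship; sampling more genes would make such a tally more trustworthy.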


The data available to molecular primatologists come from several sources, including the nucleotide sequences of genes, transposable elements, single nucleotide polymorphisms (SNPs), and short tandem repeats (STRs, or microsatellites).  These data can now be collected from almost any biologically derived material, including hair, feces, and even saliva.  These biosources allow for the non-invasive collection of genetic material, greatly increasing our ability to sample large numbers of primates.  Museum specimens, bone, teeth, and skin can also yield usable DNA.  Unfortunately, well-meaning rules meant to protect species often end up hindering scientific efforts to study them by making it difficult to acquire and transport biomaterials.  


Sequence data and SNPs can come from genes in the maternally-inherited mitochondrial genome or the nuclear genome (the paternally-inherited Y chromosome, and biparentally-inherited X chromosome and autosomes).  Transposable elements and STRs come from the nuclear genome alone.  Sequence data consist of the order of nucleotides, both variable and invariant, at a specific location in the genome.  These gene sequences evolve in ways that are relatively well understood, and are most often used in the inference of the evolutionary relationships among primates.   Transposable elements are stretches of DNA that have the ability to duplicate and then insert themselves randomly into new areas of the nuclear genome.  Each insertion is a unique event, so if two primates share an insertion in the same place in their genomes, it is almost certain that they shared a common ancestor with that insertion.  This makes transposable elements powerful tools for inferring evolutionary relationships.  SNPs are sequence data, but involve the characterization of differences in single nucleotides known to be variable from many places in the genome, rather than differences in the order of nucleotides in a given gene.  Currently, they are most often used in studies of populations.  STRs are short stretches of nuclear DNA that have a short repeated nucleotide sequence motif.  This repeated motif often confuses the cellular machinery during DNA replication, and can result in the addition or subtraction of one or more repeats.  As a consequence, STRs evolve rapidly and, like SNPs, are useful in population-level studies.  STRs have proven particularly useful for parentage and relatedness studies, but less useful for phylogenetic inference.   
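
The logic of shared transposable element insertions can be sketched as a presence/absence comparison. The taxon and locus names below are placeholders rather than real data, and the function is a hypothetical helper; the sketch simply counts how many insertion loci each pair of taxa has in common, on the assumption stated above that a shared insertion reflects shared ancestry.

```python
from itertools import combinations


def shared_insertions(presence):
    """Count the insertion loci shared by every pair of taxa."""
    return {
        (a, b): len(presence[a] & presence[b])
        for a, b in combinations(sorted(presence), 2)
    }


# Hypothetical presence data: the set of insertion loci found in each taxon.
presence = {
    "taxon_A": {"locus1", "locus2", "locus3"},
    "taxon_B": {"locus1", "locus2"},
    "taxon_C": {"locus1"},
}

counts = shared_insertions(presence)
for pair, n in counts.items():
    print(pair, n)
```

Here taxon_A and taxon_B share more insertions with each other than either does with taxon_C, which is the pattern expected if A and B had a more recent common ancestor.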


Primatologists are able to take advantage of the fact that the various genetic systems (mitochondrial DNA, Y chromosome, X chromosome, and autosomes) have different patterns of inheritance and thus evolutionary signals.  Since mitochondrial DNA has a faster rate of evolution and is solely maternally inherited, it is more useful for closely related taxa or those with female dispersal.  The paternally inherited Y chromosome will most accurately assess male-specific history, which may be useful for male-dispersing species.  The X chromosome yields yet another signal since two copies are transmitted to females while only one to males.  The vast majority of potential markers come from the more slowly evolving biparentally inherited autosomal genome.  Using markers from a combination of these systems increases the likelihood of inferring the species tree.


These different types of molecular data can be used to investigate the evolutionary relationships among primates, the timing of lineage divergences, and demographic history, and to get a sense of the genetic distance between populations, subspecies, and species.  As discussed previously, taxonomy should reflect evolutionary relationships, so developing a robust primate evolutionary tree is one contribution that molecular studies can make to primate taxonomy.  Such trees should be based on many genes from multiple representatives of each taxon, be they populations, subspecies or species, and ideally include mitochondrial, Y chromosome, and biparentally inherited molecular markers.


Another way in which molecular analyses can provide information for taxonomic decisions is through the use of relative levels of genetic divergence, allowing different taxa to be compared.  For instance, if two well-defined and accepted taxa diverge by N molecular units, then if two other taxa are at least N units divergent, they could be placed at the same taxonomic level.  The taxonomy of chimpanzee subspecies is a good example.  Based on mitochondrial DNA sequences, chimpanzees from Nigeria have been found to be more genetically distant from other western African chimpanzees (Pan troglodytes verus) than central (Pan troglodytes troglodytes) and eastern chimpanzees (Pan troglodytes schweinfurthii) are from each other.  The authors therefore proposed two possible taxonomic schemes.  In the first, the Nigerian and western chimpanzees could be placed into a single subspecies (Pan troglodytes verus) and the central and eastern forms would then be placed into the single subspecies troglodytes.  In the second, troglodytes and schweinfurthii could indeed be considered separate subspecies, and the Nigerian chimps would then be placed in their own subspecies, Pan troglodytes vellerosus (which has subsequently been modified to Pan troglodytes ellioti based upon the rules of the International Code of Zoological Nomenclature), separate from the other western African chimps (Pan troglodytes verus).  This issue currently remains unresolved; many more nuclear genes need to be surveyed to confirm that the Nigerian chimps are indeed distinct from other western African chimpanzees, and taxonomists need to decide whether eastern and central African chimpanzees deserve separation at the subspecies level.
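
The 'at least N units divergent' comparison can be made concrete with a small sketch. It assumes the simplest possible distance measure, the proportion of aligned sites that differ, which is only one of many ways to measure divergence. The function names and the short sequences are invented for illustration and are not real chimpanzee data.

```python
def p_distance(seq_a, seq_b):
    """Proportion of aligned A/C/G/T sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Skip gaps and ambiguous bases; only unambiguous sites are compared.
        if a in "ACGT" and b in "ACGT":
            compared += 1
            differing += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def at_least_as_divergent(candidate_pair, benchmark_pair):
    """True if the candidate taxa differ at least as much as the accepted pair."""
    return p_distance(*candidate_pair) >= p_distance(*benchmark_pair)


# Hypothetical aligned sequences, for illustration only.
benchmark = ("ACGTACGTAC", "ACGTACGTAT")  # two accepted taxa, 1 of 10 sites differ
candidate = ("ACGTACGTAC", "ACGAACGTTT")  # two taxa in question, 3 of 10 sites differ

print(p_distance(*benchmark), p_distance(*candidate))
print(at_least_as_divergent(candidate, benchmark))
```

Under this toy comparison the candidate pair is more divergent than the benchmark pair, so by the reasoning above the two taxa could be argued to merit the same taxonomic rank. As the chimpanzee example shows, such a result still leaves the choice between lumping and splitting to the taxonomist.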


Advances in genetic technologies will enable the characterization of large numbers of molecular markers in many individuals, giving primatologists a much better sampling of the genetic variation within populations and species.  As a result, there will most likely be a large number of revisions to primate taxonomy in the coming years.  Currently, most primate groups have been characterized using only a few molecular markers (often only mtDNA) and often with only a single individual representing a taxonomic group.  Since a single gene tree is not guaranteed to coincide with the species tree, taxonomies currently influenced by studies utilizing only mtDNA on a limited set of individuals may need revision. In particular, SNP-based technologies will become widely used to scan hundreds if not thousands of genes efficiently.  These more sophisticated molecular approaches will potentially aid in the identification of cryptic species that have not been diagnosed on the basis of morphology.  Furthermore, the use of multiple gene analyses will replace single gene ‘barcoding’ strategies currently being used in species identification.  


Unfortunately, the new molecular data may in some ways make primate taxonomy more complicated.  As more genes are surveyed, it seems likely that reticulation and other complex evolutionary processes will be detected.  Primate evolutionary history may not consist solely of a nested series of clean bifurcations, but something much messier and more interesting.  However, our taxonomic system does not handle ‘messiness’ very well.  One easy prediction to make about the impact of molecular analyses on taxonomy in the future is that we will recognize how complex evolution truly is and that defining discrete categories will often be inherently difficult.  However, this is not necessarily a bad thing.  Taxonomies need to balance stability and ease of understanding with an accurate depiction of biology.  The good thing about molecular markers is that they enable us to get much closer to inferring the one evolutionary history that must exist.



Todd R. Disotell


Andrew S. Burrell
Research Scientist

Center for the Study of Human Origins
Department of Anthropology
New York University
25 Waverly Place
New York, NY 10003 U.S.A.


Todd Disotell received his Ph.D. from Harvard University in 1992 and has been at New York University ever since.  For over 20 years he has specialized in generating and testing hypotheses about primate phylogeny.  While the majority of his research spans all of the groups of Old World monkeys, he has also worked on apes, New World monkeys, and strepsirrhines.  His research group has generated many of the complete mitochondrial genomes that figure prominently in debates about phylogeny and taxonomy.  He has published 50 peer-reviewed scientific papers on population genetics, behavior, and conservation genetics and regularly contributes reviews and opinion pieces to various journals.  He teaches, writes about, and lectures on human variation and race.

Andrew Burrell is a biologist and a biological anthropologist, having worked early in his career on projects relating to developmental biology and neuroendocrinology before getting his Ph.D. in Anthropology in 2009.  His primary research focus is on the evolutionary genetics of baboons and mangabeys, but he is also involved in a wide range of other projects, from the conservation of African primates to the behavioral genetics of macaques.  This research has taken him to do field work in eastern and southern Africa.  He is currently a research scientist at New York University and lives in Brooklyn with his wife, Helen.


Citation: Noel Rowe, Marc Myers, eds. All the World’s Primates, Primate Conservation Inc., Charlestown RI.

