SALVE FOUNDATION HOME  
Phylogenetic trees and evolutionary comparison  
Course outline: Molecular trees    
 
   
     
   

This page discusses basic concepts and methods of phylogenetic tree reconstruction based on biological sequences. You can use it to test what you already know and as a summary when preparing for your exam. The Background section introduces basic concepts, with some outside links. The Course section contains the course material itself, with links to further pages.

 

Background

Systematics

Classification of organisms:

  1. Individuals identified as members of species
  2. Species grouped by similarity and/or common descent
  3. Order of descent established (phylogenetic tree)
  4. Identifying ancestors
  5. Using phylogenetic information for further studies (mechanism of evolution, conservation, comparison, function of traits, etc.)

 

Phenetics

Phenetics is the study of relationships among a group of organisms based on their similarity. Similarity or distance can be quantified, and species are grouped according to these similarity or distance measures. Phenetic relationships are expressed by a tree-like network called a phenogram.
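A phenogram can be built from a distance matrix by clustering. The sketch below uses UPGMA (unweighted pair group method with arithmetic mean), a classic phenetic clustering method that this page does not itself describe: the two closest clusters are joined repeatedly, and the distance from the new cluster to every other cluster is the size-weighted average of its members' distances. The distances are hypothetical illustration values, and branch lengths are left out to keep the example short.

```python
def upgma(labels, dist):
    """Cluster OTUs with UPGMA and return the phenogram as nested tuples.

    labels: OTU names; dist: symmetric matrix (list of lists) in the same order.
    Branch lengths are not tracked in this sketch; only the topology is returned.
    """
    clusters = [(name, 1) for name in labels]  # (subtree, number of OTUs in it)
    d = [row[:] for row in dist]               # working copy of the distance matrix
    while len(clusters) > 1:
        n = len(clusters)
        # Pick the closest pair of clusters (i < j).
        i, j = min(((i, j) for i in range(n) for j in range(i + 1, n)),
                   key=lambda pair: d[pair[0]][pair[1]])
        (tree_i, size_i), (tree_j, size_j) = clusters[i], clusters[j]
        keep = [k for k in range(n) if k not in (i, j)]
        # Distance to the merged cluster: average weighted by cluster size.
        new_row = [(size_i * d[i][k] + size_j * d[j][k]) / (size_i + size_j)
                   for k in keep]
        # Rebuild the matrix without i and j, then add the merged cluster last.
        d = [[d[a][b] for b in keep] for a in keep]
        for row, value in zip(d, new_row):
            row.append(value)
        d.append(new_row + [0.0])
        clusters = [clusters[k] for k in keep] + [((tree_i, tree_j), size_i + size_j)]
    return clusters[0][0]

# Hypothetical pairwise distances (illustration only, not real data).
labels = ["Cow", "Dog", "Human", "Monkey"]
dist = [
    [0, 8, 10, 10],
    [8, 0, 6, 6],
    [10, 6, 0, 2],
    [10, 6, 2, 0],
]
print(upgma(labels, dist))  # ('Cow', ('Dog', ('Human', 'Monkey')))
```

With these made-up distances, Human and Monkey are joined first, then Dog, and finally Cow.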

 

Cladistics

Cladistics is the study of the descent of organisms. Descent is estimated by studying characters of common origin that were modified through evolution (shared derived characteristics). The degree of relatedness and the evolutionary pathways among organisms are quantified by estimating the time needed for the character changes that accumulated since two taxa diverged. Ancestor-descendant relationships are expressed by a tree-like network called a cladogram.

More on cladistics can be found in Berkeley's journey into the world of phylogenetic systematics: http://www.ucmp.berkeley.edu/clad/clad4.html.

 

Cladistics vs. Phenetics

The two philosophical approaches often result in the same tree. When similarity directly reflects descent, a phenogram is similar to a cladogram. To put it simply, perhaps too simply for many, a cladogram reflects similarity weighted by the time since divergence.

 

Classical and Molecular Systematics

Classical systematics considers visible characters of organisms, such as their anatomy, physiology or behaviour, to reconstruct phylogeny. However, most such characters are modified by natural selection under varying selection pressure. Changes in selected characters may speed up or slow down during certain periods, so in most cases the time since divergence cannot be estimated precisely.

Molecular systematics compares biological sequences (protein, nucleotide) of organisms. Some biosequences, and many sites in most sequences, change regardless of selection pressure. Similarity among sequences may therefore provide less biased information on branching order. However, there are a number of complications when inferring phylogenetic species trees from biosequences.

 

Phylogenetic trees

Order and time of divergence are represented by phylogenetic trees. Some phylogenetic trees resemble real trees with a trunk and branches; others look like pathways. The two figures (A and B) represent the same four species. The tree with a trunk (or root) in Fig. A represents the branching order among the four species, indicating that the lineage leading to Cow split first from the lineage leading to Dog, Human and Monkey. However, trees do not come with a root automatically; they must be rooted by special techniques. Simply by looking at an unrooted tree (Fig. B), there is no way to tell where the root should be placed.

 
(A) Rooted tree
 
(B) Unrooted tree

 

(C) Rooted tree
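One way to see why the root cannot simply be read off an unrooted tree is to count the possibilities. For n OTUs, an unrooted bifurcating tree has 2n - 3 branches, and the root could be placed on any of them; there are (2n - 5)!! possible unrooted topologies and (2n - 3)!! rooted ones, where k!! is the product k·(k-2)·(k-4)·...·1. These are standard combinatorial results that this page does not state; the sketch simply evaluates them.

```python
def double_factorial(k):
    """k!! = k * (k - 2) * (k - 4) * ... * 1; defined as 1 for k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

def unrooted_topologies(n):
    """Distinct unrooted bifurcating trees for n >= 3 labelled OTUs."""
    return double_factorial(2 * n - 5)

def rooted_topologies(n):
    """Distinct rooted bifurcating trees for n >= 2 labelled OTUs."""
    return double_factorial(2 * n - 3)

def branches(n):
    """Branches of an unrooted bifurcating tree with n >= 3 OTUs (possible root positions)."""
    return 2 * n - 3

n = 4  # Cow, Dog, Human, Monkey
print(unrooted_topologies(n), branches(n), rooted_topologies(n))  # 3 5 15
```

For four species, an unrooted tree has five branches, so a single unrooted tree like Fig. B corresponds to five different rooted trees; rooting each of the 3 unrooted topologies on each of its 5 branches gives all 15 rooted ones.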

Nodes represent taxonomic units. External nodes represent units directly compared (e.g. extant species), while internal nodes are ancestral or hypothesized units.
Units might be species, subspecies, orders, in fact any kind of taxonomic unit (OTU: Operational Taxonomic Unit).
Branches define the relationships among the OTUs.
Branch length reflects differences among OTUs in terms of time since divergence, percentage of differences, etc. Please note that many trees are unscaled, and branch lengths in unscaled trees are arbitrary.
A clade is a group of all the OTUs descended from one common ancestor.
Root: the branch leading to the common ancestor of all OTUs of the tree. Please note that most trees do not have a root, even if they seem to.
Topology is the branching pattern of the tree.
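The terms above can be made concrete with a small data structure. In this sketch a rooted tree is a nested tuple (an internal node is a tuple of its children, an external node is an OTU name); the code lists every clade and writes the topology in Newick format, a common plain-text notation for trees that this page does not itself use. The tree follows Fig. A, assuming that Dog branches off before Human and Monkey (the text only states that Cow split off first).

```python
def leaves(tree):
    """External nodes (OTU names) at or below a node."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))

def clades(tree):
    """One clade per internal node: all OTUs descended from that node."""
    if isinstance(tree, str):
        return []
    found = [frozenset(leaves(tree))]
    for child in tree:
        found.extend(clades(child))
    return found

def to_newick(tree):
    """Topology in Newick notation, without branch lengths or the final ';'."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(to_newick(child) for child in tree) + ")"

# Fig. A, assuming Dog splits off before Human and Monkey.
tree_a = ("Cow", ("Dog", ("Human", "Monkey")))
print(to_newick(tree_a) + ";")  # (Cow,(Dog,(Human,Monkey)));
for clade in clades(tree_a):
    print(sorted(clade))
# ['Cow', 'Dog', 'Human', 'Monkey']
# ['Dog', 'Human', 'Monkey']
# ['Human', 'Monkey']
```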

It should be noted that topology does not necessarily represent true evolutionary history. When comparison is made according to the similarity of sequences, as in classical phenetics, what we get is a hierarchical order of similarity among OTUs. Two taxa might be similar not only because they had a common ancestor in the recent past, but also because of similar selection pressure, or simply by chance.

 

Complications of DNA trees

Bifurcating pattern.

Speciation is not a short event but a long-lasting process, as illustrated by Fig. A. Although taxa A, B, C and D are clearly separated by time t4, divergence has no precise point in time. In the case of taxa A, B and C, even the order of divergence cannot be estimated precisely, and Fig. B might be a true representation of divergence. However, most phylogenetic analysis tools work only on bifurcating trees, so a short additional branch is sometimes added to the tree, as in Fig. C.
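Turning a multifurcation into bifurcations, as in Fig. C, can be sketched as repeatedly joining two of a node's children under a new internal node whose branch has length 0, so that the added branch implies no extra divergence. The tree representation (each child stored as a (subtree, branch length) pair) and the example lengths are assumptions for illustration; which two children are joined first is arbitrary here, just as the true order of divergence of A, B and C is unknown.

```python
def resolve(tree):
    """Resolve every multifurcating node into nested bifurcations.

    A leaf is an OTU name; an internal node is a tuple of (subtree, branch_length)
    pairs. Each added internal branch gets length 0.0.
    """
    if isinstance(tree, str):
        return tree
    children = [(resolve(sub), length) for sub, length in tree]
    while len(children) > 2:
        # Join the first two children under a new node on a zero-length branch.
        children = [((children[0], children[1]), 0.0)] + children[2:]
    return tuple(children)

def newick(tree):
    """Newick string with branch lengths (without the closing ';')."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(f"{newick(sub)}:{length}" for sub, length in tree) + ")"

# A, B and C diverge from one ancestral node (a polytomy, as in Fig. B); D is separate.
# Branch lengths are hypothetical.
polytomy = (((("A", 1.0), ("B", 1.0), ("C", 1.0)), 2.0), ("D", 3.0))
print(newick(polytomy) + ";")           # ((A:1.0,B:1.0,C:1.0):2.0,D:3.0);
print(newick(resolve(polytomy)) + ";")  # (((A:1.0,B:1.0):0.0,C:1.0):2.0,D:3.0);
```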

 

 

Convergent versus parallel evolution

Similar traits might evolve independently under similar selection pressure in different lineages. According to DNA-DNA hybridization studies, owls are related to nightjars (lower right corner of the owl image), and condors to storks (lower right corner of the condor image). However, the beak shapes of owls and condors are more similar to those of unrelated birds of prey than to those of their closest relatives.

In classical systematics, convergent evolution might obscure the distinction between homologies and analogies. Although less often, similarity in biosequences might also be the result of convergent evolution.

Consider the case of lysozyme, an enzyme expressed in saliva, tears, milk or egg white in most higher vertebrates. Lysozyme defends the organism against infection by degrading the cell walls of bacteria. In some vertebrates (the ruminants, colobine monkeys and the Hoatzin bird), lysozyme is also expressed in the stomach. Breaking down bacterial cell walls in the stomach is important to free the nutrients assimilated by the bacteria. The form of lysozyme expressed in saliva, however, cannot function in the highly acidic stomach fluid. Amino acid changes at critical sites of the protein resulted in a lysozyme that is active in the gut. These sequence changes happened in a similar direction, independently, in the three lineages. Thus the similarity of the gut lysozyme sequences of langur, cow and Hoatzin reflects convergent evolution, not common descent, and phylogenetic trees based on convergently evolved sequences might be misleading. Nevertheless, compared to visible traits such as the shape of beaks, there are only a handful of examples of convergent evolution in biosequences.
For more information, start with Zhang and Kumar (a free article is linked from PubMed).

 

Orthologs versus paralogs

Evolution is a process of modifying old goodies. In the case of highly successful inventions (for example, transmembrane receptors), duplication and modification of domains or complete genes is widespread. For example, the dopamine receptors D2 and D4 of human and mouse in the figure seem to be the result of a duplication far back in time.

Figure: Human and mouse dopamine receptors (human D2DR, mouse D2DR, human D4DR, mouse D4DR); protein sequence, with the tail of the molecule enlarged.

Consider gene A in the ancestor species in the figure below. Following duplication and modification, variants A1 and A2 of gene A were fixed in the ancestor. The ancestor species then diverged into species X and Z. The two variants A1 and A2 evolved independently in the two lineages, into A1x and A2x in species X, and A1z and A2z in species Z.

Paralogous genes are derived from duplication, such as A1 and A2.
Orthologous genes are derived from speciation, such as A1x - A1z, and A2x - A2z.
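The two definitions can be turned into a simple rule: two genes are orthologs if the node where their lineages split is a speciation event, and paralogs if it is a duplication. The sketch below assumes that the gene tree and the type of each split are already known, as in the gene A example; in real data those events have to be inferred (for example, by comparing the gene tree with the species tree), which this sketch does not attempt.

```python
# Gene tree of the example: a duplication in the ancestor (A1, A2), then speciation
# into species X and Z. Internal nodes are (event, left, right); leaves are gene names.
GENE_TREE = ("duplication",
             ("speciation", "A1x", "A1z"),
             ("speciation", "A2x", "A2z"))

def genes_below(node):
    """All gene names (leaves) under a node."""
    if isinstance(node, str):
        return {node}
    _, left, right = node
    return genes_below(left) | genes_below(right)

def split_event(node, g1, g2):
    """Event at the last common ancestor of genes g1 and g2."""
    event, left, right = node
    for child in (left, right):
        if not isinstance(child, str) and {g1, g2} <= genes_below(child):
            return split_event(child, g1, g2)  # both genes lie deeper in this subtree
    return event

def relationship(g1, g2, tree=GENE_TREE):
    """'orthologs' if the lineages split at a speciation, else 'paralogs'."""
    return "orthologs" if split_event(tree, g1, g2) == "speciation" else "paralogs"

print(relationship("A1x", "A1z"))  # orthologs
print(relationship("A1x", "A2x"))  # paralogs
```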

 

Genetic similarity among taxa should be estimated by comparing orthologous sequences.
Going back to the human and mouse dopamine receptors: which pairs seem to be orthologs, and which paralogs?

 

Xenologs

Species are considered to be groups of organisms that do not transfer genes to, or receive genes from, other species. Gene exchange between species (horizontal gene transfer), however, is an important evolutionary process in bacteria and viruses. In eukaryotes, genes can be transmitted between lineages by viruses. Genes transferred horizontally are called xenologs. However, horizontal gene transfer in eukaryotes is probably so rare that it would not affect the structure of most DNA trees.
See the articles on horizontal gene transfer written by Dr. Syvanen.

 

Gene trees versus species trees.

Speciation is a process rather than a short event, and there is no clear-cut definition of when two groups of related organisms should be considered subpopulations, ecotypes, subspecies or species. Gene divergence is also a long-lasting process, in which mutation continuously increases polymorphism while selection and random drift eliminate some of the alleles.

Phylogenetic trees based on biosequence comparisons represent the time and order of divergence of the sequences (gene tree). Sequence divergence does not necessarily coincide with species divergence (species tree).

 


In regions under selection for polymorphism, sequence divergence would normally precede species divergence (Fig. A). For example, some genes involved in the immune response are highly polymorphic in humans, which is good for our health. However, polymorphism was also advantageous, that is, positively selected for, in the common ancestor of humans and chimps, resulting in high polymorphism. During speciation, some of these variants were retained by both human and chimp populations, and the two species might share those alleles even today. Some of the allelic variants were passed on only to the human population, and some only to the chimp population. Humans and chimps therefore differ in such alleles, but the divergence of these alleles started long before speciation.

In some genes, divergence starts after speciation. Such genes would therefore underestimate the time since speciation (Fig. B). It is therefore important to use a number of genes to construct phylogenetic trees.


Course

How to get sequences (right)

For some studies, genes must be sequenced by the researchers themselves. However, once a study is published, the sequences are usually deposited in a public database such as GenBank. Sequences in the databank can be searched by similarity (BLAST), by name (Entrez), by source organism (Taxonomy), or by other attributes.
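Sequences retrieved from GenBank can be saved in FASTA format, a plain-text format (not described on this page) in which a line starting with ">" gives the name of a sequence and the following lines hold the sequence itself. A minimal reader is sketched below, assuming the records have already been downloaded; the two records are made up. For real work, a maintained parser such as Biopython's SeqIO module is the safer choice.

```python
def read_fasta(text):
    """Parse FASTA text into a dict {header: sequence}, preserving record order."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                              # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()             # new record: header without ">"
            records[header] = []
        elif header is not None:
            records[header].append(line.upper())  # sequence lines may wrap
    return {name: "".join(parts) for name, parts in records.items()}

# Hypothetical records, for illustration only.
example = """>seq1 hypothetical example
ACGTAC
GTT
>seq2 hypothetical example
acgttc
ga
"""
print(read_fasta(example))
```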

DataBank pages

Alignment

As stated earlier, the sequences to be compared should be homologous, that is, they should be descendants of a common ancestral gene in a common ancestral species. We should now add that the sites within the sequences should also be homologous. Identifying homologous sites is not trivial: some bases might be lost, some might be inserted, and some might be substituted by others. Homologous sites should therefore be identified by a process called alignment before a tree is built.
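Alignment by dynamic programming, listed in the course outline, can be sketched with the Needleman-Wunsch algorithm for global alignment: a table holds the best score for aligning every pair of prefixes, and a traceback recovers one optimal alignment, with "-" marking an insertion or deletion (a gap). The scores (match +1, mismatch -1, gap -2, a simple linear gap penalty) are assumptions chosen for illustration; real alignment programs use substitution matrices and more elaborate gap penalties.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b); '-' marks a gap.
    """
    def pair_score(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

score, top, bottom = global_align("GATTACA", "GATACA")
print(score)  # 4
print(top)
print(bottom)
```

Here one base of the longer sequence must face a gap, so the best score is six matches minus one gap penalty.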

Alignment pages

Tree building algorithms

There are many methods for constructing phylogenetic trees from biosequences. Basically there are three major approaches: distance, parsimony and likelihood methods.
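Parsimony prefers the tree that requires the fewest changes. A minimal sketch of that idea uses Fitch's algorithm (a standard way to count the minimum number of substitutions at one site on a fixed bifurcating tree, not described on this page) to score each of the three possible topologies for four OTUs. The aligned sequences are made up for illustration; real programs also have to search among far more trees.

```python
def fitch(tree, states):
    """Fitch count for one site on a rooted bifurcating tree (nested 2-tuples).

    states maps each OTU name to its base at this site. Returns the set of
    candidate bases for the node and the minimum number of changes below it.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    set_left, cost_left = fitch(left, states)
    set_right, cost_right = fitch(right, states)
    shared = set_left & set_right
    if shared:
        return shared, cost_left + cost_right
    # No base in common: one substitution is needed on one of the two branches.
    return set_left | set_right, cost_left + cost_right + 1

def parsimony_score(tree, sequences):
    """Total minimum number of changes over all sites of aligned sequences."""
    length = len(next(iter(sequences.values())))
    return sum(fitch(tree, {name: seq[site] for name, seq in sequences.items()})[1]
               for site in range(length))

# Hypothetical aligned sequences for four OTUs.
sequences = {"A": "ACGT", "B": "ACGA", "C": "TGGA", "D": "TGGT"}
# The three possible topologies for four OTUs (root placement does not change the count).
for tree in [(("A", "B"), ("C", "D")), (("A", "C"), ("B", "D")), (("A", "D"), ("B", "C"))]:
    print(tree, parsimony_score(tree, sequences))
# (('A', 'B'), ('C', 'D')) 4
# (('A', 'C'), ('B', 'D')) 6
# (('A', 'D'), ('B', 'C')) 5
```

With these data, ((A,B),(C,D)) is the most parsimonious tree.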

Distance methods consider the overall similarity of the sequences. For example, the number of base differences over all sites can be counted pairwise among four species, A, B, C and D. These six pairwise distances can be written into a matrix:

Operational taxonomic units (OTU)

OTU   A        B        C
B     d(A,B)
C     d(A,C)   d(B,C)
D     d(A,D)   d(B,D)   d(C,D)
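Counting the differences is straightforward once the sequences are aligned. The sketch below counts base differences for every pair of OTUs and prints them in the layout of the matrix (rows B, C, D; columns A, B, C); the four sequences are made up. Dividing each count by the number of sites gives the proportion of differing sites (the p-distance), a common variant.

```python
from itertools import combinations

def count_differences(s1, s2):
    """Number of aligned sites at which two sequences carry different bases."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(s1, s2) if x != y)

def distance_matrix(sequences):
    """Pairwise difference counts; n OTUs give n * (n - 1) / 2 values."""
    return {(a, b): count_differences(sequences[a], sequences[b])
            for a, b in combinations(sorted(sequences), 2)}

# Hypothetical aligned sequences for the four species A, B, C and D.
sequences = {"A": "ACGT", "B": "ACGA", "C": "TGGA", "D": "TGGT"}
d = distance_matrix(sequences)
names = sorted(sequences)
for i, row in enumerate(names[1:], start=1):
    print(row, *(d[(col, row)] for col in names[:i]))
# B 1
# C 3 2
# D 2 3 1
```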

The basic idea is, that sequences very different

According to Steel and Penny (pdf)

 

, and there are numerous ways to classify them.

molecular data (Nei and Kumar 2000). They can be classified into

Review article: Mike Steel and David Penny, "Parsimony, Likelihood, and the Role of Models in Molecular Phylogenetics". Biomathematics Research Centre, University of Canterbury, Christchurch, New Zealand; and Institute of Molecular BioSciences, Massey University, Palmerston North, New Zealand.

Abstract: Methods such as maximum parsimony (MP) are frequently criticized as being statistically unsound and not being based on any "model." On the other hand, advocates of MP claim that maximum likelihood (ML) has some fundamental problems. Here, we explore the connection between the different versions of MP and ML methods, particularly in light of recent theoretical results. We describe links between the two methods; for example, we describe how MP can be regarded as an ML method when there is no common mechanism between sites (such as might occur with morphological data and certain forms of molecular data). In the process, we clarify certain historical points of disagreement between proponents of the two methodologies, including a discussion of several forms of the ML optimality criterion. We also describe some additional results that shed light on how much needs to be assumed about underlying models of sequence evolution in order to successfully reconstruct evolutionary trees.

http://www.molbiolevol.org/cgi/content/abstract/17/6/839

      1) DNA tree reconstruction
        Introduction to the problem
        Databanks
        Search with words
            PubMed Practice: Neandertal
            Entrez GenBank
        DNA alignment
            Gap penalty
                Alignment by dynamic programming
            Local alignments
        BLAST
        Multiple alignment
            Clustal
        Phylogenetic trees

 

 

 

     
Page written by: Anikó Schrott and Peter Kabai  
Edited by: Peter Kabai  
Modified: 2001-06-10
Written: 2001-05-04, modified: 2001-05-04