Phylogenetic trees and evolutionary comparison  
Page: Finding nucleotide sequences in GenBank by Entrez    
 
 
     
   

 

How can DNA sequences be obtained?

Nucleotide sequences can be found in GenBank either by similarity to a known sequence (BLAST) or by keyword (Entrez). BLAST searching is discussed elsewhere; here we focus on Entrez at NCBI. The basic options of Entrez have been discussed in relation to searches in PubMed.

GenBank can be searched for sequences with:

  • English or scientific name of any taxon (species, family, order, etc.)
  • gene or protein name
  • GenBank accession number
  • molecular weight
  • and more

To see how this works, click on GenBank, and a new window will open. If you are interested in a specific sequence of a given species, for example the 16S ribosomal RNA gene of Vombatus ursinus, simply type in "Vombatus ursinus AND 16S" and hit "GO". (The Boolean operators "AND", "OR" and "NOT" and the wildcard characters * and ? may be used.)
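The same keyword query can also be assembled in a script. The sketch below only builds the query string and a search URL; it sends nothing over the network. The endpoint (NCBI's E-utilities "esearch" service), the database name "nuccore" and the helper names build_query and build_search_url are assumptions that do not appear on this page.

```python
from urllib.parse import urlencode

# Assumption: NCBI E-utilities "esearch" endpoint (not mentioned on this page).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_query(*terms, operator="AND"):
    """Join search terms with one Boolean operator (AND, OR or NOT)."""
    if operator not in ("AND", "OR", "NOT"):
        raise ValueError("operator must be AND, OR or NOT")
    return f" {operator} ".join(terms)


def build_search_url(query, database="nuccore", max_results=20):
    """Return a search URL for the query; no request is sent."""
    params = {"db": database, "term": query, "retmax": max_results}
    return ESEARCH_URL + "?" + urlencode(params)


query = build_query("Vombatus ursinus", "16S")
url = build_search_url(query)
print(query)
print(url)
```

Keeping the query builder separate from the URL builder makes it easy to try the same terms with "OR" or "NOT" before running the search.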

The search result is a list, which in this case consists of a single entry:

1: AF102811
Vombatus ursinus 16S ribosomal RNA gene, mitochondrial gene for mitochondrial RNA, complete sequence gi|4324396|gb|AF102811.1|AF102811[4324396]

The description starts with the name of the species, followed by the name of the sequence, the type of the sequence, and an indication of whether the sequence is complete or partial.
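The identifier string at the end of the entry is pipe-delimited, so it can be split into its parts in a few lines. This is a minimal sketch, assuming the layout gi|GI number|database|accession.version|locus name; the function name parse_identifier is hypothetical, and the bracketed number that trails the string in the listing is simply dropped.

```python
def parse_identifier(identifier):
    """Split a 'gi|...|gb|...|...' string into named parts."""
    fields = identifier.strip().split("|")
    if len(fields) < 4 or fields[0] != "gi":
        raise ValueError(f"unexpected identifier format: {identifier!r}")
    # The accession may carry a version suffix such as ".1".
    accession, _, version = fields[3].partition(".")
    # Drop a trailing "[...]" part from the locus name, if present.
    locus = fields[4].split("[")[0] if len(fields) > 4 else ""
    return {
        "gi": fields[1],
        "database": fields[2],
        "accession": accession,
        "version": version or None,
        "locus": locus or None,
    }


parsed = parse_identifier("gi|4324396|gb|AF102811.1|AF102811[4324396]")
print(parsed)
```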

On the right you can go on to PubMed to see the article (if any) in which the study related to the sequence was published. You also have access to "Related sequences" and to the "PopSet" database, which consists of sequences submitted together to describe a population or some evolutionary event. Finally, you can check the taxonomy of your species.

Now, to get to the actual sequence, click on the accession number of your sequence (locally here, or on-line in your GenBank window).

What you get is much more than the sequence:

LOCUS        GenBank identifier, sequence length, type of nucleic acid, GenBank section *, date
DEFINITION   information about the sequence: source organism, gene name, gene type (e.g. CDS = coding sequence)
ACCESSION    a unique identifying number
SOURCE       organism (and its taxonomy) and type of tissue the sequence comes from
REFERENCE    publication data
FEATURES     detailed information about the various regions of the gene, amino acid sequence
BASE COUNT   how many times each of the four bases occurs in this sequence
ORIGIN       the sequence

* - There are 16 sections. Some of them refer to the source organism (e.g. MAM = Mammalia, PRI = Primates, PLN = plants, etc.); others contain various products of the sequencing procedure (e.g. STS = sequence-tagged site, a short sequence that occurs in only one copy in the given organism; EST = expressed sequence tag, a short sequence obtained from partially sequenced cDNA; SYN = synthetic sequence, etc.).
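Because every field of a record starts with a keyword at the left margin, a record saved as plain text can be split into its fields with a short script. The sketch below assumes that keywords occupy the first 12 columns, that continuation lines are indented, and that "//" ends the record. The record in it is a toy example made up for illustration, not real GenBank data, and the function names are hypothetical.

```python
import re
from collections import Counter

# Toy record invented for illustration; it is NOT real GenBank data.
TOY_RECORD = """\
LOCUS       TOY00001       24 bp    DNA     MAM 01-JAN-2001
DEFINITION  Hypothetical example sequence, partial.
ACCESSION   TOY00001
ORIGIN
        1 acgtacgtaa ccggttacgt
       21 acgt
//
"""


def split_sections(record):
    """Group the lines of a record under their left-margin keyword."""
    sections = {}
    current = None
    for line in record.splitlines():
        if line.startswith("//"):  # end-of-record marker
            break
        if line and not line[0].isspace():
            # A new keyword line: keyword in columns 1-12, value after it.
            current = line[:12].strip()
            sections[current] = [line[12:].strip()]
        elif current is not None:
            # An indented line continues the current field.
            sections[current].append(line.strip())
    return {key: "\n".join(lines).strip() for key, lines in sections.items()}


def extract_sequence(origin_text):
    """Remove position numbers and spaces from the ORIGIN block."""
    return re.sub(r"[^a-z]", "", origin_text.lower())


sections = split_sections(TOY_RECORD)
sequence = extract_sequence(sections["ORIGIN"])
base_count = Counter(sequence)  # the tally reported under BASE COUNT
print(sections["ACCESSION"], len(sequence), dict(base_count))
```

For real work an established parser is the safer choice; the point here is only to show how the keyword layout maps to a simple data structure.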

When searching for sequences in real life, you might face some difficulties:

  • there are no data for the species you need
  • you have the species, but not the particular sequence you need
  • you have the right list of species and the right type of sequences, but the sequences do not overlap. For example, you have the first half of the sequence for species A and the last half of the sequence for species B.
  • you got everything right, but still be cautious, because about 5% of the genes in GenBank contain some error
  • the genes you have found evolve too quickly or too slowly to resolve the phylogeny of the given taxa
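The overlap problem in the list above can be checked before any alignment is attempted. This is a minimal sketch, assuming each partial sequence is described by 1-based, inclusive start and end positions on the same gene; the coordinates and the function name overlap are hypothetical.

```python
def overlap(region_a, region_b):
    """Return the shared (start, end) of two regions, or None if they do not overlap."""
    start = max(region_a[0], region_b[0])
    end = min(region_a[1], region_b[1])
    return (start, end) if start <= end else None


# Hypothetical coverage of a 1600-base gene in three species.
species_a = (1, 800)      # first half only
species_b = (801, 1600)   # last half only
species_c = (600, 1200)   # middle part

print(overlap(species_a, species_b))  # None
print(overlap(species_a, species_c))  # (600, 800)
```

Species A and B share no positions, so they cannot be compared directly, while A and C share 201 positions.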

Searching for sequences to build a tree is like cooking: there are some recipes, but you still need practice and luck. Here are some hints to help you; after that you can follow a complete practice example.

  • check PubMed for published trees you can use
  • check Taxonomy for the species or other taxa you need
  • check PopSet for aligned sequences of some of the species you need
  • when a relevant sequence is found, use it to run a BLAST search

Next:

Practice example: Phylogeny for some galliform birds

 

 

   
     
     
     
     
     
     
Page written by: Anikó Schrott and Peter Kabai  
Edited by: Peter Kabai  
     
Written: 2001-05-04, modified: 2001-05-04