How can DNA sequences be obtained?
Nucleotide sequences in GenBank can be searched by similarity
to a given sequence (BLAST)
or by keywords (Entrez).
BLAST searches are discussed elsewhere; here we focus on Entrez
at NCBI. The basic options of Entrez have already been discussed in relation
to searches in PubMed.
GenBank can be searched for sequences with:
- English or scientific name of any taxon (species, family,
order, etc.)
- gene or protein name
- GenBank accession number
- molecular weight
- and more
To see how this works, click on GenBank,
and a new window will open. If you are interested in
a specific sequence of a given species, for example the
16S ribosomal RNA gene of Vombatus ursinus, simply type in
"Vombatus ursinus AND 16S" and hit "GO".
(The Boolean operators "AND", "OR" and
"NOT" and the wildcard characters * and ? may be used.)
The search returns a list of results, which in this case consists
of a single entry:
1: AF102811
Vombatus ursinus 16S ribosomal RNA gene, mitochondrial gene
for mitochondrial RNA, complete sequence gi|4324396|gb|AF102811.1|AF102811[4324396]
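A keyword search like the one typed into the Entrez box can also be written down as a query string and turned into a web address. The following is only a minimal sketch: it assumes NCBI's E-utilities ESearch endpoint and the database name `nuccore`, neither of which is mentioned on this page, and the helper names `build_query` and `build_search_url` are made up for illustration. Nothing is sent over the network.

```python
from urllib.parse import urlencode

# Assumption: address of NCBI's E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_query(*terms, operator="AND"):
    """Join search terms with a Boolean operator, as typed into the Entrez box."""
    if operator not in {"AND", "OR", "NOT"}:
        raise ValueError("operator must be AND, OR or NOT")
    return f" {operator} ".join(terms)


def build_search_url(query, database="nuccore", max_results=20):
    """Return an ESearch URL for the query; the URL is built, not requested."""
    params = {"db": database, "term": query, "retmax": max_results}
    return f"{ESEARCH_URL}?{urlencode(params)}"


query = build_query("Vombatus ursinus", "16S")
print(query)
print(build_search_url(query))
```

Building the address separately from sending it makes it easy to check the query by eye before using it.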
The description starts with the name of the species, followed by
the name of the sequence, the type of the sequence, and an
indication of whether it is a complete or partial sequence.
On the right you can go on to PubMed to see the article (if
any) in which the study related to the sequence was published. You
also have access to "Related sequences" and to the
"PopSet" database, which consists of sequences submitted
together to describe a population or some evolutionary event.
Finally, you can check the taxonomy of your species.
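The entry line shown in the search result can be taken apart with a few lines of code. This is a rough sketch that assumes the identifier follows the pattern gi|number|database|accession.version|name seen in the entry above; the function names `parse_identifier` and `completeness` are hypothetical, and the word matching is deliberately simple.

```python
def parse_identifier(identifier):
    """Split a 'gi|number|db|accession.version|name' string into named parts."""
    fields = identifier.strip().split("|")
    if len(fields) < 4 or fields[0] != "gi":
        raise ValueError(f"unexpected identifier format: {identifier!r}")
    accession, _, version = fields[3].partition(".")
    return {
        "gi": int(fields[1]),
        "database": fields[2],  # 'gb' marks a GenBank entry
        "accession": accession,
        "version": int(version) if version else None,
    }


def completeness(description):
    """Report whether a description calls the sequence complete or partial."""
    words = description.lower().replace(",", " ").split()
    if "complete" in words:
        return "complete"
    if "partial" in words:
        return "partial"
    return "unknown"


entry_id = "gi|4324396|gb|AF102811.1|AF102811[4324396]"
description = ("Vombatus ursinus 16S ribosomal RNA gene, mitochondrial gene "
               "for mitochondrial RNA, complete sequence")

print(parse_identifier(entry_id))
print(completeness(description))
```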
Now, to get to the actual sequence, click on the accession
number of your sequence (locally here,
or on-line in your GenBank window).
What you get is much more than the sequence:
| LOCUS | GenBank identifier | length | type of nucleic acid | GenBank section * | date |
| DEFINITION | information about the sequence: source organism, gene name, gene type (e.g. cds = coding sequence) |
| ACCESSION | a unique identifying number |
| SOURCE | organism (and its taxonomy) and the type of tissue the sequence is from |
| REFERENCE | publication data |
| FEATURES | detailed information about the various regions of the gene, amino acid sequence |
| BASE COUNT | how many times each of the four bases occurs in this sequence |
| ORIGIN | the sequence |
* - there are 16 sections; some of them refer
to the source organism (e.g. MAM = Mammalia, PRI = Primates,
PLN = plants, etc.), while others hold various products of
the sequencing procedure (e.g. STS = sequence-tagged sites, short sequences that
occur in only one copy in the given organism; EST = expressed
sequence tags, obtained from partially sequenced
cDNA; SYN = synthetic sequences, etc.)
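The fields listed above can be read out of a record with a short script. The sketch below is a simplified reader, not a full parser: it assumes that each keyword starts in the first column, that continuation lines are indented, and that the record ends with "//". The record itself is a toy example invented for illustration (it is not the real AF102811 entry), and all function names are hypothetical. Keywords that contain a space or occur more than once are not handled.

```python
from collections import Counter

# Toy record invented for illustration; it is NOT a real GenBank entry.
TOY_RECORD = """\
LOCUS       TOY0001                   24 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Hypothetical toy sequence for illustration.
ACCESSION   TOY0001
SOURCE      synthetic construct
ORIGIN
        1 acgtacgtac ggttaaccgg ttaa
//
"""


def split_sections(record):
    """Group the lines of a record under their top-level keyword."""
    sections = {}
    current = None
    for line in record.splitlines():
        if line.startswith("//"):  # end-of-record marker
            break
        if line and not line[0].isspace():
            # A keyword starts in the first column; the rest is its value.
            keyword, _, value = line.partition(" ")
            current = keyword
            sections[current] = [value.strip()]
        elif current is not None:
            sections[current].append(line.strip())
    return sections


def parse_locus(locus_value):
    """Pick the identifier, length, molecule type, section and date from LOCUS."""
    fields = locus_value.split()
    return {
        "identifier": fields[0],
        "length": int(fields[1]),
        "molecule": fields[3],
        "section": fields[-2],
        "date": fields[-1],
    }


def extract_sequence(sections):
    """Join the letters of the ORIGIN block, dropping numbers and spaces."""
    letters = []
    for line in sections.get("ORIGIN", []):
        letters.extend(ch for ch in line if ch.isalpha())
    return "".join(letters)


sections = split_sections(TOY_RECORD)
locus = parse_locus(sections["LOCUS"][0])
sequence = extract_sequence(sections)
base_count = Counter(sequence)  # how many times each base occurs

print(locus)
print(sequence)
print(dict(base_count))
```

Counting the bases directly from the ORIGIN block is a quick way to check that the length given in LOCUS matches the sequence actually read.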
When searching for sequences in real life, you might face
some difficulties:
- there are no data for the species you need
- the species is there, but not the particular sequence you
need
- you have the right list of species and the right type
of sequences, but the sequences do not overlap. For
example, you have the first half of the sequence for species
A and the last half of the sequence for species B.
- you have everything right, but still be cautious, because
about 5% of the genes in GenBank contain some error
- the genes you have found evolve too quickly or too slowly to
resolve the phylogeny of the given taxa
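The overlap problem from the list above is easy to check once you know which part of the gene each partial sequence covers. The following is a small sketch under stated assumptions: the start and end positions are made-up numbers, counted from 1 along a common reference, and `shared_region` is a hypothetical helper.

```python
def shared_region(fragments):
    """Return the (start, end) region covered by every fragment, or None."""
    start = max(s for s, _ in fragments.values())
    end = min(e for _, e in fragments.values())
    return (start, end) if start <= end else None


# Hypothetical coverage of the same gene in two species.
overlapping = {"species A": (1, 800), "species B": (750, 1600)}
disjoint = {"species A": (1, 800), "species B": (801, 1600)}

print(shared_region(overlapping))
print(shared_region(disjoint))
```

Only the shared region can be compared between species, so a result of None means the two sequences cannot be used together.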
Searching for sequences to build a tree is like cooking.
There are some recipes, but you still need practice and luck.
Here are some hints to help you, and then you can follow a
complete practice example.
- check PubMed for published trees you can use
- check Taxonomy for the species and taxa you need
- check PopSet for aligned sequences for some of the species
you need
- when a relevant sequence is found, use it as the query for a
BLAST search
Next:
Practice example: Phylogeny for some galliform birds