How can DNA sequences be obtained?
For some questions, sequencing a certain gene cannot be avoided.
The good news is that once a gene is sequenced, it is normally
uploaded to a public database. In fact, most journals
require authors to deposit their sequences prior to accepting
their manuscripts for publication. This way anyone reading
the article can check the original data; moreover, such sequences
can be used for other types of analyses.
The three major nucleotide databases are available online:
GenBank (maintained by the National Center for Biotechnology Information),
EMBL (European Molecular Biology Laboratory) and
DDBJ (DNA Data Bank of Japan).
Each database contains the same data because they regularly
exchange them while you sleep. Data banks obtain sequences
directly from research laboratories, companies, scientific
literature and patents, and they are continuously updated.
You can not only download sequences for free, but also exploit
many other useful tools and information in the databases. In
this course we shall visit GenBank as an example of such databases.
A short visit to NCBI
Most databanks actually consist of several interconnected
databases. Creating such interconnected databases as well
as building efficient search software is quite complicated.
Using them, however, is not too difficult.
The starting page of NCBI (National Center for Biotechnology
Information) gives you several options. The options are not
arranged in a logical, but rather in a practical way. In the
deep blue navigation bar, links to search options and
databases are mixed up, to confuse the enemy. However, as any
search you do will lead to a specific homepage with an enlarged
navigation bar, at this point we will discuss only the search
options. Basically, there are two search options, BLAST and Entrez.
The BLAST retrieval system can be used to find sequences
according to similarity. If you have a nucleotide or amino
acid sequence, you can use BLAST to search for and retrieve all
sequences similar to yours. Similarity, of course, can be
defined in many ways, which we will discuss later. The only
thing you should remember about BLAST at this point is that
if you need a similarity search, hit BLAST in the navigation bar.
All the other data sets are searched with words,
by the so-called Entrez system. Word search is simple.
For example, if you are interested in finding out whether Neandertal
man was among your ancestors, you simply type "Neandertal"
into the SEARCH bar and hit GO. The default search option
is GenBank, so you get back references to sequences which
contain the word "Neandertal" in some field.
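
The same kind of word search can also be sent to NCBI from a program instead of the SEARCH bar. The following is only a minimal sketch, assuming the public E-utilities "esearch" service and assuming that the nucleotide data set is addressed there as "nuccore"; the function names build_esearch_url and run_search are made up for this illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Address of the NCBI E-utilities "esearch" service (assumed entry point).
ESEARCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, db="nuccore", retmax=20):
    """Return the URL of a word search for `term` in one Entrez database.

    db     -- Entrez database name; "nuccore" is assumed to mean nucleotides
    retmax -- maximum number of record identifiers to ask for
    """
    params = {"db": db, "term": term, "retmax": retmax}
    return ESEARCH_BASE + "?" + urlencode(params)


def run_search(term, db="nuccore", retmax=20):
    """Send the search and return the raw XML answer as text.

    Needs a network connection, so it is defined here but not called.
    """
    with urlopen(build_esearch_url(term, db, retmax)) as response:
        return response.read().decode("utf-8")


# Building the URL needs no network access.
print(build_esearch_url("Neandertal"))
```

Calling run_search("Neandertal") would return an XML document listing the identifiers of the matching records; how many there are depends on the state of the database on that day.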
At the present time, 4 sequences are related to Neandertal,
that is to the species Homo sapiens neanderthalensis. Before
we go on searching for sequences, we shall spend some time
on some details of the Entrez search.
PRACTICE: Now, please, open the NCBI
homepage, and make a search in GenBank (default window)
for NEANDERTAL. Results will be shown in the "Entrez
Nucleotide" window. Now, type neandertal with a different
spelling, "neandertHal". At this point I got but
a single sequence. Unfortunately, spelling might be critical
when searching with words. When in doubt, it is useful to
truncate the word by an asterisk, such as neandert*. I got
While word search is easy, results can be quite frustrating.
Fortunately, there are some ways to focus your search. Possibilities
are different in the various databases, so we show them separately.
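
Why truncating a search word with an asterisk helps against spelling variants can be tried out locally. The sketch below is only an illustration with Python's standard fnmatch module; it does not show how Entrez itself matches words, and the helper name matches_truncated and the word list are made up for this example.

```python
from fnmatch import fnmatchcase


def matches_truncated(word, pattern):
    """Return True if `word` fits `pattern`, ignoring upper/lower case.

    A trailing asterisk in the pattern stands for any ending,
    so "neandert*" fits every word that starts with "neandert".
    """
    return fnmatchcase(word.lower(), pattern.lower())


# Spellings that may occur in sequence records (made-up list).
spellings = ["Neandertal", "Neanderthal", "neanderthalensis"]

for query in ["neandertal", "neanderthal", "neandert*"]:
    hits = [word for word in spellings if matches_truncated(word, query)]
    print(query, "->", hits)
```

Each exact spelling finds only itself, while the truncated form finds all three words in the list.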