Course Glossary
From the glossary of Trends in Bioinformatics
From the glossary of the textbook "Bioinformatics"
Some of the definitions below are based on the bioinformatics
and molecular genetics glossary available at the
U.S. Department of Energy Primer on Molecular Genetics.
Word/Phrase > Definition
algorithm > any sequence of actions that performs a particular task; fixed procedure embodied in a computer program
annotation > A functional description of a clone, which may include identifying attributes such as locus name, keywords, and Medline references.
BAC > Bacterial Artificial Chromosome; see cloning vector.
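BLAST > The Basic Local Alignment Search Tool is a fast heuristic technique for finding local alignments between a query sequence and the sequences in a database; the original version detected only ungapped subsequences that match the query.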
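A minimal sketch of the seeding idea behind BLAST: index every length-k word of a subject sequence, then list the positions where the query shares an exact word with it. The function names and the word length are illustrative assumptions; real BLAST also scores "neighborhood" words for protein searches, extends and filters the seeds, and computes statistics, none of which is shown here.

```python
from collections import defaultdict


def kmer_index(seq: str, k: int) -> dict[str, list[int]]:
    """Map every length-k word of seq to the positions where it starts."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return dict(index)


def seed_hits(query: str, subject: str, k: int = 4) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where both share an exact k-letter word.

    Simplified sketch: exact-word seeds only (no neighborhood words, no statistics).
    """
    index = kmer_index(subject, k)
    hits = []
    for q in range(len(query) - k + 1):
        for s in index.get(query[q:q + k], []):
            hits.append((q, s))
    return hits


print(seed_hits("GATTACA", "TTACAGATT", k=4))  # [(0, 5), (2, 0), (3, 1)]
```

Each hit is only a candidate; BLAST would then try to extend it into a high-scoring segment pair.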
BLIMPS > The BLocks IMProved Searcher is a search tool used to compare a DNA sequence against protein patterns in the Blocks database.
Blocks database > A public database of protein patterns that correspond to the most highly conserved regions in proteins.
base pair (bp) > Two nitrogenous bases (adenine and thymine or guanine and cytosine) held together by weak bonds. Two strands of DNA are held together in the shape of a double-helix by the bonds between base pairs. The human genome contains an estimated 3 billion base pairs (bp). One million bp is often referred to as 1 Mb and one thousand as 1 kb.
bootstrapping > statistical resampling method used to estimate the reproducibility (reliability) of the branches of a phylogenetic tree
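The resampling step of the bootstrap can be sketched as below, assuming the common approach of sampling alignment columns with replacement. Each pseudo-replicate alignment would then be used to build a tree, and the percentage of replicate trees containing a given branch is reported as that branch's bootstrap support; tree building is not shown. The function name and the toy alignment are hypothetical.

```python
import random


def bootstrap_replicate(alignment: list[str], rng: random.Random) -> list[str]:
    """Build one pseudo-replicate by sampling alignment columns with replacement."""
    n_cols = len(alignment[0])
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    # Every row uses the same sampled columns, so the columns stay intact.
    return ["".join(row[c] for c in cols) for row in alignment]


# Toy alignment of three sequences (hypothetical data).
alignment = [
    "ACGTTGCA",
    "ACGTAGCA",
    "ACTTAGGA",
]
rng = random.Random(42)  # fixed seed so the replicates are reproducible
for _ in range(3):
    print(bootstrap_replicate(alignment, rng))
```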
browser > program used to access pages on Web servers (on an intranet or the Internet)
client > computer, or software on a computer, that interacts with a remote computer (server)
command line > non-GUI interaction with computer (usually UNIX)
content > length of genomic DNA with a particular function (e.g., an exon)
cDNA > Complementary DNA; synthesized from an mRNA template.
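Because first-strand cDNA is complementary and antiparallel to its mRNA template, its sequence can be derived as the reverse complement of the mRNA, with uracil (U) in the RNA pairing with adenine (A) in the DNA. A minimal sketch; the function name is hypothetical and only the four standard RNA bases are handled.

```python
# Base pairing between the RNA template and the new DNA strand (U in RNA pairs with A).
# Sketch only: A, C, G and U are handled; anything else raises KeyError.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def cdna_from_mrna(mrna: str) -> str:
    """Return first-strand cDNA (written 5'->3') for an mRNA given 5'->3'."""
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna.upper()))


print(cdna_from_mrna("AUGGCCUAA"))  # TTAGGCCAT
```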
centiMorgan (cM) > The unit of distance between markers on a genetic map. Two markers are said to be 1 cM apart if they are separated by recombination 1% of the time; in humans, 1 cM corresponds on average to roughly 1 million bp.
cloning vector > A DNA molecule originating from a virus, plasmid, cosmid, phage, bacteria, or yeast into which a foreign DNA fragment is integrated and which is then introduced into host cells, where it can be reproduced in large quantities (cloned).
cluster > A group of clones related to one another by sequence homology. Each cluster has a unique Cluster ID number for a given stringency.
codon > A sequence of three DNA bases within a gene that codes for a single amino acid.
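To make the codon idea concrete, the sketch below translates a coding DNA sequence three bases at a time using the standard genetic code (NCBI translation table 1), written in its usual compact 64-letter form with the bases ordered T, C, A, G. Only the standard code is covered, incomplete trailing codons are ignored, and ambiguous bases such as N are not handled (they raise KeyError); the function name is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code only (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate coding DNA codon by codon; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


print(translate("ATGGCCTGGTAA"))  # MAW*
```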
Cosmid > Artificially constructed cloning vector containing the cos sites of phage lambda. Cosmids can be packaged into lambda phage particles to infect E. coli, permitting cloning of DNA fragments up to 45 kb, larger than is possible with plasmid vectors.
domain > portion of a protein that folds independently of the rest of the protein
DUST > Program for filtering low-complexity regions of DNA sequences
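The idea behind DUST-style filtering can be sketched with a triplet-count score: a low-complexity window contains a few three-base words many times, so the sum of c(c-1)/2 over the triplet counts c, divided by (number of triplets - 1), becomes large. This simplified version masks every window scoring above a threshold with N. The window size and threshold are illustrative assumptions, and the real DUST program's scoring details and defaults may differ.

```python
from collections import Counter


def triplet_score(window: str) -> float:
    """Score a window by how often its 3-letter words repeat (high = low complexity)."""
    triplets = [window[i:i + 3] for i in range(len(window) - 2)]
    if len(triplets) < 2:
        return 0.0
    counts = Counter(triplets)
    return sum(c * (c - 1) / 2 for c in counts.values()) / (len(triplets) - 1)


def mask_low_complexity(seq: str, window: int = 64, threshold: float = 2.0) -> str:
    """Replace every window whose triplet score exceeds threshold with 'N'.

    Illustrative scoring; window and threshold are assumptions, not DUST's defaults.
    """
    masked = list(seq)
    for start in range(max(len(seq) - window + 1, 1)):
        chunk = seq[start:start + window]
        if triplet_score(chunk) > threshold:
            masked[start:start + len(chunk)] = "N" * len(chunk)
    return "".join(masked)


print(mask_low_complexity("ACGTTGCA" + "A" * 20, window=10, threshold=1.0))
# ACGT followed by 24 N characters
```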
domain name > classifies and identifies host machines in terms of Internet organization
download > transfer a file from a remote host to the local machine
DNA > Deoxyribonucleic acid, the double-stranded molecule held together by weak bonds between base pairs of 4 different nucleotides. Encodes genetic information.
E-
EBI > European Bioinformatics Institute (an EMBL outstation)
EMBL > European Molecular Biology Laboratory
EMBnet > European Molecular Biology network
EST > Expressed Sequence Tag; a short, single-pass sequence read from a cDNA, giving a snapshot of gene expression in a particular tissue at a particular time
Entrez > An online resource provided by the National Center for Biotechnology Information (NCBI). It organizes GenBank sequences and links them to original literature.
exons > The protein-coding sequences of genes. Protein-coding exon sequence makes up only about 1 to 2% of the human genome. See introns.
email > messages composed and transferred electronically between computers
F-
filtering > replacing repeats and low-complexity regions with X (proteins) or N (DNA) to avoid spurious alignment scores
fold > overall folding pattern of a 3-D protein or RNA structure
FAQ > computer file of frequently asked questions
feature > annotation of a specific location on a given sequence
firewall > computer-based system that isolates a company's intranet from the Internet
FTP > file transfer protocol; mechanism for transferring files across a network or between hosts
FASTA > A database search tool used to compare a nucleotide or peptide sequence to a sequence database. The program is based on the rapid sequence comparison algorithm described by Lipman and Pearson; it is a heuristic that approximates an optimal local alignment.
functional genomics > Systematic analysis of gene activity in healthy and diseased tissues.
G-
gap > run of adjacent null characters (usually written '-') within one sequence of a multiple alignment
GCG Assembly > A tool using the GCG Fragment Assembly System created by Genetics Computer Group, Inc. It is used to assemble nucleotide sequence fragments contained in a cluster and view how they overlap with each other.
GenBank > The public DNA sequence database maintained by the National Center for Biotechnology Information (NCBI), part of the National Library of Medicine.
gene > A specific DNA sequence which carries the information required for constructing proteins. The human genome was once estimated to contain 100,000 to 150,000 genes; the finished sequence indicates about 20,000 to 25,000 protein-coding genes.
gene family > genes related by divergent evolution from a common ancestor
genome > The total genetic information possessed by an individual organism. Each cell contains a complete copy of the genome.
Genomics > Sequencing and characterization of the genome and analysis of the relationship between gene activity and cell function.
Genotype > The unique genetic makeup of an individual organism.
GI > GenBank Identifier, a unique number assigned to protein and nucleotide sequences in the GenBank database.
global alignment > alignment of two complete protein or nucleic acid sequences
Gopher > document delivery system allowing retrieval & display of text-based files
GSS > genome survey sequence; includes single-pass genomic data, etc.
GUI > graphical user interface; software front end that relies on pictures and icons to direct user interaction
H-
heuristic algorithm > strategy for quickly finding an approximate solution when computing an exact one is impractical
HGMP > Human Genome Mapping Project, based in Cambridge, United Kingdom
HMM > Hidden Markov Model; a statistical pattern recognition method
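A common use of an HMM is to find the most probable sequence of hidden states behind an observed sequence, which the Viterbi algorithm computes by dynamic programming. The sketch below works in log space; the two-state model ('H' for GC-rich and 'L' for AT-rich segments) and all of its probabilities are hypothetical toy values.

```python
import math


def viterbi(seq, states, start, trans, emit):
    """Return (most probable state path, its log-probability) for an observed sequence."""
    # v[s]: best log-probability of any state path that ends in state s at this position
    v = {s: math.log(start[s]) + math.log(emit[s][seq[0]]) for s in states}
    backpointers = []
    for symbol in seq[1:]:
        pointers, new_v = {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: v[p] + math.log(trans[p][s]))
            pointers[s] = best_prev
            new_v[s] = v[best_prev] + math.log(trans[best_prev][s]) + math.log(emit[s][symbol])
        backpointers.append(pointers)
        v = new_v
    # Trace back from the best final state.
    last = max(states, key=lambda s: v[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return "".join(reversed(path)), v[last]


# Hypothetical two-state model: H emits G/C more often, L emits A/T more often.
states = "HL"
start = {"H": 0.5, "L": 0.5}
trans = {"H": {"H": 0.9, "L": 0.1}, "L": {"H": 0.1, "L": 0.9}}
emit = {
    "H": {"G": 0.35, "C": 0.35, "A": 0.15, "T": 0.15},
    "L": {"G": 0.15, "C": 0.15, "A": 0.35, "T": 0.35},
}
print(viterbi("GCGCGCATATAT", states, start, trans, emit)[0])  # HHHHHHLLLLLL
```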
homologous (phylogenetics) > features in different individuals that are genetically descended from a common ancestor
homologous (mol biol) > similar regardless of genetic relationship
homology domain > part of a protein sequence with similarity to an otherwise unrelated protein
homoplasy > similarity that has evolved independently; not indicative of common phylogenetic origin
host > any computer on the internet that can be addressed by a specific IP address
HTGS > high-throughput genome sequences; unfinished, unreliable, no annotations
HTML > HyperText Markup Language; common representation of web pages regardless of operating system
HSPs > High-scoring Segment Pairs; two sequence fragments of arbitrary but equal length with an alignment that is locally maximal and for which the alignment score meets or exceeds a threshold (cutoff) score.
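How a short word hit grows into a segment pair can be sketched as an ungapped extension: starting from a seed, add one aligned pair at a time in each direction and stop once the running score falls more than a fixed amount (an "X-drop") below the best score seen so far. This is a simplified illustration under assumed parameters (match +1, mismatch -2, X-drop 3) with hypothetical function names, not BLAST's actual implementation. The resulting pair would be reported as an HSP only if its score meets the cutoff.

```python
def _extend(query: str, subject: str, qi: int, si: int, step: int,
            match: int, mismatch: int, x_drop: int) -> tuple[int, int]:
    """Extend without gaps from (qi, si) in direction step (+1 or -1).

    Returns (best score gained, number of aligned pairs that give that score).
    """
    best = run = best_len = length = 0
    while 0 <= qi < len(query) and 0 <= si < len(subject):
        run += match if query[qi] == subject[si] else mismatch
        length += 1
        if run > best:
            best, best_len = run, length
        elif best - run > x_drop:
            break  # score has fallen too far below the best; stop extending
        qi += step
        si += step
    return best, best_len


def hsp_from_seed(query: str, subject: str, q: int, s: int, k: int,
                  match: int = 1, mismatch: int = -2,
                  x_drop: int = 3) -> tuple[int, int, int, int]:
    """Return (score, query_start, subject_start, length) of the extended pair.

    Illustrative scores and X-drop value (assumptions).
    """
    seed = sum(match if query[q + i] == subject[s + i] else mismatch for i in range(k))
    right, right_len = _extend(query, subject, q + k, s + k, +1, match, mismatch, x_drop)
    left, left_len = _extend(query, subject, q - 1, s - 1, -1, match, mismatch, x_drop)
    return seed + left + right, q - left_len, s - left_len, left_len + k + right_len


print(hsp_from_seed("TTGATTACAGG", "CCGATTACACC", q=3, s=3, k=4))  # (7, 2, 2, 7)
```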
HSSP > Database of homology-derived structures of proteins (Sander et al.)
hyperlink > graphic or text within a web page that transports the user elsewhere
hypertext > text in a web page that functions as a hyperlink and is differentiated by color
I-
ICGEB > International Center for Genetic Engineering and Biotechnology, Trieste
IG > Intelligenetics Inc.; commercial vendor of the Intelligenetics Suite
indel > contraction of "insertion or deletion"
Internet > system of linked computers used for transmission of files & messages between hosts
introns > DNA sequences in genes which have no protein-coding function. Other non-coding regions include control sequences and intergenic regions whose functions are unknown.
IP address > unique, numeric address of a computer host on the Net
J-
Java > programming language used to create 'applets' to be run on any computer
K-
L-
LAN > local area network
library > A collection of expressed genes from a specific tissue sample, and their annotations.
links > feature used by Entrez to identify associated entries in other databases
local alignment > alignment of segments of two nucleic acid or protein sequences
M-
Markov model > statistical model in which the probability of each letter depends on its predecessors
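For a first-order Markov model of DNA, the probability of a sequence is the probability of its first letter times the probability of each later letter given the one before it. A minimal sketch that sums logarithms to avoid numerical underflow on long sequences; the function name and the toy parameters are hypothetical.

```python
import math


def markov_log_prob(seq: str, initial: dict[str, float],
                    transition: dict[str, dict[str, float]]) -> float:
    """log P(seq) = log P(x1) + sum of log P(x_i | x_(i-1)) for a first-order chain."""
    log_p = math.log(initial[seq[0]])
    for prev, cur in zip(seq, seq[1:]):
        log_p += math.log(transition[prev][cur])
    return log_p


# Hypothetical toy parameters: all bases equally likely to start, each base tends to repeat.
bases = "ACGT"
initial = {b: 0.25 for b in bases}
transition = {a: {b: (0.4 if a == b else 0.2) for b in bases} for a in bases}

print(math.exp(markov_log_prob("AAC", initial, transition)))  # about 0.02 (0.25 * 0.4 * 0.2)
```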
MEDLINE > literature database of papers in biomedical science
MeSH > Medical Subject Headings; the controlled vocabulary of terms used to index the MEDLINE database
motif > short conserved protein sequence (can be the conserved part of a domain)
molecular clock > hypothesis that amino acid and nucleotide substitutions accumulate at a roughly constant rate over time, so that sequence differences can be used to estimate evolutionary divergence times
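Under a molecular clock, the divergence time of two sequences can be estimated as T = K / (2r), where K is the number of substitutions per site between them and r is the substitution rate per site per year; the factor 2 appears because both lineages accumulate substitutions independently after they split. A minimal sketch with hypothetical numbers, assuming K has already been corrected for multiple substitutions at the same site.

```python
def divergence_time(subs_per_site: float, rate_per_site_per_year: float) -> float:
    """Estimate years since two lineages split, T = K / (2r), assuming a constant clock."""
    if rate_per_site_per_year <= 0:
        raise ValueError("substitution rate must be positive")
    # Factor 2: substitutions accumulate along both lineages after the split.
    return subs_per_site / (2 * rate_per_site_per_year)


# Hypothetical example: 2% divergence at 1e-9 substitutions per site per year.
print(divergence_time(0.02, 1e-9))  # roughly 1e7 years (10 million)
```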
mRNA > Messenger RNA; the RNA transcript of an expressed gene, which is translated into a protein.
N-
NCBI > National Center for Biotechnology Information, Bethesda, Maryland, USA
Needleman-Wunsch algorithm > standard dynamic programming algorithm for optimal global alignments
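A compact sketch of Needleman-Wunsch with a linear gap penalty: fill a score matrix in which each cell holds the best score for aligning two prefixes, then trace back from the bottom-right corner. The scoring values (match +1, mismatch -1, gap -2) are illustrative assumptions; practical tools usually use substitution matrices and more elaborate gap penalties.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[int, str, str]:
    """Optimal global alignment of a and b; returns (score, aligned_a, aligned_b).

    Illustrative scoring values (assumptions).
    """
    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # F[i][j] = best score for aligning a[:i] with b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = needleman_wunsch("GATTACA", "GATACA")
print(score)  # 4
print(top)
print(bottom)
```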
neighbors > feature used by Entrez to identify related entries in database
Neural Net > statistical pattern recognition method
nr database > non-redundant database of protein or DNA sequences
normalized library > A cDNA library from which most of the highly expressed sequences have been removed in order to represent a greater proportion of low-abundance messenger RNAs (mRNAs). Normalized libraries are not an accurate reflection of a tissue's gene-expression profile.
O-
optimal alignment > global or local alignment of two sequences with highest possible score
orthologs > homologous sequences in different species that diverged from a common ancestor by speciation
P-
paralogs > homologous sequences that diverged by gene duplication
pattern > descriptor for short sequence motifs (amino acid characters & meta-characters)
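Protein patterns are commonly written in PROSITE syntax, for example C-x(2,4)-C, where x stands for any amino acid, [..] lists allowed residues, {..} lists forbidden ones, (n) or (n,m) gives a repeat count, and < or > anchors the pattern to the N- or C-terminus. The hypothetical helper below converts the core of that syntax into a Python regular expression so a protein sequence can be searched with re.search; less common constructs (such as '>' inside brackets) are rejected rather than supported.

```python
import re

# One PROSITE element: x, a single residue, [allowed] or {forbidden}, plus optional (n) or (n,m).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Convert a PROSITE-style pattern such as 'C-x(2,4)-C' to a Python regex string.

    Hypothetical helper covering core PROSITE syntax only.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        m = ELEMENT.fullmatch(element.strip("<>"))
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = m.groups()
        if residue == "x":
            residue = "."
        elif residue.startswith("{"):
            residue = "[^" + residue[1:-1] + "]"
        if low is not None:
            residue += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + residue + suffix)
    return "".join(parts)


regex = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.")
print(regex)  # C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H
```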
pdb database > Protein Data Bank; repository of solved protein structures
PSI-Blast > Position-Specific Iterated BLAST; a fast iterative search that builds a profile from the hits at every iteration
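The profile that PSI-BLAST builds is a position-specific scoring matrix: for each column of the aligned hits it stores a log-odds score for every amino acid. The sketch below builds such a matrix from a small ungapped block using a simple pseudocount and a uniform background frequency; both simplifications are assumptions, since PSI-BLAST itself uses sequence weighting, real background frequencies and more elaborate pseudocounts. Function names and the example block are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(block: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Return one {amino acid: log2-odds score} dict per column of an ungapped block."""
    background = 1 / len(AMINO_ACIDS)  # uniform background frequency (simplification)
    pssm = []
    for column in zip(*block):
        counts = Counter(column)
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        pssm.append({aa: math.log2((counts[aa] + pseudocount) / total / background)
                     for aa in AMINO_ACIDS})
    return pssm


def score_segment(pssm: list[dict[str, float]], segment: str) -> float:
    """Sum the position-specific scores of a segment as long as the profile."""
    return sum(column[aa] for column, aa in zip(pssm, segment))


# Hypothetical block of three aligned hits.
pssm = build_pssm(["CWC", "CWC", "CFC"])
print(round(score_segment(pssm, "CWC"), 2), round(score_segment(pssm, "AAA"), 2))
```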
Phrap > Developed by Phil Green at the University of Washington, "PHil's Revised Assembly Program" is a tool for assembling shotgun-sequenced DNA fragments.
PHYLIP > PHYLogeny Inference Package; program package created by J. Felsenstein for phylogenetic analysis
PIR > Protein Identification Resource International, a protein database vendor
plasmid > See cloning vector.
proteome > The complete profile of proteins expressed in a given tissue, cell or biological system at a given time.
proteomics > Systematic analysis of the protein expression of healthy and diseased tissues.
PSB > Abbreviation for 'Pacific Symposium on Biocomputing'
PubEST > Abbreviation for a sequence from a public-domain source, such as the WashU-Merck EST Project or Banting Institute.
Q-
R-
S-
SEG > program for filtering low complexity regions
sequence motif > short conserved amino acid or nucleotide sequence pattern with functional significance
signal > local functional site in genomic DNA (e.g., a splice site)
Smith-Waterman algorithm > standard dynamic programming algorithm for optimal local alignments
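Smith-Waterman differs from a global alignment algorithm in two ways: cell scores are never allowed to fall below zero, and the traceback starts at the highest-scoring cell and stops at the first zero, so only the best-matching local segments are reported. A sketch with a linear gap penalty and illustrative scores (match +2, mismatch -1, gap -2), all assumptions rather than standard values.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> tuple[int, str, str]:
    """Optimal local alignment of a and b; returns (score, aligned_a, aligned_b).

    Illustrative scores (assumptions).
    """
    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # H[i][j] = best score of a local alignment ending at position i of a and j of b (>= 0)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(0,
                          H[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j

    # Trace back from the best cell until a zero cell ends the local alignment.
    out_a, out_b = [], []
    i, j = best_i, best_j
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("TTTACGTAAA", "GGACGTGG"))  # (8, 'ACGT', 'ACGT')
```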
STS > Sequence-tagged Site; short genomic landmark sequence
Swiss-PROT > curated protein sequence DB with annotations & low level of redundancy
SRS > Program for biological database browsing created by T. Etzold (EMBL)
SRSWWW > Version of the Sequence Retrieval System running in the WWW environment
T-
TREMBL > Translated EMBL; a computer-annotated compilation of translations of the coding sequences in the EMBL DNA data library, supplementing Swiss-PROT
U-
UNIX > a computer operating system
UniGene database > A public database, maintained by NCBI, which brings together sets of GenBank sequences that represent the transcription products of distinct genes.
V-
W-
Wormpep > continually updated, curated database of predicted protein sequences from the C. elegans genome
X-
Y-
YAC > Yeast Artificial Chromosome (see cloning vector) used to clone DNA fragments up to 1 Mb.
Z-