Course Glossary

From the glossary of Trends in Bioinformatics
From the glossary of the textbook "Bioinformatics"
Some of the definitions below are based on the bioinformatics
and molecular genetics glossary available at the
U.S. Department of Energy Primer on Molecular Genetics.

Word/Phrase > Definition

A

algorithm > a sequence of well-defined steps that performs a particular task; a fixed procedure that can be embodied in a computer program

annotation > A functional description of a clone, which may include identifying attributes such as locus name, keywords, and Medline references.

B

BAC > Bacterial Artificial Chromosome; see cloning vector.

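BLAST > The Basic Local Alignment Search Tool is a fast heuristic technique for finding locally similar regions between a query sequence and the sequences in a database. The original version detected only ungapped matches; later versions also report gapped alignments.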
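
BLAST gains its speed by first looking for short words shared by the query and a database sequence (seeds) and only then extending those hits into alignments. The sketch below shows just a simplified exact-match seeding step, assuming a word length of 3 and no scoring matrix; real BLAST also scores "neighborhood" words with a substitution matrix, so this illustrates the idea rather than the BLAST implementation. The function name `find_seed_hits` is hypothetical.

```python
from collections import defaultdict

def find_seed_hits(query: str, subject: str, w: int = 3) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where a length-w word matches exactly."""
    # Index every word of length w in the query by its start positions.
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(query) - w + 1):
        index[query[i:i + w]].append(i)
    # Scan the subject; every exact word match is a seed hit to be extended later.
    hits = []
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], []):
            hits.append((i, j))
    return hits

print(find_seed_hits("ACGTAC", "TTACGTT"))  # [(3, 1), (0, 2), (1, 3)]
```

In BLAST proper, each seed would then be extended in both directions while the alignment score stays above a drop-off limit; high-scoring extensions become HSPs.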

BLIMPS > The BLocks IMProved Searcher is a search tool used to compare a DNA sequence against protein patterns in the Blocks database.

Blocks database > A public database of protein patterns that correspond to the most highly conserved regions in proteins.

base pair (bp) > Two nitrogenous bases (adenine and thymine, or guanine and cytosine) held together by weak hydrogen bonds. The two strands of DNA are held together in the shape of a double helix by the bonds between base pairs. The human genome contains an estimated 3 billion base pairs. One million bp is often referred to as 1 Mb and one thousand as 1 kb.

bootstrapping > statistical resampling method used to estimate the reproducibility (confidence) of branches in phylogenetic trees

browser > program used to access pages on Web servers (on an intranet or the Internet)
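
In phylogenetic bootstrapping, each replicate is built by sampling alignment columns with replacement until it is as long as the original alignment; a tree is inferred from every replicate, and the percentage of replicate trees containing a branch is reported as its support. The sketch below shows only the resampling step (tree building is omitted); the function name `bootstrap_alignment` and the fixed random seed are illustrative assumptions.

```python
import random

def bootstrap_alignment(alignment: list[str], rng: random.Random) -> list[str]:
    """Return one bootstrap replicate: alignment columns sampled with replacement."""
    n_cols = len(alignment[0])
    # Pick n_cols column indices with replacement; some repeat, some are left out.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    # Rebuild every sequence from the same chosen columns so the rows stay aligned.
    return ["".join(seq[c] for c in cols) for seq in alignment]

aln = ["ACGTAC", "ACGTTC", "AGGTAC"]
rng = random.Random(42)
replicates = [bootstrap_alignment(aln, rng) for _ in range(100)]
print(replicates[0])
```

Sampling whole columns (rather than individual letters) keeps each site's pattern across species intact, which is what the tree-building step needs.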

C

client > computer, or software on a computer, that interacts with a remote computer (server)

command line > text-based (non-GUI) interaction with a computer (usually UNIX)

content > length of genomic DNA with a particular function (e.g., exons)

cDNA > Complementary DNA; synthesized from an mRNA template by reverse transcriptase.

centiMorgan (cM) > A unit of distance between markers on a genetic map. Two markers are said to be 1 cM apart if they are separated by recombination 1% of the time; in humans this corresponds, on average, to roughly 1 million bp.

cloning vector > A DNA molecule originating from a virus, plasmid, cosmid, phage, bacterium, or yeast, into which a foreign DNA fragment is integrated and which is then introduced into host cells, where it can be reproduced in large quantities (cloned).
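
Because cDNA is synthesized by base pairing against the mRNA template, the first cDNA strand is the reverse complement of the mRNA, with thymine (T) in place of uracil (U). A minimal sketch, assuming an uppercase mRNA containing only A, C, G and U; the function name `first_strand_cdna` is hypothetical.

```python
# Base pairing: the DNA base synthesized opposite each RNA base.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}

def first_strand_cdna(mrna: str) -> str:
    """Return first-strand cDNA (written 5'->3') for an mRNA written 5'->3'."""
    # Complement each base, then reverse so the result also reads 5'->3'.
    return "".join(RNA_TO_DNA_COMPLEMENT[b] for b in reversed(mrna))

print(first_strand_cdna("AUGGCC"))  # GGCCAT
```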
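
Following the definition above, map distance can be estimated from the fraction of meioses (or offspring) in which two markers are separated by recombination, with 1% recombination corresponding to 1 cM. The sketch applies that direct proportion, which is reasonable only for closely linked markers because it ignores multiple crossovers, together with the rough "1 cM is about 1 million bp" rule of thumb; the counts are hypothetical.

```python
def map_distance_cm(recombinants: int, total: int) -> float:
    """Map distance in centiMorgans: percentage of meioses showing recombination."""
    return 100.0 * recombinants / total

def approx_bp(cm: float, bp_per_cm: float = 1_000_000) -> float:
    """Rough physical length using the ~1 cM = 1 million bp rule of thumb."""
    return cm * bp_per_cm

d = map_distance_cm(recombinants=3, total=200)  # hypothetical counts
print(d, approx_bp(d))  # 1.5 1500000.0
```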

cluster > A group of clones related to one another by sequence homology. Each cluster has a unique Cluster ID number for a given stringency.

codon > A sequence of three adjacent DNA (or mRNA) bases within a gene that codes for a single amino acid or for a stop signal.

Cosmid > An artificially constructed cloning vector containing the cos sites of phage lambda, which allow it to be packaged into phage particles and introduced into E. coli; it permits cloning of DNA fragments up to about 45 kb, larger than is possible with plasmid vectors.
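
Because each codon is three bases long, a coding sequence is read in consecutive, non-overlapping triplets. The sketch below translates DNA with the standard genetic code, built from the usual compact 64-letter table listed in T, C, A, G order (`*` marks a stop codon). It assumes an uppercase A/C/G/T sequence that is already in the correct reading frame; an incomplete trailing codon is ignored.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code: amino acids for codons TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(dna: str) -> str:
    """Translate an in-frame DNA coding sequence codon by codon."""
    usable = len(dna) - len(dna) % 3  # drop an incomplete final codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

print(translate("ATGTTTTGGTAA"))  # MFW*
```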

D

domain > portion of a protein that folds independently of the rest of the protein

DUST > program for filtering low-complexity regions of DNA sequences

domain name > name that classifies and identifies host machines according to the hierarchical organization of the Internet

download > transfer a file from a remote host to the local machine

DNA > Deoxyribonucleic acid, the double-stranded molecule held together by weak bonds between base pairs of 4 different nucleotides. Encodes genetic information.

E

EBI > European Bioinformatics Institute (an EMBL outstation)

EMBL > European Molecular Biology Laboratory

EMBnet > European Molecular Biology network

EST > Expressed Sequence Tag; a short, single-pass sequence read from a cDNA. A collection of ESTs gives a snapshot of gene expression in a particular tissue at a particular time.

Entrez > An online resource provided by the National Center for Biotechnology Information (NCBI). It organizes GenBank sequences and links them to original literature.

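exons > The segments of a gene that remain in the mature mRNA after splicing; they include the protein-coding sequences. Protein-coding exon sequence makes up only about 1 to 2% of the human genome. See introns.

email > messages composed and transferred electronically between computers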

F

filtering > replacing repeats and low-complexity regions with X (proteins) or N (DNA) to avoid spurious alignment scores
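
A minimal sketch of DNA masking, assuming a DUST-style score: within a window, count each overlapping triplet, sum c(c - 1)/2 over the triplet counts, and divide by (number of triplets - 1), so repetitive windows score high. The window of 12 and threshold of 2.0 are illustrative assumptions chosen for a short example, not DUST's defaults, and real DUST locates the best-scoring subintervals rather than masking whole fixed windows.

```python
from collections import Counter

def dust_style_score(window: str) -> float:
    """Sum of c*(c-1)/2 over triplet counts, divided by (number of triplets - 1)."""
    triplets = [window[i:i + 3] for i in range(len(window) - 2)]
    if len(triplets) < 2:
        return 0.0
    counts = Counter(triplets)
    return sum(c * (c - 1) / 2 for c in counts.values()) / (len(triplets) - 1)

def mask_low_complexity(dna: str, window: int = 12, threshold: float = 2.0) -> str:
    """Replace every window scoring above threshold with N (parameters are assumptions)."""
    masked = list(dna)
    for start in range(max(len(dna) - window + 1, 1)):
        segment = dna[start:start + window]
        if dust_style_score(segment) > threshold:
            masked[start:start + len(segment)] = "N" * len(segment)
    return "".join(masked)

print(mask_low_complexity("ACGTTGCAAGCT" + "A" * 12))
```

Scores are always computed on the original sequence, so overlapping windows are judged independently before their masks are merged.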

fold > overall folding pattern of a 3-D protein or RNA structure

FAQ > computer file of frequently asked questions

feature > annotation of a specific location on a given sequence

firewall > computer-based system that isolates a company's intranet from the Internet

FTP > File Transfer Protocol; mechanism for transferring files across a network or between hosts

FASTA > A database search tool used to compare a nucleotide or peptide sequence to a sequence database. The program is based on the rapid sequence comparison algorithm described by Lipman and Pearson, a heuristic that approximates an optimal local alignment.

functional genomics > Systematic analysis of gene activity in healthy and diseased tissues.

G

gap > run of adjacent null characters inserted within one sequence of a multiple alignment

GCG Assembly > A tool using the GCG Fragment Assembly System created by Genetics Computer Group, Inc. It is used to assemble nucleotide sequence fragments contained in a cluster and view how they overlap with each other.

GenBank > The public DNA sequence database maintained by the National Center for Biotechnology Information (NCBI), part of the National Library of Medicine.

gene > A specific DNA sequence that carries the information required for constructing a protein (or a functional RNA). Early estimates put the number of human genes at 100,000 to 150,000; current estimates are roughly 20,000 protein-coding genes.

gene family > genes related by divergent evolution from a common ancestor

genome > The total genetic information possessed by an individual organism. Nearly every cell contains a complete copy of the genome.

Genomics > Sequencing and characterization of the genome and analysis of the relationship between gene activity and cell function.

Genotype > The unique genetic makeup of an individual organism.

GI > GenBank Identifier, a unique number assigned to protein and nucleotide sequences in the GenBank database.

global alignment > alignment of two complete protein or nucleic acid sequences

Gopher > document delivery system allowing retrieval & display of text-based files

GSS > Genome survey sequence; includes single-pass genomic data, etc.

GUI > graphical user interface; software front ends that rely on pictures and icons to direct user interaction

H

heuristic algorithm > strategy that finds a good, though not guaranteed optimal, solution when an exact one is impractical

HGMP > Human Genome Mapping Project, based in Cambridge, United Kingdom

HMM > Hidden Markov model; a statistical pattern-recognition method

homologous (phylogenetics) > features in different organisms that are genetically descended from a common ancestor

homologous (mol biol) > similar regardless of genetic relationship

homology domain > part of a protein sequence with similarity to an otherwise unrelated protein

homoplasy > similarity that has evolved independently; not indicative of common phylogenetic origin

host > any computer on the internet that can be addressed by a specific IP address

HTGS > High-throughput genome sequences; unfinished, unreliable, no annotations

HTML > hypertext markup language; common representation of web page regardless of OS

HSPs > High-scoring Segment Pairs; two sequence fragments of arbitrary but equal length with an alignment that is locally maximal and for which the alignment score meets or exceeds a threshold (cutoff) score.

HSSP > Database of homology-derived structures of proteins (Sander et al.)

hyperlink > graphic or text within a web page that transports the user elsewhere

hypertext > text in a web page that functions as a hyperlink and is differentiated by color

I

ICGEB > International Centre for Genetic Engineering and Biotechnology, Trieste

IG > Intelligenetics Inc.; commercial vendor of the Intelligenetics Suite

indel > acronym for "insertion or deletion"

Internet > system of linked computers used to transmit files and messages between hosts

introns > Non-coding DNA sequences within genes that are transcribed but removed from the RNA by splicing before translation. Other non-coding regions include control sequences and intergenic regions, many of whose functions are unknown.

IP address > unique, numeric address of a computer host on the Net

J

Java > programming language used to create 'applets' to be run on any computer

K

L

LAN > local area network

library > A collection of expressed genes from a specific tissue sample, and their annotations.

links > feature used by Entrez to identify associated entries in other databases

local alignment > alignment of segments of two nucleic acid or protein sequences

M

Markov model > statistical model in which the probability of each letter depends on the preceding letter(s)
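
Under a first-order Markov model of DNA, the probability of a sequence is the probability of its first letter multiplied by the probability of each following letter given the one immediately before it. The sketch below computes this in log space (to avoid numerical underflow on long sequences); the probability tables are hypothetical, whereas in practice they would be estimated from training sequences.

```python
import math

def log_prob(seq: str, initial: dict[str, float],
             transitions: dict[str, dict[str, float]]) -> float:
    """Log-probability of seq under a first-order Markov chain."""
    lp = math.log(initial[seq[0]])
    # Each letter's probability depends only on its immediate predecessor.
    for prev, cur in zip(seq, seq[1:]):
        lp += math.log(transitions[prev][cur])
    return lp

# Hypothetical parameters: uniform, except that G is less likely after C.
initial = {b: 0.25 for b in "ACGT"}
transitions = {b: {c: 0.25 for c in "ACGT"} for b in "ACGT"}
transitions["C"] = {"A": 0.3, "C": 0.3, "G": 0.1, "T": 0.3}
print(math.exp(log_prob("ACG", initial, transitions)))
```

Here P(ACG) = 0.25 x 0.25 x 0.1 = 0.00625; a model trained on real data would score sequence classes (e.g., CpG-rich versus CpG-poor regions) differently.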

MEDLINE > literature database of papers in biomedical science

MeSH > medical subject headings; consistent terms used in Medline DB

motif > short conserved protein sequence (can be the conserved part of a domain)

molecular clock > hypothesis that amino acid and nucleotide substitutions accumulate at a roughly constant rate over time, so that the amount of sequence divergence can be used to estimate divergence times

mRNA > Messenger RNA; the RNA transcript of an expressed gene, which is then translated into a protein.
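
Under a strict molecular clock, the time since two lineages split follows directly from their sequence divergence. If K is the number of substitutions per site separating two sequences and r is the substitution rate per site per year (both assumed known here), then both lineages have been accumulating substitutions for T years since the split, so:

```latex
K = 2rT \quad\Longrightarrow\quad T = \frac{K}{2r}
```

With illustrative values K = 0.1 substitutions per site and r = 1 x 10^-9 substitutions per site per year, T = 0.1 / (2 x 10^-9) = 5 x 10^7 years, i.e., 50 million years.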

N

NCBI > National Center for Biotechnology Information, Bethesda, Maryland, USA

Needleman-Wunsch algorithm > standard dynamic programming algorithm for optimal global alignments
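
A minimal sketch of Needleman-Wunsch global alignment scoring, assuming a simple scheme (match +1, mismatch -1, linear gap penalty -1) instead of a substitution matrix or affine gaps. Cell F[i][j] holds the best score for aligning the first i letters of one sequence with the first j letters of the other; the first row and column are filled with gap penalties because a global alignment must cover both sequences end to end.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Return the optimal global alignment score of a and b."""
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    # Leading gaps: aligning a prefix against nothing costs one gap per letter.
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = F[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Best of: align a[i-1] with b[j-1], or put a gap in either sequence.
            F[i][j] = max(diag, F[i - 1][j] + gap, F[i][j - 1] + gap)
    return F[n][m]

print(needleman_wunsch("ACGT", "AGT"))  # 2
```

Recovering the alignment itself (not just its score) requires a traceback from F[n][m] to F[0][0], which is omitted here for brevity.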

neighbors > feature used by Entrez to identify related entries in database

Neural Net > statistical pattern recognition method

nr database > non-redundant database of protein or DNA sequences

normalized library > A cDNA library from which most of the highly expressed sequences have been removed in order to represent a greater proportion of low-abundance messenger RNAs (mRNAs). Normalized libraries are not an accurate reflection of a tissue's gene-expression profile.

O

optimal alignment > global or local alignment of two sequences with the highest possible score

orthologs > homologous sequences in different species that descend from a common ancestral gene (separated by speciation)

P

paralogs > homologous sequences that diverged by gene duplication

pattern > descriptor for short sequence motifs (amino acid characters and meta-characters)
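
As an illustration of meta-characters, PROSITE-style patterns use `x` for any residue, `[...]` for allowed residues, `{...}` for disallowed residues, `(n)` or `(n,m)` for repeat counts, and `-` between elements. A minimal sketch, assuming only those elements (it ignores the `<` and `>` terminal anchors), converts such a pattern to a Python regular expression and searches a protein sequence; the function name, pattern and sequence are made-up examples.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE-style pattern (no < or > anchors) to a regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Split off an optional repeat count such as (2) or (2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        residue, repeat = m.group(1), m.group(2)
        if residue == "x":
            rx = "."  # any residue
        elif residue.startswith("{"):
            rx = "[^" + residue[1:-1] + "]"  # any residue except those listed
        else:
            rx = residue  # a single residue or [...] class is already valid regex
        parts.append(rx + ("{" + repeat + "}" if repeat else ""))
    return "".join(parts)

regex = prosite_to_regex("C-x(2,4)-[ST]-{P}")
print(regex, re.search(regex, "AACGGSAK").group())  # C.{2,4}[ST][^P] CGGSA
```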

pdb database > Protein Data Bank; repository of solved protein structures

PSI-BLAST > Position-Specific Iterated BLAST; a fast search that builds a position-specific scoring profile from the hits at each iteration and uses it in the next search

Phrap > Developed by Phil Green at the University of Washington, "PHil's Revised Assembly Program" is a tool for assembling shotgun-sequenced DNA fragments.

PHYLIP > PHYLogeny Inference Package; a program package created by J. Felsenstein for inferring phylogenies

PIR > Protein Identification Resource International, a protein database vendor

plasmid > See cloning vector.

proteome > The complete profile of proteins expressed in a given tissue, cell or biological system at a given time.

proteomics > Systematic analysis of the protein expression of healthy and diseased tissues.

PSB > Abbreviation for 'Pacific Symposium on Biocomputing'

PubEST > Abbreviation for a sequence from a public-domain source, such as the WashU-Merck EST Project or Banting Institute.

Q

R

S

SEG > program for filtering low-complexity regions in protein sequences
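
SEG flags segments whose residue composition is strongly biased. A simplified sketch of that idea, assuming Shannon entropy as the complexity measure, a 12-residue window and a hypothetical cutoff of 2.2 bits; SEG's actual algorithm uses trigger and extension windows with its own complexity thresholds, so this illustrates the concept rather than SEG itself.

```python
import math
from collections import Counter

def shannon_entropy(window: str) -> float:
    """Shannon entropy (bits) of residue composition; low values mean low complexity."""
    n = len(window)
    return sum((c / n) * math.log2(n / c) for c in Counter(window).values())

def low_complexity_starts(seq: str, window: int = 12, max_bits: float = 2.2) -> list[int]:
    """Start positions of windows whose entropy is below max_bits (hypothetical cutoff)."""
    return [i for i in range(len(seq) - window + 1)
            if shannon_entropy(seq[i:i + window]) < max_bits]

print(low_complexity_starts("MKVLATGSWERD" + "Q" * 12))
```

A window made of a single repeated residue has entropy 0, while 12 all-different residues reach log2(12), about 3.58 bits.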

sequence motif > short conserved amino acid or nucleotide sequence pattern with functional significance

signal > local functional site in genomic DNA (e.g., a splice site)

Smith-Waterman algorithm > standard dynamic programming algorithm for optimal local alignments
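
Smith-Waterman differs from global alignment in two ways: every cell score is floored at zero, so an alignment can start fresh anywhere, and the best score anywhere in the table is reported, so only the best-matching local segments count. A minimal sketch, assuming a simple scoring scheme (match +2, mismatch -1, linear gap penalty -1) and returning only the score:

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Return the optimal local alignment score of a and b."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring at 0 discards negative-scoring prefixes (local alignment).
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

print(smith_waterman("TTTACGTTT", "GGACGGG"))  # 6 (the shared segment ACG)
```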

STS > Sequence-tagged Site; short genomic landmark sequence

Swiss-PROT > curated protein sequence DB with annotations & low level of redundancy

SRS > Sequence Retrieval System; program for biological database browsing created by T. Etzold (EMBL)

SRSWWW > Version of the Sequence Retrieval System running in the WWW environment

T

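TrEMBL > Translated EMBL; a computer-annotated protein sequence database containing translations of the coding sequences in the EMBL nucleotide database that are not yet in Swiss-PROT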

U

UNIX > a computer operating system

UniGene database > A public database, maintained by NCBI, which brings together sets of GenBank sequences that represent the transcription products of distinct genes.

V

W

Wormpep > dynamic curated DB of predicted protein sequences from C. elegans genome

X

Y

YAC > Yeast Artificial Chromosome (see cloning vector) used to clone DNA fragments up to 1 Mb.

Z