Here is the problem:
The focus of the Darwin2000 site is the human beta-globin region on chromosome 11 (LOCUS: HUMHBB), a ~73 kb stretch of genomic DNA that includes a 45 kb cluster containing the coding regions of the five beta-like globin genes, in the order 5'-epsilon, G-gamma, A-gamma, delta, beta-3'. An additional gene (actually a pseudogene), beta-1, lies between the A-gamma and delta genes. The annotated features of each gene indicate which segments of genomic DNA are joined to give rise to the coding regions (exons) of each protein.
Learning the art of gene-finding is proposed here as an important first step for an aspiring bioinformaticist. The segment of gDNA annotated as containing the exons and other important sequence for HBgG (G-gamma globin) lies roughly within positions 34000-36000, so a search of the three-frame translation of this segment should reveal the coding segments of the different exons. The known peptide sequence serves as a reference for the initial definition of the coding regions. The final, precise definition of the exons requires analysing each putative exon/intron boundary: the codon structure of the terminal amino acid of each exon must be preserved across the splice junction, and each internal intron must show the expected 5' and 3' ends (GT at the 5' end, AG at the 3' end).
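The three-frame search described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not TACG's implementation: it assumes the standard genetic code, numbers frames 1-3 from the first base of the string (as TACG's rows do), and the helper names `translate` and `find_peptide` are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq: str, frame: int) -> str:
    """Translate seq in frame 1, 2 or 3 (frame 1 starts at the first base).

    An incomplete trailing codon is dropped; codons containing anything
    other than a/c/g/t are shown as 'X'.
    """
    seq = seq.lower()
    start = frame - 1
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(start, len(seq) - 2, 3)
    )


def find_peptide(seq: str, peptide: str) -> list[tuple[int, int]]:
    """Return (frame, 1-based nucleotide position) for every exact peptide hit."""
    hits = []
    for frame in (1, 2, 3):
        protein = translate(seq, frame)
        i = protein.find(peptide)
        while i != -1:
            hits.append((frame, frame + 3 * i))  # residue i starts at base frame + 3i
            i = protein.find(peptide, i + 1)
    return hits


# The listing row numbered 481, which contains the start of the G-gamma coding region.
segment = "ggttatcaataagctcctagtccagacgccatgggtcatttcacagaggaggacaaggct"
print(translate(segment, 1))          # GYQ*APSPDAMGHFTEEDKA
print(find_peptide(segment, "MGHF"))  # [(1, 31)]
```

Position 31 of this 60-base row is listing position 511, the ATG of the known N-terminus (M G H F ...). Searching for a short, distinctive stretch of the known protein, rather than the whole sequence, tolerates the introns, which break the peptide into separate runs in the genomic translation.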
-------------------------------
Promoter-region sequences 'ccaat', 'ata' and 'cttctg' are found at bases 34390, 34448 and 34485 of the hbgg gene.
precursor_RNA 34478..36069 /note="G-gamma globin"
Polyadenylation signal hbgg positions 36043-36048
CDS hbgg mRNA join(34531..34622,34745..34967,35854..35982)
exon 1 <34531..34622
intron-1 34623..34744
exon 2 34745..34967
intron-2 34968..35853
exon 3 35854..>35982
---------------------above info lifted from Darwin2000
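The coordinates above can be checked with a little arithmetic. In this minimal Python sketch (the helper names, and the `seq_start` parameter giving the genomic coordinate of the first base of a supplied sequence, are hypothetical), the exon lengths should sum to a multiple of three, and the running exon length modulo 3 at each intron gives its phase: how many bases of a split codon sit before the intron, with 0 meaning the intron falls between codons.

```python
# Exon coordinates (1-based, inclusive) from the Darwin2000 CDS join above.
EXONS = [(34531, 34622), (34745, 34967), (35854, 35982)]


def span_length(span: tuple[int, int]) -> int:
    start, end = span
    return end - start + 1


def introns_between(exons):
    """Introns are the gaps between consecutive exons."""
    return [(a_end + 1, b_start - 1) for (_, a_end), (b_start, _) in zip(exons, exons[1:])]


def intron_phases(exons):
    """Bases of a split codon that precede each intron (0 = between codons)."""
    phases, running = [], 0
    for exon in exons[:-1]:
        running += span_length(exon)
        phases.append(running % 3)
    return phases


def splice_dinucleotides(seq: str, seq_start: int, intron: tuple[int, int]):
    """Return the first two and last two bases of an intron (expect 'GT', 'AG').

    seq_start is the genomic coordinate of seq[0] (hypothetical parameter).
    """
    start, end = intron
    i, j = start - seq_start, end - seq_start
    return seq[i:i + 2].upper(), seq[j - 1:j + 1].upper()


lengths = [span_length(e) for e in EXONS]
print(lengths, sum(lengths), sum(lengths) % 3)  # [92, 223, 129] 444 0
print(introns_between(EXONS))                   # [(34623, 34744), (34968, 35853)]
print(intron_phases(EXONS))                     # [2, 0]
```

The 444-base CDS is 148 codons. Phase 2 for intron 1 means the last codon of exon 1 is split after two bases (AG|G, arginine), while phase 0 for intron 2 means exon 2 ends on a complete codon. These are exactly the "terminal amino acid" checks described earlier.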
Start with positions 34000-36000 (~2 kb of gDNA).
TACG 90 Char small http://24.1.175.29/tacg3/tacg300.Rest.form.html
For pasted-in sequence, labelled 34000-36000hbgg
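The TACG listing numbers bases from 1, while the Darwin2000 features use HUMHBB coordinates. Matching the ATG at listing position 511 to the CDS start at 34531 suggests an offset of 34020, which would mean the pasted sequence begins at 34021 rather than exactly 34000. This is an inference from the two listings, not something Darwin2000 states. The minimal Python sketch below applies that assumed offset and locates the promoter motifs; `OFFSET`, `to_genomic`, `to_local` and `find_motif` are hypothetical names.

```python
# Assumed offset: listing position 511 (the ATG) <-> HUMHBB position 34531.
OFFSET = 34531 - 511  # 34020


def to_genomic(local_pos: int) -> int:
    return local_pos + OFFSET


def to_local(genomic_pos: int) -> int:
    return genomic_pos - OFFSET


def find_motif(seq: str, motif: str, seq_start: int = 1) -> list[int]:
    """1-based start positions of every (possibly overlapping) match on this strand."""
    seq, motif = seq.lower(), motif.lower()
    hits, i = [], seq.find(motif)
    while i != -1:
        hits.append(seq_start + i)
        i = seq.find(motif, i + 1)
    return hits


# Bases 361-480 of the listing (the rows numbered 361 and 421).
region = (
    "caaacttgaccaatagtcttagagtatccagtgaggccaggggccggcggctggctaggg"
    "atgaagaataaaaggaagcacccttcagcagttccacacactcgcttctggaacgtctga"
)
for motif in ("ccaat", "cttctg"):
    print(motif, [to_genomic(p) for p in find_motif(region, motif, seq_start=361)])
# ccaat [34390]
# cttctg [34485]
```

Both motifs land on the Darwin2000 positions, which supports the assumed offset. A three-base motif such as 'ata' is far less specific: it matches twice in this 120-base window alone (listing positions 373 and 428, the latter being 34448), so short promoter elements are only meaningful together with their expected position upstream of the transcript start.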
==
Sequence info:
2100 bases; 583 A(27.76 %) 430 C(20.48 %) 535 G(25.48 %) 552 T(26.29 %)
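The summary line is a simple base count with each percentage taken over the total length. A minimal Python sketch of that calculation; the function name `composition` is hypothetical, and rounding to two decimals is assumed to match TACG's display.

```python
from collections import Counter


def composition(seq: str) -> dict[str, tuple[int, float]]:
    """Count each base and its percentage of the total sequence length."""
    seq = seq.lower()
    counts = Counter(seq)
    total = len(seq)
    return {b.upper(): (counts[b], round(100 * counts[b] / total, 2)) for b in "acgt"}


# First 60 bases of the listing.
print(composition("cctatgcctaaaacacatttcacaatccctgaacttttcaaaaattggtacatgctttaa"))
# {'A': (21, 35.0), 'C': (15, 25.0), 'G': (5, 8.33), 'T': (19, 31.67)}

# The reported counts sum to 2100 and reproduce the reported percentages.
reported = {"A": 583, "C": 430, "G": 535, "T": 552}
assert sum(reported.values()) == 2100
print({b: round(100 * n / 2100, 2) for b, n in reported.items()})
# {'A': 27.76, 'C': 20.48, 'G': 25.48, 'T': 26.29}
```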
1 cctatgcctaaaacacatttcacaatccctgaacttttcaaaaattggtacatgctttaa 60
1 P M P K T H F T I P E L F K N W Y M L *
2 L C L K H I S Q S L N F S K I G T C F N
3 Y A * N T F H N P * T F Q K L V H A L T
61 ctttaaactacaggcctcactggagctacagacaagaaggtgaaaaacggctgacaaaag 120
1 L * T T G L T G A T D K K V K N G * Q K
2 F K L Q A S L E L Q T R R * K T A D K R
3 L N Y R P H W S Y R Q E G E K R L T K E
121 aagtcctggtatcttctatggtgggagaagaaaactagctaaagggaagaataaattaga 180
1 K S W Y L L W W E K K T S * R E E * I R
2 S P G I F Y G G R R K L A K G K N K L E
3 V L V S S M V G E E N * L K G R I N * R
181 gaaaaattggaatgactgaatcggaacaaggcaaaggctataaaaaaaattaagcagcag 240
1 E K L E * L N R N K A K A I K K I K Q Q
2 K N W N D * I G T R Q R L * K K L S S S
3 K I G M T E S E Q G K G Y K K N * A A V
241 tatcctcttgggggccccttccccacactatctcaatgcaaatatctgtctgaaacggtt 300
1 Y P L G G P F P T L S Q C K Y L S E T V
2 I L L G A P S P H Y L N A N I C L K R F
3 S S W G P L P H T I S M Q I S V * N G S
301 cctggctaaactccacccatgggttggccagccttgccttgaccaatagccttgacaagg 360
1 P G * T P P M G W P A L P * P I A L T R
2 L A K L H P W V G Q P C L D Q * P * Q G
3 W L N S T H G L A S L A L T N S L D K A
361 caaacttgaccaatagtcttagagtatccagtgaggccaggggccggcggctggctaggg 420
1 Q T * P I V L E Y P V R P G A G G W L G
2 K L D Q * S * S I Q * G Q G P A A G * G
3 N L T N S L R V S S E A R G R R L A R D
421 atgaagaataaaaggaagcacccttcagcagttccacacactcgcttctggaacgtctga 480
1 M K N K R K H P S A V P H T R F W N V *
2 * R I K G S T L Q Q F H T L A S G T S E
3 E E * K E A P F S S S T H S L L E R L R
481 ggttatcaataagctcctagtccagacgccatgggtcatttcacagaggaggacaaggct 540
1 G Y Q * A P S P D A M G H F T E E D K A
2 V I N K L L V Q T P W V I S Q R R T R L
3 L S I S S * S R R H G S F H R G G Q G Y
541 actatcacaagcctgtggggcaaggtgaatgtggaagatgctggaggagaaaccctggga 600
1 T I T S L W G K V N V E D A G G E T L G
2 L S Q A C G A R * M W K M L E E K P W E
3 Y H K P V G Q G E C G R C W R R N P G K
601 aggtaggctctggtgaccaggacaagggagggaaggaaggaccctgtgcctggcaaaagt 660
1 R * A L V T R T R E G R K D P V P G K S
2 G R L W * P G Q G R E G R T L C L A K V
3 V G S G D Q D K G G K E G P C A W Q K S
661 ccaggtcgcttctcaggatttgtggcaccttctgactgtcaaactgttcttgtcaatctc 720
1 P G R F S G F V A P S D C Q T V L V N L
2 Q V A S Q D L W H L L T V K L F L S I S
3 R S L L R I C G T F * L S N C S C Q S H
721 acaggctcctggttgtctacccatggacccagaggttctttgacagctttggcaacctgt 780
1 T G S W L S T H G P R G S L T A L A T C
2 Q A P G C L P M D P E V L * Q L W Q P V
3 R L L V V Y P W T Q R F F D S F G N L S
781 cctctgcctctgccatcatgggcaaccccaaagtcaaggcacatggcaagaaggtgctga 840
1 P L P L P S W A T P K S R H M A R R C *
2 L C L C H H G Q P Q S Q G T W Q E G A D
3 S A S A I M G N P K V K A H G K K V L T
841 cttccttgggagatgccataaagcacctggatgatctcaagggcacctttgcccagctga 900
1 L P W E M P * S T W M I S R A P L P S *
2 F L G R C H K A P G * S Q G H L C P A E
3 S L G D A I K H L D D L K G T F A Q L S
901 gtgaactgcactgtgacaagctgcatgtggatcctgagaacttcaaggtgagtccaggag 960
1 V N C T V T S C M W I L R T S R * V Q E
2 * T A L * Q A A C G S * E L Q G E S R R
3 E L H C D K L H V D P E N F K V S P G D
961 atgtttcagcactgttgcctttagtctcgaggcaacttagacaactgagtattgatctga 1020
1 M F Q H C C L * S R G N L D N * V L I *
2 C F S T V A F S L E A T * T T E Y * S E
3 V S A L L P L V S R Q L R Q L S I D L S
1021 gcacagcagggtgtgagctgtttgaagatactggggttgggagtgaagaaactgcagagg 1080
1 A Q Q G V S C L K I L G L G V K K L Q R
2 H S R V * A V * R Y W G W E * R N C R G
3 T A G C E L F E D T G V G S E E T A E D
1081 actaactgggctgagacccagtggcaatgttttagggcctaaggagtgcctctgaaaatc 1140
1 T N W A E T Q W Q C F R A * G V P L K I
2 L T G L R P S G N V L G P K E C L * K S
3 * L G * D P V A M F * G L R S A S E N L
1141 tagatggacaactttgactttgagaaaagagaggtggaaatgaggaaaatgacttttctt 1200
1 * M D N F D F E K R E V E M R K M T F L
2 R W T T L T L R K E R W K * G K * L F F
3 D G Q L * L * E K R G G N E E N D F S L
1201 tattagatttcggtagaaagaactttcacctttcccctatttttgttattcgttttaaaa 1260
1 Y * I S V E R T F T F P L F L L F V L K
2 I R F R * K E L S P F P Y F C Y S F * N
3 L D F G R K N F H L S P I F V I R F K T
1261 catctatctggaggcaggacaagtatggtcgttaaaaagatgcaggcagaaggcatatat 1320
1 H L S G G R T S M V V K K M Q A E G I Y
2 I Y L E A G Q V W S L K R C R Q K A Y I
3 S I W R Q D K Y G R * K D A G R R H I L
1321 tggctcagtcaaagtggggaactttggtggccaaacatacattgctaaggctattcctat 1380
1 W L S Q S G E L W W P N I H C * G Y S Y
2 G S V K V G N F G G Q T Y I A K A I P I
3 A Q S K W G T L V A K H T L L R L F L Y
1381 atcagctggacacatataaaatgctgctaatgcttcattacaaacttatatcctttaatt 1440
1 I S W T H I K C C * C F I T N L Y P L I
2 S A G H I * N A A N A S L Q T Y I L * F
3 Q L D T Y K M L L M L H Y K L I S F N S
1441 ccagatgggggcaaagtatgtccaggggtgaggaacaattgaaacatttgggctggagta 1500
1 P D G G K V C P G V R N N * N I W A G V
2 Q M G A K Y V Q G * G T I E T F G L E *
3 R W G Q S M S R G E E Q L K H L G W S R
1501 gattttgaaagtcagctctgtgtgtgtgtgtgtgtgtgtgcgcgcgtgtgtttgtgtgtg 1560
1 D F E S Q L C V C V C V C A R V C L C V
2 I L K V S S V C V C V C V R A C V C V C
3 F * K S A L C V C V C V C A R V F V C V
1561 tgtgagagcgtgtgtttcttttaacgttttcagcctacagcatacagggttcatggtggc 1620
1 C E S V C F F * R F Q P T A Y R V H G G
2 V R A C V S F N V F S L Q H T G F M V A
3 * E R V F L L T F S A Y S I Q