
Sequence Formats

EMBOSS currently reads and writes 30 separate sequence formats, which gives the analysis suite great flexibility. Unless a format's description states otherwise, sequences are written in lines of 60 characters.

abi ABI trace file format is the type of file produced by ABI sequencing machines. It contains the trace data, i.e. the fluorescence levels of the four bases along the sequencing run, together with the sequence deduced from that data. The trace can be viewed with abiview.

acedb ACeDB format is the file format of the ACeDB database, which was first written to store the information from the sequencing of the nematode worm C. elegans. Sequence lines are 50 characters long.


DNA : "BT006818"
atggctgatcagctgaccgaagaacagattgctgaattcaaggaagcctt
ctccctatttgataaagatggcgatggcaccatcacaacaaaggaacttg
gaactgtcatgaggtcactgggtcagaacccaacagaagctgaattgcag
gatatgatcaatgaagtggatgctgatggtaatggcaccattgacttccc
cgaatttttgactatgatggctagaaaaatgaaagatacagatagtgaag
aagaaatccgtgaggcattccgagtctttgacaaggatggcaatggttat
atcagtgcagcagaactacgtcacgtcatgacaaacttaggagaaaaact
aacagatgaagaagtagatgaaatgatcagagaagcagatattgatggag
acggacaagtcaactatgaagaattcgtacagatgatgactgcaaaatag

Peptide : "CALM_HUMAN"
ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQD
MINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYI
SAAELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYEEFVQMMTAK
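
The layout above is simple enough to generate directly. The following Python sketch writes a sequence as an ACeDB record. It assumes the record consists of only a class line (DNA or Peptide, a colon and the quoted identifier) followed by 50-character sequence lines. The name to_acedb is hypothetical and is not part of EMBOSS.

```python
def to_acedb(seq_id: str, seq: str, protein: bool = False, width: int = 50) -> str:
    """Format a sequence as an ACeDB-style record (hypothetical helper)."""
    # The class line names the object type: "DNA" for nucleotides, "Peptide" for proteins.
    header = f'{"Peptide" if protein else "DNA"} : "{seq_id}"'
    # Break the sequence into fixed-width lines (50 characters in the examples).
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([header, *lines]) + "\n"


# Usage: reproduce the first two lines of the peptide example.
record = to_acedb("CALM_HUMAN", "ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQD", protein=True)
```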

aln See clustal format.
asn1 A subset of ASN.1 format containing entry name, accession number, description and sequence. It is similar to the current ASN.1 output of readseq. There is no consistent number of characters on a sequence line.
seq {
  id { local id 1 },
  descr { title "Homo sapiens calmodulin 1 (phosphorylase kinase, delta) mRNA, complete cds." },
  inst {
    repr raw, mol dna, length 450, topology linear,
    seq-data
      iupacna "atggctgatcagctgaccgaagaacagattgctgaattcaaggaagccttctccctatttg
ataaagatggcgatggcaccatcacaacaaaggaacttggaactgtcatgaggtcactgggtcagaacccaacagaag
ctgaattgcaggatatgatcaatgaagtggatgctgatggtaatggcaccattgacttccccgaatttttgactatga
tggctagaaaaatgaaagatacagatagtgaagaagaaatccgtgaggcattccgagtctttgacaaggatggcaatg
gttatatcagtgcagcagaactacgtcacgtcatgacaaacttaggagaaaaactaacagatgaagaagtagatgaaa
tgatcagagaagcagatattgatggagacggacaagtcaactatgaagaattcgtacagatgatgactgcaaaatag"
    } } ,

seq {
  id { local id 1 },
  descr { title "Calmodulin (CaM)." },
  inst {
    repr raw, mol aa, length 148, topology linear,
    {
    seq-data
      iupacaa "ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGNG
TIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYE
EFVQMMTAK"
    } } ,

clustal Also known as aln format. The file begins with a header line naming Clustal W, the program for which this is the default format. The alignment is written in blocks. Each line of a block gives a sequence name followed by up to 50 characters of that sequence, split into five groups of ten. The corresponding lines of all sequences are stacked one beneath another. Gaps are shown as dashes (-).
CLUSTAL W(1.4) multiple sequence alignment


CaM2            ATGGCTGATC AGCTGACCGA AGAACAGATT GCTGAATTCA AGGAAGCCTT
CaM3            ATGGCTGATC AGCTGACCGA AGAACAGATT GCTGAATTCA AGGAAGCCTT
CaM1            ATGGCTGATC AGCTGA---- ---------- -------TCA AGGAAGCCTT
                                                                      

CaM2            CTCCCTATTT GATAAAGATG GCGATGGCAC CAAA------ ----------
CaM3            CTCCCTATTT GATAAAGATG GCGATGGCAC CAT------- ----------
CaM1            CTCCCTATTT GATAAAGATG GCGATGGCAC CATCACAACA AAGGAACTTG
                                                                      

CaM2            ---CTGTCAT GAGGTCACTG GGTCAGAACC CAACAGAAGC TGAATTGCAG
CaM3            ---------- ---------- GGTCAGAACC CAACAGAAGC TGAATTGCAG
CaM1            GAACTGTCAT GAGGTCACTG GGTCAGAACC CAACAGAAGC TGAATTGCAG
                                                                      

CaM2            GATATGATCA ATGAAGTGGA TGCTGATGGT AATGGCACCA TTGACTTCCC
CaM3            GATATGATCA ATGAAGTGGA TGCTGATGGT AATGGCACCA TTGACTTCCC
CaM1            GATATGATCA ATGAAGTGGA TGCTGATGGT AATGGCACCA TTGACTTCCC
                                                                      

CaM2            CGAATTTTTG ACTATGATGG CTAGAAAAAT GAAAGATACA GATAGTGAAG
CaM3            CGAATTTTTG ACTATGATGG CTAGAAAAAT GAAAGATACA GATAGTGAAG
CaM1            CGAATTTTTG ACTATGATGG CTAGAAAAAT GAAAGATACA GATAGTGAAG
                                                                      

CaM2            AAGAAATCCG TGAGGCATTC CGAGTCTTTG ACAAGGATGG CAATGGTTAT
CaM3            AAGAAATCCG TGAGGCATTC CGAGTCTTTG ACAAGGATGG CAATGGTTAT
CaM1            AAGAAATCCG TGAGGCATTC CGAGTCTTTG ACAAGGATGG CAATGGTTAT
                                                                      

CaM2            ATCAGTGCAG CAGAACTACG TCACGTCATG ACAAACTTAG GAGAAAAACT
CaM3            ATCAGTGCAG CAGAACTACG TCACGTCATG ACAAACTTAG GAGAAAAACT
CaM1            ATCAGTGCAG CAGAACTACG TCACGTCATG ACAAACTTAG GAGAAAAACT
                                                                      

CaM2            AACAGATGAA GAAGTAGATG AAATGATCAG AGAAGCAGAT ATTGATGGAG
CaM3            AACAGATGAA GAAGTAGATG AAATGATCAG AGAAGCAGAT ATTGATGGAG
CaM1            AACAGATGAA GAAGTAGATG AAATGATCAG AGAAGCAGAT ATTGATGGAG
                                                                      

CaM2            ACGGACAAGT CAACTATGAA GAATTCGTAC AGATGATGAC TGCAAAATAG
CaM3            ACGGACAAGT CAACTATGAA GAATTCGTAC AGATGATGAC TGCAAAATAG
CaM1            ACGGACAAGT CAACTATGAA GAATTCGTAC AGATGATGAC TGCAAAATAG
                                                                      
                                                                      

CLUSTAL W(1.4) multiple sequence alignment


CaM2            ADQLTEEQIA EFKEAFSLFD KDGDGTITTK ELGTVMRSLG QNPTEAELQD
CaM3            ---------- ---------- ---------- ---------- ----------
CaM1            ADQLTEEQIA EFKEAFSLFD KDGDGTITTK ELGTVMRSLG QNPTEAELQD

CaM2            MINEVDADGN ---------- ----GTIDFP EFLTMMEAFR VFDKDGNGYI
CaM3            --------AD ---------- ----QLTEEQ IAEEIREAFR VFDKDGNGYI
CaM1            MINEVDADGN GTIDFPEFLT MMARKMKDTD SEEEIREAFR VFDKDGNGYI

CaM2            SAAELRHVMT NLGEKLTDEE VDEMIREADI DGDGQVNYEE FVQMMTAK
CaM3            SAAELRHVMT NLGEKLTDEE VDEMIREADI DGDGQVNYEE FVQMMTAK
CaM1            SAAELRHVMT NLGEKLTDEE VDEMIREADI DGDGQVNYEE FVQMMTAK

codata CODATA format includes the identifier (ENTRY) and accession number (ACCESSION) of the sequence, together with the description line (TITLE). The sequence starts after the line beginning SEQUENCE. Its end is marked by three soliduses (///) on the line directly below the sequence. Each line holds up to 30 residues separated by spaces. A ruler across the top numbers the columns in steps of five, and each line begins with the position of its first residue.
ENTRY           BT006818 
TITLE           Homo sapiens calmodulin 1 (phosphorylase kinase, delta) mRNA, complete cds., 450 bases
ACCESSION       BT006818
SEQUENCE        
                 5        10        15        20        25        30
      1 a t g g c t g a t c a g c t g a c c g a a g a a c a g a t t
     31 g c t g a a t t c a a g g a a g c c t t c t c c c t a t t t
     61 g a t a a a g a t g g c g a t g g c a c c a t c a c a a c a
     91 a a g g a a c t t g g a a c t g t c a t g a g g t c a c t g
    121 g g t c a g a a c c c a a c a g a a g c t g a a t t g c a g
    151 g a t a t g a t c a a t g a a g t g g a t g c t g a t g g t
    181 a a t g g c a c c a t t g a c t t c c c c g a a t t t t t g
    211 a c t a t g a t g g c t a g a a a a a t g a a a g a t a c a
    241 g a t a g t g a a g a a g a a a t c c g t g a g g c a t t c
    271 c g a g t c t t t g a c a a g g a t g g c a a t g g t t a t
    301 a t c a g t g c a g c a g a a c t a c g t c a c g t c a t g
    331 a c a a a c t t a g g a g a a a a a c t a a c a g a t g a a
    361 g a a g t a g a t g a a a t g a t c a g a g a a g c a g a t
    391 a t t g a t g g a g a c g g a c a a g t c a a c t a t g a a
    421 g a a t t c g t a c a g a t g a t g a c t g c a a a a t a g
///

ENTRY           CALM_HUMAN
TITLE           Calmodulin (CaM)., 148 bases
ACCESSION       P62158
SEQUENCE
                 5        10        15        20        25        30
      1 A D Q L T E E Q I A E F K E A F S L F D K D G D G T I T T K
     31 E L G T V M R S L G Q N P T E A E L Q D M I N E V D A D G N
     61 G T I D F P E F L T M M A R K M K D T D S E E E I R E A F R
     91 V F D K D G N G Y I S A A E L R H V M T N L G E K L T D E E
    121 V D E M I R E A D I D G D G Q V N Y E E F V Q M M T A K
///

dbid Similar to FASTA format, with a greater-than sign (>) at the start of the description line. The > is followed by the database name, identifier, accession number and description, separated by spaces. The accession number is optional. This format is input only.
>EMBL BT006818 BT006818 Homo sapiens calmodulin 1 (phosphorylase kinase, delta) mRNA, complete cds.
atggctgatcagctgaccgaagaacagattgctgaattcaaggaagccttctccctattt
gataaagatggcgatggcaccatcacaacaaaggaacttggaactgtcatgaggtcactg
ggtcagaacccaacagaagctgaattgcaggatatgatcaatgaagtggatgctgatggt
aatggcaccattgacttccccgaatttttgactatgatggctagaaaaatgaaagataca
gatagtgaagaagaaatccgtgaggcattccgagtctttgacaaggatggcaatggttat
atcagtgcagcagaactacgtcacgtcatgacaaacttaggagaaaaactaacagatgaa
gaagtagatgaaatgatcagagaagcagatattgatggagacggacaagtcaactatgaa
gaattcgtacagatgatgactgcaaaatag

>SWISSPROT CALM_HUMAN P62158 Calmodulin (CaM).
ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGN
GTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTDEE
VDEMIREADIDGDGQVNYEEFVQMMTAK

ddbj This is the format of DDBJ (DNA Data Bank of Japan) entries and is the same as GenBank format. It contains all the features of the sequence as they would appear in the GenBank or GenPept databases. The end of the entry is marked by double soliduses (//) on the line immediately after the sequence.
LOCUS       BT006818
DEFINITION  Homo sapiens calmodulin 1 (phosphorylase kinase, delta) mRNA, complete cds.
ACCESSION   BT006818
VERSION     BT006818.1
KEYWORDS    FLI_CDNA.
SOURCE      Homo sapiens (human).
  ORGANISM  Homo sapiens (human)
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
BASE COUNT      163 a     76 c    112 g     99 t
ORIGIN
       1  atggctgatc agctgaccga agaacagatt gctgaattca aggaagcctt ctccctattt
      61  gataaagatg gcgatggcac catcacaaca aaggaacttg gaactgtcat gaggtcactg
     121  ggtcagaacc caacagaagc tgaattgcag gatatgatca atgaagtgga tgctgatggt
     181  aatggcacca ttgacttccc cgaatttttg actatgatgg ctagaaaaat gaaagataca
     241  gatagtgaag aagaaatccg tgaggcattc cgagtctttg acaaggatgg caatggttat
     301  atcagtgcag cagaactacg tcacgtcatg acaaacttag gagaaaaact aacagatgaa
     361  gaagtagatg aaatgatcag agaagcagat attgatggag acggacaagt caactatgaa
     421  gaattcgtac agatgatgac tgcaaaatag
//

LOCUS       CALM_HUMAN
DEFINITION  Calmodulin (CaM).
ACCESSION   P62158 P02593 P70667 P99014 Q61379 Q61380
KEYWORDS    3D-structure; Acetylation; Calcium-binding;
KEYWORDS    Direct protein sequencing; Methylation; Phosphorylation; Repeat;
KEYWORDS    Ubl conjugation.
SOURCE      Homo sapiens (Human).
  ORGANISM  Homo sapiens (Human)
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
BASE COUNT       11 a      0 c     11 g     12 t    114 others
ORIGIN
       1  ADQLTEEQIA EFKEAFSLFD KDGDGTITTK ELGTVMRSLG QNPTEAELQD MINEVDADGN
      61  GTIDFPEFLT MMARKMKDTD SEEEIREAFR VFDKDGNGYI SAAELRHVMT NLGEKLTDEE
     121  VDEMIREADI DGDGQVNYEE FVQMMTAK
//

embl This is the format of EMBL flatfile entries. It contains all the features of the sequence as they would appear in the EMBL database. Each line begins with a two-letter tag that identifies the data it holds. The sequence follows the SQ line, which summarises its length and base composition. The end of the entry is marked by double soliduses (//) on the line immediately after the sequence.
ID   BT006818   standard; DNA; UNC; 450 BP.
AC   BT006818;
SV   BT006818.1
DE   Homo sapiens calmodulin 1 (phosphorylase kinase, delta) mRNA, complete cds.
KW   FLI_CDNA.
OS   Homo sapiens (human)
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC   Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
FH   Key             Location/Qualifiers
FH
FT   source          1..450
FT                   /db_xref="taxon:9606"
FT                   /db_xref="RZPD:FLEXo833C0415D0"
FT                   /mol_type="mRNA"
FT                   /note="Vector: pDNR-Dual"
FT                   /organism="Homo sapiens"
FT                   /clone="GH00517X1.0"
FT                   /clone_lib="BD Creator(TM) CDS Library derived from MGC
FT                   collection"
FT                   /lab_host="DH5alpha T1 resistant"
FT   CDS             1..450
FT                   /codon_start=1
FT                   /product="calmodulin 1 (phosphorylase kinase, delta)"
FT                   /protein_id="AAP35464.1"
FT                   /translation="MADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPT
FT                   EAELQDMINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAA
FT                   ELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYEEFVQMMTAK"
SQ   Sequence 450 BP; 163 A; 76 C; 112 G; 99 T; 0 other;
     atggctgatc agctgaccga agaacagatt gctgaattca aggaagcctt ctccctattt        60
     gataaagatg gcgatggcac catcacaaca aaggaacttg gaactgtcat gaggtcactg       120
     ggtcagaacc caacagaagc tgaattgcag gatatgatca atgaagtgga tgctgatggt       180
     aatggcacca ttgacttccc cgaatttttg actatgatgg ctagaaaaat gaaagataca       240
     gatagtgaag aagaaatccg tgaggcattc cgagtctttg acaaggatgg caatggttat       300
     atcagtgcag cagaactacg tcacgtcatg acaaacttag gagaaaaact aacagatgaa       360
     gaagtagatg aaatgatcag agaagcagat attgatggag acggacaagt caactatgaa       420
     gaattcgtac agatgatgac tgcaaaatag                                        450
//
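
The SQ line and the numbered sequence lines follow fixed rules, so they can be regenerated from the sequence alone. The Python sketch below makes two assumptions. First, any character other than A, C, G or T is counted as "other". Second, sequence lines are 80 columns wide: a five-space indent, six groups of ten, and the running position right-aligned at the end. This appears to match the example above. Both helper names are hypothetical.

```python
from collections import Counter


def embl_sq_line(seq: str) -> str:
    """Summarise length and base composition as an EMBL SQ line (hypothetical helper)."""
    counts = Counter(seq.upper())
    acgt = {base: counts[base] for base in "ACGT"}   # Counter returns 0 for absent bases
    other = len(seq) - sum(acgt.values())
    parts = "; ".join(f"{n} {base}" for base, n in acgt.items())
    return f"SQ   Sequence {len(seq)} BP; {parts}; {other} other;"


def embl_sequence_lines(seq: str) -> list[str]:
    """Format sequence lines: 5-space indent, six groups of ten, position ending at column 80."""
    lines = []
    for start in range(0, len(seq), 60):
        chunk = seq[start:start + 60]
        blocks = " ".join(chunk[i:i + 10] for i in range(0, len(chunk), 10))
        lines.append(f"     {blocks:<65}{start + len(chunk):>10}")
    return lines
```

Applied to the BT006818 sequence, these should reproduce the SQ line and numbered lines shown above if the column assumption holds.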

fasta FASTA format is the default output format of your BioBind software suite. Each entry begins with a greater-than sign (>), followed on the same line by the sequence identifier and description. The sequence starts on the next line. In a file holding several sequences, an entry ends only where the next entry's > line begins.
>BT006818 BT006818.1 Homo sapiens calmodulin 1 (phosphorylase kinase, delta) mRNA, complete cds.
atggctgatcagctgaccgaagaacagattgctgaattcaaggaagccttctccctattt
gataaagatggcgatggcaccatcacaacaaaggaacttggaactgtcatgaggtcactg
ggtcagaacccaacagaagctgaattgcaggatatgatcaatgaagtggatgctgatggt
aatggcaccattgacttccccgaatttttgactatgatggctagaaaaatgaaagataca
gatagtgaagaagaaatccgtgaggcattccgagtctttgacaaggatggcaatggttat
atcagtgcagcagaactacgtcacgtcatgacaaacttaggagaaaaactaacagatgaa
gaagtagatgaaatgatcagagaagcagatattgatggagacggacaagtcaactatgaa
gaattcgtacagatgatgactgcaaaatag

>CALM_HUMAN P62158 Calmodulin (CaM).
ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGN
GTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTDEE
VDEMIREADIDGDGQVNYEEFVQMMTAK
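
Because an entry ends only where the next > line begins, a reader has to buffer sequence lines until it sees the next header. Here is a minimal Python sketch of a reader and a writer that wraps at 60 characters. Both functions are hypothetical helpers, not EMBOSS code.

```python
def read_fasta(text: str) -> list[tuple[str, str]]:
    """Split FASTA text into (header, sequence) pairs; a new '>' line ends the previous entry."""
    entries = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:                  # close the previous entry
                entries.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:                          # the last entry ends at end of file
        entries.append((header, "".join(chunks)))
    return entries


def write_fasta(header: str, seq: str, width: int = 60) -> str:
    """Format one entry with the sequence wrapped at `width` characters."""
    body = "\n".join(seq[i:i + width] for i in range(0, len(seq), width))
    return f">{header}\n{body}\n"
```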

fitch Originally designed for DNA sequences, Fitch format gives the sequence identifier and base count on the line immediately before the sequence. The sequence is divided into blocks of three characters, with twenty blocks per line. This is an output-only format.
BT006818, 450 bases
 atg gct gat cag ctg acc gaa gaa cag att gct gaa ttc aag gaa gcc ttc tcc cta ttt
 gat aaa gat ggc gat ggc acc atc aca aca aag gaa ctt gga act gtc atg agg tca ctg
 ggt cag aac cca aca gaa gct gaa ttg cag gat atg atc aat gaa gtg gat gct gat ggt
 aat ggc acc att gac ttc ccc gaa ttt ttg act atg atg gct aga aaa atg aaa gat aca
 gat agt gaa gaa gaa atc cgt gag gca ttc cga gtc ttt gac aag gat ggc aat ggt tat
 atc agt gca gca gaa cta cgt cac gtc atg aca aac tta gga gaa aaa cta aca gat gaa
 gaa gta gat gaa atg atc aga gaa gca gat att gat gga gac gga caa gtc aac tat gaa
 gaa ttc gta cag atg atg act gca aaa tag
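
A short Python sketch of the Fitch layout, assuming the header is "identifier, N bases" and each sequence line starts with a single space, as in the example above. The name to_fitch is hypothetical.

```python
def to_fitch(seq_id: str, seq: str) -> str:
    """Write a sequence in Fitch layout: a header, then 20 blocks of 3 per line (hypothetical helper)."""
    triplets = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    lines = [f"{seq_id}, {len(seq)} bases"]
    for i in range(0, len(triplets), 20):
        # Each sequence line starts with one space, then space-separated triplets.
        lines.append(" " + " ".join(triplets[i:i + 20]))
    return "\n".join(lines) + "\n"
```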

genbank This is the format of GenBank flatfile entries. It contains all the features of the sequence as they would appear in the GenBank or GenPept databases. The end of the entry is marked by double soliduses (//) on the line immediately after the sequence.
LOCUS       BT006818
DEFINITION  Homo sapiens calmodulin 1 (phosphorylase kinase, delta) mRNA, complete cds.
ACCESSION   BT006818
VERSION     BT006818.1
KEYWORDS    FLI_CDNA.
SOURCE      Homo sapiens (human).
  ORGANISM  Homo sapiens (human)
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
BASE COUNT      163 a     76 c    112 g     99 t
ORIGIN
       1  atggctgatc agctgaccga agaacagatt gctgaattca aggaagcctt ctccctattt
      61  gataaagatg gcgatggcac catcacaaca aaggaacttg gaactgtcat gaggtcactg
     121  ggtcagaacc caacagaagc tgaattgcag gatatgatca atgaagtgga tgctgatggt
     181  aatggcacca ttgacttccc cgaatttttg actatgatgg ctagaaaaat gaaagataca
     241  gatagtgaag aagaaatccg tgaggcattc cgagtctttg acaaggatgg caatggttat
     301  atcagtgcag cagaactacg tcacgtcatg acaaacttag gagaaaaact aacagatgaa
     361  gaagtagatg aaatgatcag agaagcagat attgatggag acggacaagt caactatgaa
     421  gaattcgtac agatgatgac tgcaaaatag
//

LOCUS       CALM_HUMAN
DEFINITION  Calmodulin (CaM).
ACCESSION   P62158 P02593 P70667 P99014 Q61379 Q61380
KEYWORDS    3D-structure; Acetylation; Calcium-binding;
KEYWORDS    Direct protein sequencing; Methylation; Phosphorylation; Repeat;
KEYWORDS    Ubl conjugation.
SOURCE      Homo sapiens (Human).
  ORGANISM  Homo sapiens (Human)
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
BASE COUNT       11 a      0 c     11 g     12 t    114 others
ORIGIN
       1  ADQLTEEQIA EFKEAFSLFD KDGDGTITTK ELGTVMRSLG QNPTEAELQD MINEVDADGN
      61  GTIDFPEFLT MMARKMKDTD SEEEIREAFR VFDKDGNGYI SAAELRHVMT NLGEKLTDEE
     121  VDEMIREADI DGDGQVNYEE FVQMMTAK
//

gff This is the General Feature Format created at the Sanger Institute for genome data. Header lines start with a double hash (##) and give, one item per line, the GFF version, the program used to create the file, the date of creation, and the sequence type and identifier. Feature lines, when present, have tab-separated fields in the format

<seqname> <source> <feature> <start> <end> <score> <strand> <frame> [attributes] [comments]

The sequence itself is stored on ## lines after the identifier line. The entry ends with ##end- followed by the sequence type (##end-DNA or ##end-Protein).
##gff-version 2
##source-version EMBOSS 2.9.0bb280604
##date 2004-08-28
##DNA BT006818
##atggctgatcagctgaccgaagaacagattgctgaattcaaggaagccttctccctattt
##gataaagatggcgatggcaccatcacaacaaaggaacttggaactgtcatgaggtcactg
##ggtcagaacccaacagaagctgaattgcaggatatgatcaatgaagtggatgctgatggt
##aatggcaccattgacttccccgaatttttgactatgatggctagaaaaatgaaagataca
##gatagtgaagaagaaatccgtgaggcattccgagtctttgacaaggatggcaatggttat
##atcagtgcagcagaactacgtcacgtcatgacaaacttaggagaaaaactaacagatgaa
##gaagtagatgaaatgatcagagaagcagatattgatggagacggacaagtcaactatgaa
##gaattcgtacagatgatgactgcaaaatag
##end-DNA

##gff-version 2
##source-version EMBOSS 2.9.0bb280604
##date 2004-08-28
##Protein CALM_HUMAN
##ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGN
##GTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTDEE
##VDEMIREADIDGDGQVNYEEFVQMMTAK
##end-Protein
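
The header and embedded-sequence lines can be generated as shown in the Python sketch below. It assumes the exact line order of the examples above and 60 residues per ## line, and it writes no feature lines. The function name and its default source string are hypothetical.

```python
def gff_sequence_block(seq_type: str, seq_id: str, seq: str, date: str,
                       source: str = "EMBOSS 2.9.0bb280604", width: int = 60) -> str:
    """Write a GFF version 2 header plus a ##-embedded sequence (hypothetical helper)."""
    lines = [
        "##gff-version 2",
        f"##source-version {source}",
        f"##date {date}",
        f"##{seq_type} {seq_id}",          # e.g. "##DNA BT006818" or "##Protein CALM_HUMAN"
    ]
    lines += ["##" + seq[i:i + width] for i in range(0, len(seq), width)]
    lines.append(f"##end-{seq_type}")      # closes the embedded sequence
    return "\n".join(lines) + "\n"
```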

gcg First used as the format of the Genetics Computer Group (GCG) software written in Wisconsin, USA, it became the format of the Wisconsin Package, a commercial sequence analysis package. An entry starts with a double bang (!!) followed by the sequence type: nucleotide (NA) or protein (AA). After a blank line comes the description. After another blank line comes a line giving the identifier, sequence length, sequence type and checksum, ending with two dots (..). The checksum is calculated from the sequence. If the sequence is edited by hand, the checksum no longer matches and the file is rejected as input. This format is used for GCG release 9.x and 10.x files. Each sequence line starts with the position of its first character and holds five blocks of ten.
!!NA_SEQUENCE 1.0

Homo sapiens calmodulin 1 (phosphorylase kinase, delta) mRNA, complete cds.

BT006818  Length: 450  Type: N  Check: 7527 ..

   1 atggctgatc agctgaccga agaacagatt gctgaattca aggaagcctt

  51 ctccctattt gataaagatg gcgatggcac catcacaaca aaggaacttg

 101 gaactgtcat gaggtcactg ggtcagaacc caacagaagc tgaattgcag

 151 gatatgatca atgaagtgga tgctgatggt aatggcacca ttgacttccc

 201 cgaatttttg actatgatgg ctagaaaaat gaaagataca gatagtgaag

 251 aagaaatccg tgaggcattc cgagtctttg acaaggatgg caatggttat

 301 atcagtgcag cagaactacg tcacgtcatg acaaacttag gagaaaaact

 351 aacagatgaa gaagtagatg aaatgatcag agaagcagat attgatggag

 401 acggacaagt caactatgaa gaattcgtac agatgatgac tgcaaaatag

!!AA_SEQUENCE 1.0

Calmodulin (CaM).

CALM_HUMAN  Length: 148  Type: P  Check: 2160 ..

   1 ADQLTEEQIA EFKEAFSLFD KDGDGTITTK ELGTVMRSLG QNPTEAELQD

  51 MINEVDADGN GTIDFPEFLT MMARKMKDTD SEEEIREAFR VFDKDGNGYI

 101 SAAELRHVMT NLGEKLTDEE VDEMIREADI DGDGQVNYEE FVQMMTAK
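
The GCG checksum is commonly computed as follows; Biopython's Bio.SeqUtils.CheckSum.gcg function, for example, works this way. Each upper-cased character contributes its character code multiplied by a weight that cycles through 1 to 57, and the total is reduced modulo 10000. The Python sketch below is written from that description and is not GCG's or EMBOSS's own code. The name gcg_checksum is hypothetical.

```python
def gcg_checksum(seq: str) -> int:
    """Position-weighted sum of character codes, weights cycling 1..57, modulo 10000."""
    check = 0
    for i, char in enumerate(seq.upper()):
        weight = i % 57 + 1          # weights run 1, 2, ..., 57, then restart at 1
        check += weight * ord(char)
    return check % 10000
```

If this is the algorithm your files use, running it on the BT006818 sequence should give the Check: value in the header line above. Editing a single residue will almost always change the result, which is why hand-edited files fail the check.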

gcg8 This format has the same origins and requirements as gcg format, but is the format used by GCG release 8.x.

Hennig86 Designed for DNA data, Hennig86 format encodes each base as a number: A = 0, T = 1, G = 2, C = 3. The file starts with the keyword xread. The next line is a comment naming the program and creation date, enclosed in single quotes and separated from them by whitespace. The line after that gives the sequence length and the number of sequences (here, 1). Next come the identifier line and then the sequence in numeric form, written as a single line. The end of the entry is marked by a semicolon (;).
xread
' Written by EMBOSS 28/06/04 '
450 1
BT006818
012231201302312033200200302011231200113002200233113133310111201000201223201223033013030030002200311220031213012022130312221302003330030200231200112302201012013001200212201231201221001223033011203113333200111112031012012231020000012000201030201021200200200013321202230113320213111203002201223001221101013021230230200310321303213012030003110220200000310030201200200210201200012013020200230201011201220203220300213003101200200113210302012012031230000102
;
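
The base-to-digit mapping above can be applied with a translation table. Here is a Python sketch that writes a single-sequence xread block using the layout of the example. The name to_hennig86 is hypothetical, and it rejects any character other than A, C, G or T because the description defines codes only for those four.

```python
# Digit codes from the description above: A = 0, T = 1, G = 2, C = 3.
HENNIG86_CODES = str.maketrans("ATGCatgc", "01230123")


def to_hennig86(seq_id: str, seq: str, comment: str) -> str:
    """Write one DNA sequence as a Hennig86 xread block (hypothetical helper)."""
    digits = seq.translate(HENNIG86_CODES)
    if any(ch not in "0123" for ch in digits):
        raise ValueError("only A, C, G and T can be encoded")
    lines = [
        "xread",
        f"' {comment} '",
        f"{len(seq)} 1",    # sequence length, then number of sequences
        seq_id,
        digits,             # the whole sequence on one line
        ";",
    ]
    return "\n".join(lines) + "\n"
```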

ig IntelliGenetics format was created for the IntelliGenetics software suite and designed for DNA. An entry starts with a semicolon (;) followed by the sequence description and base count. The sequence identifier is on the next line, followed by the sequence, with 50 bases per line. The end of the entry is marked by the number 1 appended to the final sequence line.
;Homo sapiens calmodulin 1 (phosphorylase kinase, delta) mRNA, complete cds., 450 bases
BT006818
atggctgatcagctgaccgaagaacagattgctgaattcaaggaagcctt
ctccctatttgataaagatggcgatggcaccatcacaacaaaggaacttg
gaactgtcatgaggtcactgggtcagaacccaacagaagctgaattgcag
gatatgatcaatgaagtggatgctgatggtaatggcaccattgacttccc
cgaatttttgactatgatggctagaaaaatgaaagatacagatagtgaag
aagaaatccgtgaggcattccgagtctttgacaaggatggcaatggttat
atcagtgcagcagaactacgtcacgtcatgacaaacttaggagaaaaact
aacagatgaagaagtagatgaaatgatcagagaagcagatattgatggag
acggacaagtcaactatgaagaattcgtacagatgatgactgcaaaatag1
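
A Python sketch of the IntelliGenetics layout, assuming the comment line reads "description, N bases" and the closing 1 is appended directly to the last sequence line, as in the example above. The name to_ig is hypothetical.

```python
def to_ig(seq_id: str, description: str, seq: str, width: int = 50) -> str:
    """Write an IntelliGenetics-style entry (hypothetical helper)."""
    lines = [f";{description}, {len(seq)} bases", seq_id]
    # Fixed-width sequence lines; an empty sequence still gets a line for the terminator.
    seq_lines = [seq[i:i + width] for i in range(0, len(seq), width)] or [""]
    seq_lines[-1] += "1"        # the entry ends with 1 appended to the last sequence line
    return "\n".join(lines + seq_lines) + "\n"
```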

jackknifer Jackknifer format starts with a single quote, followed by a space and a comment giving the program and creation date. Each following line holds the sequence identifier in parentheses, padded with spaces, then a block of up to 50 sequence characters. The end of the entry is marked by a semicolon (;) on a separate line.
' Written by EMBOSS 28/06/04 
(BT006818)           atggctgatcagctgaccgaagaacagattgctgaattcaaggaagcctt
(BT006818)           ctccctatttgataaagatggcgatggcaccatcacaacaaaggaacttg
(BT006818)           gaactgtcatgaggtcactgggtcagaacccaacagaagctgaattgcag
(BT006818)           gatatgatcaatgaagtggatgctgatggtaatggcaccattgacttccc
(BT006818)           cgaatttttgactatgatggctagaaaaatgaaagatacagatagtgaag
(BT006818)           aagaaatccgtgaggcattccgagtctttgacaaggatggcaatggttat
(BT006818)           atcagtgcagcagaactacgtcacgtcatgacaaacttaggagaaaaact
(BT006818)           aacagatgaagaagtagatgaaatgatcagagaagcagatattgatggag
(BT006818)           acggacaagtcaactatgaagaattcgtacagatgatgactgcaaaatag
;

' Written by EMBOSS 28/06/04 
(CALM_HUMAN)         ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQD
(CALM_HUMAN)         MINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYI
(CALM_HUMAN)         SAAELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYEEFVQMMTAK
;
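
A Python sketch of the Jackknifer layout. It assumes the parenthesised identifier is padded so that the sequence starts in column 22, which matches the examples above, and that at least one space always separates the identifier from the sequence. The name to_jackknifer is hypothetical.

```python
def to_jackknifer(seq_id: str, seq: str, comment: str, width: int = 50) -> str:
    """Write Jackknifer layout: quoted comment, '(id)' on every line, ';' terminator (hypothetical helper)."""
    # Pad "(id)" to 20 columns plus one separating space, so sequences start in column 22.
    label = f"({seq_id})".ljust(20) + " "
    lines = [f"' {comment} "]
    lines += [label + seq[i:i + width] for i in range(0, len(seq), width)]
    lines.append(";")
    return "\n".join(lines) + "\n"
```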

jackknifernon Jackknifernon format starts with the same quoted comment line as jackknifer format. The next line holds the sequence identifier in parentheses, followed after whitespace by the first block of 50 sequence characters. The remaining lines of sequence follow underneath without the identifier. The end of the entry is marked by a semicolon (;) on a separate line.
' Written by EMBOSS 28/06/04 
(BT006818)           atggctgatcagctgaccgaagaacagattgctgaattcaaggaagcctt
ctccctatttgataaagatggcgatggcaccatcacaacaaaggaacttg
gaactgtcatgaggtcactgggtcagaacccaacagaagctgaattgcag
gatatgatcaatgaagtggatgctgatggtaatggcaccattgacttccc
cgaatttttgactatgatggctagaaaaatgaaagatacagatagtgaag
aagaaatccgtgaggcattccgagtctttgacaaggatggcaatggttat
atcagtgcagcagaactacgtcacgtcatgacaaacttaggagaaaaact
aacagatgaagaagtagatgaaatgatcagagaagcagatattgatggag
acggacaagtcaactatgaagaattcgtacagatgatgactgcaaaatag
;

' Written by EMBOSS 28/06/04 
(CALM_HUMAN)         ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQD
MINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYI
SAAELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYEEFVQMMTAK
;

mega MEGA format starts with the comment line #mega, followed on the next line by a TITLE: line giving the program and creation date. A blank line separates this header from the sequence. Each sequence line starts with a hash (#) and the sequence identifier, followed after whitespace by a block of 50 characters. Consecutive sequence lines are separated by blank lines.
#mega
TITLE: Written by EMBOSS 28/06/04

#BT006818             atggctgatcagctgaccgaagaacagattgctgaattcaaggaagcctt

#BT006818             ctccctatttgataaagatggcgatggcaccatcacaacaaaggaacttg

#BT006818             gaactgtcatgaggtcactgggtcagaacccaacagaagctgaattgcag

#BT006818             gatatgatcaatgaagtggatgctgatggtaatggcaccattgacttccc

#BT006818             cgaatttttgactatgatggctagaaaaatgaaagatacagatagtgaag

#BT006818             aagaaatccgtgaggcattccgagtctttgacaaggatggcaatggttat

#BT006818             atcagtgcagcagaactacgtcacgtcatgacaaacttaggagaaaaact

#BT006818             aacagatgaagaagtagatgaaatgatcagagaagcagatattgatggag

#BT006818             acggacaagtcaactatgaagaattcgtacagatgatgactgcaaaatag

#mega
TITLE: Written by EMBOSS 28/06/04

#CALM_HUMAN           ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQD

#CALM_HUMAN           MINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYI

#CALM_HUMAN           SAAELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYEEFVQMMTAK

meganon This is similar to mega format, but the sequence is not broken up: the #identifier line is followed by the whole sequence on a single line.
#mega
TITLE: Written by EMBOSS 28/06/04

#BT006818            
atggctgatcagctgaccgaagaacagattgctgaattcaaggaagccttctccctatttgataaagatggcgatggcaccatcacaacaaaggaacttggaactgtcatgaggtcactgggtcagaacccaacagaagctgaattgcaggatatgatcaatgaagtggatgctgatggtaatggcaccattgacttccccgaatttttgactatgatggctagaaaaatgaaagatacagatagtgaagaagaaatccgtgaggcattccgagtctttgacaaggatggcaatggttatatcagtgcagcagaactacgtcacgtcatgacaaacttaggagaaaaactaacagatgaagaagtagatgaaatgatcagagaagcagatattgatggagacggacaagtcaactatgaagaattcgtacagatgatgactgcaaaatag

#mega
TITLE: Written by EMBOSS 28/06/04

#CALM_HUMAN
ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYEEFVQMMTAK

nexus NEXUS interleaved format, also known as PAUP format, is the default output format of the PAUP suite of phylogeny programs and is designed to represent DNA sequences only. The file starts with the #NEXUS line, followed on the next line by a bracketed comment giving the program and creation date. After a blank line comes the data block. It gives the number of sequences (ntax, here 1), the sequence length (nchar), the data type, and the symbols used for missing bases (N) and gaps (-). Each of these lines ends with a semicolon (;). A further blank line separates this information from the matrix keyword. Each following sequence line starts with the sequence identifier and holds a block of 50 bases. The matrix ends with a semicolon (;), followed by a blank line and further command lines, each ending with a semicolon (;).
#NEXUS
[TITLE: Written by EMBOSS 28/06/04]

begin data;
dimensions ntax=1 nchar=450;
format interleave datatype=DNA missing=N gap=-;

matrix
BT006818             atggctgatcagctgaccgaagaacagattgctgaattcaaggaagcctt

BT006818             ctccctatttgataaagatggcgatggcaccatcacaacaaaggaacttg

BT006818             gaactgtcatgaggtcactgggtcagaacccaacagaagctgaattgcag

BT006818             gatatgatcaatgaagtggatgctgatggtaatggcaccattgacttccc

BT006818             cgaatttttgactatgatggctagaaaaatgaaagatacagatagtgaag

BT006818             aagaaatccgtgaggcattccgagtctttgacaaggatggcaatggttat

BT006818             atcagtgcagcagaactacgtcacgtcatgacaaacttaggagaaaaact

BT006818             aacagatgaagaagtagatgaaatgatcagagaagcagatattgatggag

BT006818             acggacaagtcaactatgaagaattcgtacagatgatgactgcaaaatag
;

end;
begin assumptions;
options deftype=unord;
end;
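
The NEXUS layout above can be reproduced for a single DNA sequence with the Python sketch below. It assumes exactly the commands shown in the example (ntax=1, interleaved DNA, the closing assumptions block) and pads identifiers so that the sequence starts in column 22. The name to_nexus_interleaved is hypothetical, and the sketch does not handle several taxa.

```python
def to_nexus_interleaved(seq_id: str, seq: str, title: str, width: int = 50) -> str:
    """Write a single DNA sequence in the interleaved NEXUS layout (hypothetical helper)."""
    label = seq_id.ljust(20) + " "          # identifiers padded to a fixed column
    blocks = [label + seq[i:i + width] for i in range(0, len(seq), width)]
    lines = [
        "#NEXUS",
        f"[TITLE: {title}]",
        "",
        "begin data;",
        f"dimensions ntax=1 nchar={len(seq)};",
        "format interleave datatype=DNA missing=N gap=-;",
        "",
        "matrix",
        "\n\n".join(blocks),                # blank line between interleaved blocks
        ";",
        "",
        "end;",
        "begin assumptions;",
        "options deftype=unord;",
        "end;",
    ]
    return "\n".join(lines) + "\n"
```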

nexusnon This format is similar to nexus format, but it is not interleaved: the identifier is on its own line and the sequence follows on a single line with no breaks.
#NEXUS
[TITLE: Written by EMBOSS 28/06/04]

begin data;
dimensions ntax=1 nchar=450;
format datatype=DNA missing=N gap=-;

matrix
BT006818
atggctgatcagctgaccgaagaacagattgctgaattcaaggaagccttctccctatttgataaagatggcgatggcaccatcacaacaaaggaacttggaactgtcatgaggtcactgggtcagaacccaacagaagctgaattgcaggatatgatcaatgaagtggatgctgatggtaatggcaccattgacttccccgaatttttgactatgatggctagaaaaatgaaagatacagatagtgaagaagaaatccgtgaggcattccgagtctttgacaaggatggcaatggttatatcagtgcagcagaactacgtcacgtcatgacaaacttaggagaaaaactaacagatgaagaagtagatgaaatgatcagagaagcagatattgatggagacggacaagtcaactatgaagaattcgtacagatgatgactgcaaaatag
;

end;
begin assumptions;
options deftype=unord;
end;

ncbi This is a variant of fasta format that carries NCBI-style identifiers. The description line starts with >gnl|unk|, followed by the sequence identifier, the accession number in parentheses (with its version, where available), and the description. The sequence begins on the next line.
>gnl|unk|BT006818 (BT006818.1) Homo sapiens calmodulin 1 (phosphorylase kinase, delta) mRNA, complete cds.
atggctgatcagctgaccgaagaacagattgctgaattcaaggaagccttctccctattt
gataaagatggcgatggcaccatcacaacaaaggaacttggaactgtcatgaggtcactg
ggtcagaacccaacagaagctgaattgcaggatatgatcaatgaagtggatgctgatggt
aatggcaccattgacttccccgaatttttgactatgatggctagaaaaatgaaagataca
gatagtgaagaagaaatccgtgaggcattccgagtctttgacaaggatggcaatggttat
atcagtgcagcagaactacgtcacgtcatgacaaacttaggagaaaaactaacagatgaa
gaagtagatgaaatgatcagagaagcagatattgatggagacggacaagtcaactatgaa
gaattcgtacagatgatgactgcaaaatag

>gnl|unk|CALM_HUMAN (P62158) Calmodulin (CaM).
ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGN
GTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTDEE
VDEMIREADIDGDGQVNYEEFVQMMTAK
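
The definition line above can be built and split apart again as shown in this Python sketch. It assumes the "identifier (accession) description" ordering of the examples. Both function names are hypothetical.

```python
def ncbi_header(seq_id: str, accession: str, description: str) -> str:
    """Build a definition line in the >gnl|unk|... style shown above (hypothetical helper)."""
    return f">gnl|unk|{seq_id} ({accession}) {description}"


def parse_ncbi_header(line: str) -> tuple[str, str, str, str]:
    """Split such a line into (database tag, sub-tag, identifier, remainder)."""
    # Split on the first two bars only, so any bars inside the description survive.
    db, subtag, rest = line.lstrip(">").split("|", 2)
    seq_id, _, remainder = rest.partition(" ")
    return db, subtag, seq_id, remainder
```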

nbrf See pir format.
pearson See fasta format. Use this format if your sequence headers include pipes (|) that you wish to retain.
pfam See stockholm format.
phylip This interleaved format is the default format of the PHYLIP suite of phylogeny programs offered with your BioBind software. The entry starts with a line giving the number of sequences (species), here 3, and the alignment length. Each of the next lines holds one sequence identifier followed by the first 50 characters of that sequence, in five blocks of ten. After a blank line, further blocks continue the sequences in the same order, aligned under the first block and without identifiers. Sequence identifiers must be no more than 10 characters long. Blanks and most punctuation marks are allowed, but parentheses (( )), square brackets ([ ]), colons (:), semicolons (;) and commas (,) are not. Gaps are shown as dashes (-).
 3 450
CaM2      ATGGCTGATC AGCTGACCGA AGAACAGATT GCTGAATTCA AGGAAGCCTT
CaM3      ATGGCTGATC AGCTGACCGA AGAACAGATT GCTGAATTCA AGGAAGCCTT
CaM1      ATGGCTGATC AGCTGA---- ---------- -------TCA AGGAAGCCTT

          CTCCCTATTT GATAAAGATG GCGATGGCAC CAAA------ ----------
          CTCCCTATTT GATAAAGATG GCGATGGCAC CAT------- ----------
          CTCCCTATTT GATAAAGATG GCGATGGCAC CATCACAACA AAGGAACTTG

          ---CTGTCAT GAGGTCACTG GGTCAGAACC CAACAGAAGC TGAATTGCAG
          ---------- ---------- GGTCAGAACC CAACAGAAGC TGAATTGCAG
          GAACTGTCAT GAGGTCACTG GGTCAGAACC CAACAGAAGC TGAATTGCAG

          GATATGATCA ATGAAGTGGA TGCTGATGGT AATGGCACCA TTGACTTCCC
          GATATGATCA ATGAAGTGGA TGCTGATGGT AATGGCACCA TTGACTTCCC
          GATATGATCA ATGAAGTGGA TGCTGATGGT AATGGCACCA TTGACTTCCC

          CGAATTTTTG ACTATGATGG CTAGAAAAAT GAAAGATACA GATAGTGAAG
          CGAATTTTTG ACTATGATGG CTAGAAAAAT GAAAGATACA GATAGTGAAG
          CGAATTTTTG ACTATGATGG CTAGAAAAAT GAAAGATACA GATAGTGAAG

          AAGAAATCCG TGAGGCATTC CGAGTCTTTG ACAAGGATGG CAATGGTTAT
          AAGAAATCCG TGAGGCATTC CGAGTCTTTG ACAAGGATGG CAATGGTTAT
          AAGAAATCCG TGAGGCATTC CGAGTCTTTG ACAAGGATGG CAATGGTTAT

          ATCAGTGCAG CAGAACTACG TCACGTCATG ACAAACTTAG GAGAAAAACT
          ATCAGTGCAG CAGAACTACG TCACGTCATG ACAAACTTAG GAGAAAAACT
          ATCAGTGCAG CAGAACTACG TCACGTCATG ACAAACTTAG GAGAAAAACT

          AACAGATGAA GAAGTAGATG AAATGATCAG AGAAGCAGAT ATTGATGGAG
          AACAGATGAA GAAGTAGATG AAATGATCAG AGAAGCAGAT ATTGATGGAG
          AACAGATGAA GAAGTAGATG AAATGATCAG AGAAGCAGAT ATTGATGGAG

          ACGGACAAGT CAACTATGAA GAATTCGTAC AGATGATGAC TGCAAAATAG
          ACGGACAAGT CAACTATGAA GAATTCGTAC AGATGATGAC TGCAAAATAG
          ACGGACAAGT CAACTATGAA GAATTCGTAC AGATGATGAC TGCAAAATAG

 3 148
CaM2      ADQLTEEQIA EFKEAFSLFD KDGDGTITTK ELGTVMRSLG QNPTEAELQD
CaM3      ---------- ---------- ---------- ---------- ----------
CaM1      ADQLTEEQIA EFKEAFSLFD KDGDGTITTK ELGTVMRSLG QNPTEAELQD

          MINEVDADGN ---------- ----GTIDFP EFLTMMEAFR VFDKDGNGYI
          --------AD ---------- ----QLTEEQ IAEEIREAFR VFDKDGNGYI
          MINEVDADGN GTIDFPEFLT MMARKMKDTD SEEEIREAFR VFDKDGNGYI

          SAAELRHVMT NLGEKLTDEE VDEMIREADI DGDGQVNYEE FVQMMTAK
          SAAELRHVMT NLGEKLTDEE VDEMIREADI DGDGQVNYEE FVQMMTAK
          SAAELRHVMT NLGEKLTDEE VDEMIREADI DGDGQVNYEE FVQMMTAK
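The layout rules above (count line, names padded to 10 characters, five blocks of ten, unnamed continuation blocks) can be sketched as a small writer. This is an illustration under those stated rules only; `write_phylip` is a hypothetical helper and does not reproduce every detail of the EMBOSS or PHYLIP implementations.

```python
FORBIDDEN = set("()[]:;,")  # characters the text says PHYLIP names may not contain


def write_phylip(records, width=50, block=10):
    """Write (name, aligned_sequence) pairs as interleaved PHYLIP text."""
    if not records:
        raise ValueError("need at least one sequence")
    length = len(records[0][1])
    for name, seq in records:
        if len(seq) != length:
            raise ValueError("all aligned sequences must have the same length")
        if len(name) > 10 or FORBIDDEN & set(name):
            raise ValueError(f"invalid PHYLIP name: {name!r}")
    lines = [f" {len(records)} {length}"]
    for start in range(0, length, width):
        if start:
            lines.append("")  # blank line between interleaved blocks
        for name, seq in records:
            chunk = seq[start:start + width]
            blocks = " ".join(chunk[i:i + block] for i in range(0, len(chunk), block))
            # Only the first block carries the name, padded to 10 characters.
            prefix = name.ljust(10) if start == 0 else " " * 10
            lines.append(prefix + blocks)
    return "\n".join(lines) + "\n"


print(write_phylip([("CaM2", "ATGGCTGATCAGCTGACCGA"), ("CaM1", "ATGGCTGATCAGCTGA----")]))
```

Rejecting long or invalid names, rather than silently truncating them, is a design choice: two names that only differ after the tenth character would otherwise collide.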

phylip3 Similar to phylip format, this is the non-interleaved default output of PHYLIP Version 3.2. It includes a YF notation on the initial line.
1 450 YF
CaM2      ATGGCTGATC AGCTGACCGA AGAACAGATT GCTGAATTCA AGGAAGCCTT
          CTCCCTATTT GATAAAGATG GCGATGGCAC CAAA------ ----------
          ---CTGTCAT GAGGTCACTG GGTCAGAACC CAACAGAAGC TGAATTGCAG
          GATATGATCA ATGAAGTGGA TGCTGATGGT AATGGCACCA TTGACTTCCC
          CGAATTTTTG ACTATGATGG CTAGAAAAAT GAAAGATACA GATAGTGAAG
          AAGAAATCCG TGAGGCATTC CGAGTCTTTG ACAAGGATGG CAATGGTTAT
          ATCAGTGCAG CAGAACTACG TCACGTCATG ACAAACTTAG GAGAAAAACT
          AACAGATGAA GAAGTAGATG AAATGATCAG AGAAGCAGAT ATTGATGGAG
          ACGGACAAGT CAACTATGAA GAATTCGTAC AGATGATGAC TGCAAAATAG
CaM3      ATGGCTGATC AGCTGACCGA AGAACAGATT GCTGAATTCA AGGAAGCCTT
          CTCCCTATTT GATAAAGATG GCGATGGCAC CAT------- ----------
          ---------- ---------- GGTCAGAACC CAACAGAAGC TGAATTGCAG
          GATATGATCA ATGAAGTGGA TGCTGATGGT AATGGCACCA TTGACTTCCC
          CGAATTTTTG ACTATGATGG CTAGAAAAAT GAAAGATACA GATAGTGAAG
          AAGAAATCCG TGAGGCATTC CGAGTCTTTG ACAAGGATGG CAATGGTTAT
          ATCAGTGCAG CAGAACTACG TCACGTCATG ACAAACTTAG GAGAAAAACT
          AACAGATGAA GAAGTAGATG AAATGATCAG AGAAGCAGAT ATTGATGGAG
          ACGGACAAGT CAACTATGAA GAATTCGTAC AGATGATGAC TGCAAAATAG
CaM1      ATGGCTGATC AGCTGA---- ---------- -------TCA AGGAAGCCTT
          CTCCCTATTT GATAAAGATG GCGATGGCAC CATCACAACA AAGGAACTTG
          GAACTGTCAT GAGGTCACTG GGTCAGAACC CAACAGAAGC TGAATTGCAG
          GATATGATCA ATGAAGTGGA TGCTGATGGT AATGGCACCA TTGACTTCCC
          CGAATTTTTG ACTATGATGG CTAGAAAAAT GAAAGATACA GATAGTGAAG
          AAGAAATCCG TGAGGCATTC CGAGTCTTTG ACAAGGATGG CAATGGTTAT
          ATCAGTGCAG CAGAACTACG TCACGTCATG ACAAACTTAG GAGAAAAACT
          AACAGATGAA GAAGTAGATG AAATGATCAG AGAAGCAGAT ATTGATGGAG
          ACGGACAAGT CAACTATGAA GAATTCGTAC AGATGATGAC TGCAAAATAG

1 148 YF
CaM2      ADQLTEEQIA EFKEAFSLFD KDGDGTITTK ELGTVMRSLG QNPTEAELQD
          MINEVDADGN ---------- ----GTIDFP EFLTMMEAFR VFDKDGNGYI
          SAAELRHVMT NLGEKLTDEE VDEMIREADI DGDGQVNYEE FVQMMTAK
CaM3      ---------- ---------- ---------- ---------- ----------
          --------AD ---------- ----QLTEEQ IAEEIREAFR VFDKDGNGYI
          SAAELRHVMT NLGEKLTDEE VDEMIREADI DGDGQVNYEE FVQMMTAK
CaM1      ADQLTEEQIA EFKEAFSLFD KDGDGTITTK ELGTVMRSLG QNPTEAELQD
          MINEVDADGN GTIDFPEFLT MMARKMKDTD SEEEIREAFR VFDKDGNGYI
          SAAELRHVMT NLGEKLTDEE VDEMIREADI DGDGQVNYEE FVQMMTAK

pir Also known as nbrf format, each entry starts with a greater-than sign (>) followed by a sequence type code (D for nucleotide, P for protein, F for fragment, each followed by the number 1), then a semicolon (;) and the sequence identifier. The sequence description is on the next line, with the residue count separated from the description by a comma (,). The sequence appears on the following lines as five blocks of ten characters, and its end is marked by an asterisk (*) immediately after the final sequence character. Sequence lines without whitespace are also acceptable.
>D1;BT006818
Homo sapiens calmodulin 1 (phosphorylase kinase, delta) mRNA, complete cds., 450 bases
 atggctgatc agctgaccga agaacagatt gctgaattca aggaagcctt
 ctccctattt gataaagatg gcgatggcac catcacaaca aaggaacttg
 gaactgtcat gaggtcactg ggtcagaacc caacagaagc tgaattgcag
 gatatgatca atgaagtgga tgctgatggt aatggcacca ttgacttccc
 cgaatttttg actatgatgg ctagaaaaat gaaagataca gatagtgaag
 aagaaatccg tgaggcattc cgagtctttg acaaggatgg caatggttat
 atcagtgcag cagaactacg tcacgtcatg acaaacttag gagaaaaact
 aacagatgaa gaagtagatg aaatgatcag agaagcagat attgatggag
 acggacaagt caactatgaa gaattcgtac agatgatgac tgcaaaatag*

>P1;CALM_HUMAN
Calmodulin (CaM)., 148 bases
 ADQLTEEQIA EFKEAFSLFD KDGDGTITTK ELGTVMRSLG QNPTEAELQD
 MINEVDADGN GTIDFPEFLT MMARKMKDTD SEEEIREAFR VFDKDGNGYI
 SAAELRHVMT NLGEKLTDEE VDEMIREADI DGDGQVNYEE FVQMMTAK*
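A minimal sketch of writing one pir entry following the layout described above: type code and identifier, description with a residue count, five blocks of ten per line, and a closing asterisk. `write_pir` is a hypothetical helper; the default count label "bases" simply mirrors the examples above and is an assumption, not a rule of the format.

```python
def write_pir(type_code, identifier, description, seq, unit="bases"):
    """Write one PIR/NBRF entry, e.g. type_code 'P1' (protein) or 'D1' (DNA)."""
    if not seq:
        raise ValueError("sequence must not be empty")
    lines = [f">{type_code};{identifier}", f"{description}, {len(seq)} {unit}"]
    for start in range(0, len(seq), 50):
        chunk = seq[start:start + 50]
        # five blocks of ten characters per line, each line indented by one space
        lines.append(" " + " ".join(chunk[i:i + 10] for i in range(0, len(chunk), 10)))
    lines[-1] += "*"  # asterisk marks the end of the sequence
    return "\n".join(lines) + "\n"


print(write_pir("P1", "TEST", "Demo protein", "ADQLTEEQIAEFKEAF"))
```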

plain This format merely holds the sequence as standard text with no formatting at all. As there is nothing to indicate the start or end of a sequence, this type of format is only suitable for single sequence files.
atggctgatcagctgaccgaagaacagattgctgaattcaaggaagcctt
ctccctatttgataaagatggcgatggcaccatcacaacaaaggaacttg
gaactgtcatgaggtcactgggtcagaacccaacagaagctgaattgcag
gatatgatcaatgaagtggatgctgatggtaatggcaccattgacttccc
cgaatttttgactatgatggctagaaaaatgaaagatacagatagtgaag
aagaaatccgtgaggcattccgagtctttgacaaggatggcaatggttat
atcagtgcagcagaactacgtcacgtcatgacaaacttaggagaaaaact
aacagatgaagaagtagatgaaatgatcagagaagcagatattgatggag
acggacaagtcaactatgaagaattcgtacagatgatgactgcaaaatag

ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQD
MINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYI
SAAELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYEEFVQMMTAK

raw See plain format.
selex SELEX is an interleaved multiple alignment format. Machine comment lines at the head of the file begin with a hash and equals sign (#=). User comment lines can start with either a hash (#) or a percent sign (%). A #=SQ line describes each sequence, and these lines must appear in the same order as the sequences. The #=SQ tag is followed by the sequence name, weight, database source name, database accession, source coordinates (start..stop:original length) and description. Any unavailable field may be replaced with a dash (-), except the source coordinates, which should contain zeros (0). Each sequence line begins with a name, which can be any length but must contain no whitespace, followed by a block of 50 sequence characters. Gaps may be denoted by a dash (-), a dot (.), a blank space or an underscore (_). Sequence characters are case sensitive.
#=SQ CaM2 0.10 - - 0..0:0 -
#=SQ CaM3 33.30 - - 0..0:0 -
#=SQ CaM1 66.60 - - 0..0:0 -

CaM2 ATGGCTGATCAGCTGACCGAAGAACAGATTGCTGAATTCAAGGAAGCCTT
CaM3 ATGGCTGATCAGCTGACCGAAGAACAGATTGCTGAATTCAAGGAAGCCTT
CaM1 ATGGCTGATCAGCTGA---------------------TCAAGGAAGCCTT

CaM2 CTCCCTATTTGATAAAGATGGCGATGGCACCAAA----------------
CaM3 CTCCCTATTTGATAAAGATGGCGATGGCACCAT-----------------
CaM1 CTCCCTATTTGATAAAGATGGCGATGGCACCATCACAACAAAGGAACTTG

CaM2 ---CTGTCATGAGGTCACTGGGTCAGAACCCAACAGAAGCTGAATTGCAG
CaM3 --------------------GGTCAGAACCCAACAGAAGCTGAATTGCAG
CaM1 GAACTGTCATGAGGTCACTGGGTCAGAACCCAACAGAAGCTGAATTGCAG

CaM2 GATATGATCAATGAAGTGGATGCTGATGGTAATGGCACCATTGACTTCCC
CaM3 GATATGATCAATGAAGTGGATGCTGATGGTAATGGCACCATTGACTTCCC
CaM1 GATATGATCAATGAAGTGGATGCTGATGGTAATGGCACCATTGACTTCCC

CaM2 CGAATTTTTGACTATGATGGCTAGAAAAATGAAAGATACAGATAGTGAAG
CaM3 CGAATTTTTGACTATGATGGCTAGAAAAATGAAAGATACAGATAGTGAAG
CaM1 CGAATTTTTGACTATGATGGCTAGAAAAATGAAAGATACAGATAGTGAAG

CaM2 AAGAAATCCGTGAGGCATTCCGAGTCTTTGACAAGGATGGCAATGGTTAT
CaM3 AAGAAATCCGTGAGGCATTCCGAGTCTTTGACAAGGATGGCAATGGTTAT
CaM1 AAGAAATCCGTGAGGCATTCCGAGTCTTTGACAAGGATGGCAATGGTTAT

CaM2 ATCAGTGCAGCAGAACTACGTCACGTCATGACAAACTTAGGAGAAAAACT
CaM3 ATCAGTGCAGCAGAACTACGTCACGTCATGACAAACTTAGGAGAAAAACT
CaM1 ATCAGTGCAGCAGAACTACGTCACGTCATGACAAACTTAGGAGAAAAACT

CaM2 AACAGATGAAGAAGTAGATGAAATGATCAGAGAAGCAGATATTGATGGAG
CaM3 AACAGATGAAGAAGTAGATGAAATGATCAGAGAAGCAGATATTGATGGAG
CaM1 AACAGATGAAGAAGTAGATGAAATGATCAGAGAAGCAGATATTGATGGAG

CaM2 ACGGACAAGTCAACTATGAAGAATTCGTACAGATGATGACTGCAAAATAG
CaM3 ACGGACAAGTCAACTATGAAGAATTCGTACAGATGATGACTGCAAAATAG
CaM1 ACGGACAAGTCAACTATGAAGAATTCGTACAGATGATGACTGCAAAATAG

#=SQ CaM2 100.00 - - 0..0:0 -
#=SQ CaM3 100.00 - - 0..0:0 -
#=SQ CaM1 0.10 - - 0..0:0 -

CaM2 ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQD
CaM3 --------------------------------------------------
CaM1 ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQD

CaM2 MINEVDADGN--------------GTIDFPEFLTMMEAFRVFDKDGNGYI
CaM3 --------AD--------------QLTEEQIAEEIREAFRVFDKDGNGYI
CaM1 MINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYI

CaM2 SAAELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYEEFVQMMTAK
CaM3 SAAELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYEEFVQMMTAK
CaM1 SAAELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYEEFVQMMTAK
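The six #=SQ fields described above can be read with a short parser. This is a sketch assuming whitespace-separated fields in the listed order, with everything after the fifth field taken as the description; `parse_sq_line` and `SelexSQ` are hypothetical names.

```python
from typing import NamedTuple


class SelexSQ(NamedTuple):
    name: str
    weight: float
    source: str
    accession: str
    coordinates: str  # "start..stop:original length", e.g. "0..0:0"
    description: str


def parse_sq_line(line: str) -> SelexSQ:
    """Parse one '#=SQ' line; a dash means the field is not available."""
    if not line.startswith("#=SQ"):
        raise ValueError(f"not a #=SQ line: {line!r}")
    # split into at most six fields so the description may contain spaces
    fields = line[len("#=SQ"):].split(None, 5)
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    name, weight, source, accession, coordinates, description = fields
    return SelexSQ(name, float(weight), source, accession, coordinates, description)


print(parse_sq_line("#=SQ CaM1 66.60 - - 0..0:0 -"))
```

Only the #=SQ lines are parsed here; because SELEX allows blank spaces as gap characters, splitting the sequence lines on whitespace would not be safe.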

staden This is very similar to plain format, except that the sequence is preceded by the sequence name followed by four dashes (-), enclosed in angle brackets (< >). There is no limit to the length of the name, but it must not contain whitespace; punctuation is allowed. Because the name is enclosed in angle brackets, it will not be visible when the file is displayed on a web page. Comments may be inserted on lines between the name and the sequence by preceding them with a semicolon (;). The format was created for the Staden sequencing software package, although that package now generally uses EMBL format.
<BT006818---->
atggctgatcagctgaccgaagaacagattgctgaattcaaggaagccttctccctattt
gataaagatggcgatggcaccatcacaacaaaggaacttggaactgtcatgaggtcactg
ggtcagaacccaacagaagctgaattgcaggatatgatcaatgaagtggatgctgatggt
aatggcaccattgacttccccgaatttttgactatgatggctagaaaaatgaaagataca
gatagtgaagaagaaatccgtgaggcattccgagtctttgacaaggatggcaatggttat
atcagtgcagcagaactacgtcacgtcatgacaaacttaggagaaaaactaacagatgaa
gaagtagatgaaatgatcagagaagcagatattgatggagacggacaagtcaactatgaa
gaattcgtacagatgatgactgcaaaatag

<CALM_HUMAN---->
ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGN
GTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTDEE
VDEMIREADIDGDGQVNYEEFVQMMTAK
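A minimal reader for a single staden entry, assuming the layout described above: a header line holding the name plus four dashes in angle brackets, optional semicolon comment lines, then plain sequence lines. `read_staden` is a hypothetical helper; it strips exactly four trailing dashes, which is an assumption drawn from the examples.

```python
def read_staden(text):
    """Return (name, sequence) from a single Staden-format entry."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not (lines[0].startswith("<") and lines[0].endswith(">")):
        raise ValueError("expected a <name----> header line")
    name = lines[0][1:-1]
    if name.endswith("----"):
        name = name[:-4]  # drop the four dashes that follow the name
    # skip ';' comment lines and remove any whitespace inside sequence lines
    sequence = "".join("".join(line.split()) for line in lines[1:] if not line.startswith(";"))
    return name, sequence


print(read_staden("<BT006818---->\n;comment\natggctgatc\nagctgaccga\n"))
```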

stockholm This format is used principally by HMMER, Pfam and Belvu for protein sequence alignments. It starts with a hash (#) character followed by the name of the format and a version number (currently 1.0). The aligned sequences can be displayed as wrapped blocks or on single unbroken lines. The length is optional, but the name is mandatory. The end of the entry is denoted by a double solidus (//) on the line directly after the sequence lines. Gaps are denoted by a dash (-) or a dot (.).

This alignment may be annotated using the tags:

  • #=GF  Free text file annotation. Should appear above the alignment.
  • #=GC  Column annotation - only one character per column. Should appear below the alignment.
  • #=GS  Free text sequence annotation. Should appear above the alignment or just below corresponding sequence.
  • #=GR  Sequence and column annotation - only one character per column. Should appear just below the corresponding sequence.
# STOCKHOLM 1.0

#=GS CaM2 WT    100.00
#=GS CaM3 WT    100.00
#=GS CaM1 WT    0.10

CaM2  ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQD
CaM3  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
CaM1  ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQD

CaM2  MINEVDADGN..............GTIDFPEFLTMMEAFRVFDKDGNGYI
CaM3  ~~~~~~~~AD..............QLTEEQIAEEIREAFRVFDKDGNGYI
CaM1  MINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYI

CaM2  SAAELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYEEFVQMMTAK
CaM3  SAAELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYEEFVQMMTAK
CaM1  SAAELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYEEFVQMMTAK
//
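Because Stockholm alignments may be wrapped, a reader has to append each block to the sequence with the same name and ignore the #= annotation lines. The sketch below does only that, assuming one entry per text and ignoring all annotation content; `read_stockholm` is a hypothetical helper, not the HMMER or Pfam parser.

```python
def read_stockholm(text: str) -> dict:
    """Collect aligned sequences from one Stockholm entry, ignoring annotation."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("# STOCKHOLM"):
        raise ValueError("missing '# STOCKHOLM' header line")
    sequences = {}
    for raw in lines[1:]:
        line = raw.strip()
        if line == "//":
            break  # end of entry
        if not line or line.startswith("#"):
            continue  # blank line or #=GF/#=GC/#=GS/#=GR annotation
        name, chunk = line.split(None, 1)
        # wrapped blocks: append each chunk to the sequence with the same name
        sequences[name] = sequences.get(name, "") + "".join(chunk.split())
    else:
        raise ValueError("missing '//' terminator")
    return sequences


example = "# STOCKHOLM 1.0\n\ns1  AC-G\ns2  A..G\n\ns1  TT\ns2  T-\n//\n"
print(read_stockholm(example))  # {'s1': 'AC-GTT', 's2': 'A..GT-'}
```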
strider DNA Strider is a simple sequence analysis program designed to run on Macintosh platforms. Every line above the sequence begins with a semicolon (;). The first line is a comment marked with hashes (###) naming the application, and the second gives the sequence type, accession number and length. A line containing only a semicolon separates this information from the sequence, which is displayed in lines of fifty bases. The end of the entry is denoted by a double solidus (//) on the line immediately after the last sequence line.
; ### from DNA Strider ;-)
; DNA sequence  BT006818, 450 bases
;
atggctgatcagctgaccgaagaacagattgctgaattcaaggaagcctt
ctccctatttgataaagatggcgatggcaccatcacaacaaaggaacttg
gaactgtcatgaggtcactgggtcagaacccaacagaagctgaattgcag
gatatgatcaatgaagtggatgctgatggtaatggcaccattgacttccc
cgaatttttgactatgatggctagaaaaatgaaagatacagatagtgaag
aagaaatccgtgaggcattccgagtctttgacaaggatggcaatggttat
atcagtgcagcagaactacgtcacgtcatgacaaacttaggagaaaaact
aacagatgaagaagtagatgaaatgatcagagaagcagatattgatggag
acggacaagtcaactatgaagaattcgtacagatgatgactgcaaaatag
//

swissprot This is the flatfile format of Swiss-Prot entries. It contains all the features of the sequence as they would appear in the Swiss-Prot database. Each line starts with a tag identifying the data it contains. The sequence is introduced by the SQ line, and the end of the entry is denoted by a double solidus (//) on the line immediately after the end of the sequence.
ID   CALM_HUMAN     STANDARD;      PRT;   148 AA.
AC   P62158; P02593; P70667; P99014; Q61379; Q61380;
DE   Calmodulin (CaM).
OS   Homo sapiens (Human)
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC   Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
KW   3D-structure; Acetylation; Calcium-binding; Direct protein sequencing;
KW   Methylation; Phosphorylation; Repeat; Ubl conjugation.
SQ   SEQUENCE   148 AA;  16707 MW;  464B8A287475A1CA CRC64;
     ADQLTEEQIA EFKEAFSLFD KDGDGTITTK ELGTVMRSLG QNPTEAELQD MINEVDADGN
     GTIDFPEFLT MMARKMKDTD SEEEIREAFR VFDKDGNGYI SAAELRHVMT NLGEKLTDEE
     VDEMIREADI DGDGQVNYEE FVQMMTAK
//
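The sequence in a Swiss-Prot entry can be recovered from the lines between the SQ tag and the closing //, and the length given on the SQ line offers a simple consistency check. A minimal sketch, assuming the SQ line layout shown above ("SQ   SEQUENCE   148 AA; ..."); `swissprot_sequence` is a hypothetical helper.

```python
def swissprot_sequence(entry: str) -> str:
    """Extract the sequence from one Swiss-Prot flatfile entry."""
    declared = None
    parts = []
    in_sequence = False
    for line in entry.splitlines():
        if line.startswith("//"):
            break  # end of entry
        if line.startswith("SQ   "):
            # e.g. "SQ   SEQUENCE   148 AA;  16707 MW; ..." -> declared length 148
            declared = int(line.split()[2])
            in_sequence = True
        elif in_sequence:
            parts.append("".join(line.split()))  # drop spaces between blocks
    sequence = "".join(parts)
    if declared is not None and declared != len(sequence):
        raise ValueError(f"SQ line says {declared} residues, found {len(sequence)}")
    return sequence
```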
text See plain format.
treecon A very simple format with the length of the sequence displayed at the top of the entry, followed by the sequence identifier on the next line down. The sequence line is immediately below this with no line breaks.
450
BT006818
atggctgatcagctgaccgaagaacagattgctgaattcaaggaagccttctccctatttgataaagatggcgatggcaccatcacaacaaaggaacttggaactgtcatgaggtcactgggtcagaacccaacagaagctgaattgcaggatatgatcaatgaagtggatgctgatggtaatggcaccattgacttccccgaatttttgactatgatggctagaaaaatgaaagatacagatagtgaagaagaaatccgtgaggcattccgagtctttgacaaggatggcaatggttatatcagtgcagcagaactacgtcacgtcatgacaaacttaggagaaaaactaacagatgaagaagtagatgaaatgatcagagaagcagatattgatggagacggacaagtcaactatgaagaattcgtacagatgatgactgcaaaatag

148
CALM_HUMAN
ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYEEFVQMMTAK

debug This is a verbose format designed for use as a debugging tool. The EMBOSS definition tags are listed one per line with the information read from the input file, followed by the numbered sequence.
  Name: 'BT006818'
  Accession: 'BT006818'
  Acclist: (1) BT006818

  SeqVersion: 'BT006818.1'
  GI Version: ''
  Description: 'Homo sapiens calmodulin 1 (phosphorylase kinase, delta) mRNA, complete cds.'
  Keywordlist: (1)
    'FLI_CDNA'
  Taxonomy: 'Homo sapiens (human)'
  Taxlist: (13)
    'Homo sapiens (human)'
    'Eukaryota'
    'Metazoa'
    'Chordata'
    'Craniata'
    'Vertebrata'
    'Euteleostomi'
    'Mammalia'
    'Eutheria'
    'Primates'
    'Catarrhini'
    'Hominidae'
    'Homo'
  Type: 'N'
  Database: 'embl'
  Full name: ''
  Date: ''
  Usa: 'bt006818.debug'
  Ufo: ''
  Input format: 'embl'
  Output format: 'debug'
  Filename: 'bt006818.debug'
  Directory: ''
  Entryname: 'BT006818'
  File name: 'bt006818.debug'
  Extension: 'debug'
  Single: 'No'
  Features: 'No'
  Count: 'No'
  Documentation:...

   1  atggctgatc agctgaccga agaacagatt gctgaattca aggaagcctt   50
  51  ctccctattt gataaagatg gcgatggcac catcacaaca aaggaacttg  100
 101  gaactgtcat gaggtcactg ggtcagaacc caacagaagc tgaattgcag  150
 151  gatatgatca atgaagtgga tgctgatggt aatggcacca ttgacttccc  200
 201  cgaatttttg actatgatgg ctagaaaaat gaaagataca gatagtgaag  250
 251  aagaaatccg tgaggcattc cgagtctttg acaaggatgg caatggttat  300
 301  atcagtgcag cagaactacg tcacgtcatg acaaacttag gagaaaaact  350
 351  aacagatgaa gaagtagatg aaatgatcag agaagcagat attgatggag  400
 401  acggacaagt caactatgaa gaattcgtac agatgatgac tgcaaaatag  450

Alignment Formats

EMBOSS currently reads and writes 13 separate alignment formats: 8 pairwise and 5 multiple alignment formats. Each alignment output contains a header detailing information on the alignment, and the end of the entry is denoted by two lines of dashes:
#---------------------------------------
#---------------------------------------

Pairwise Alignment Formats

pair This is the default output for the pairwise alignment modules in your BioBind software. Each sequence line is preceded by a name of up to 10 characters. The numbers at the start and end of each sequence line give residue positions in the original sequence, so gap characters (-) are not counted. Identical matches are denoted with a bar (|) and similar matches with a colon (:).
CaM1              12 GCTGA--TCAAGGAAGCCTTCTCCCTATTTGATAAAGATGGCGATGGCAC     59
                     |||||  |||||||||||||||||||||||||||||||||||||||||||
CaM2              31 GCTGAATTCAAGGAAGCCTTCTCCCTATTTGATAAAGATGGCGATGGCAC     80

CaM1              60 CATCACAACAAAGGAACTTGGAACTGTCATGAGGTCACTGGGTCAGAACC    109
                     ||                   |||||||||||||||||||||||||||||
CaM2              81 CA-------------------AACTGTCATGAGGTCACTGGGTCAGAACC    111

CaM1             110 CAACAGAAGCTGAATTGCAGGATATGATCAATGAAGTGGATGCTGATGGT    159
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
CaM2             112 CAACAGAAGCTGAATTGCAGGATATGATCAATGAAGTGGATGCTGATGGT    161

CaM1             160 AATGGCACCATTGACTTCCCCGAATTTTTGACTATGATGGCTAGAAAAAT    209
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
CaM2             162 AATGGCACCATTGACTTCCCCGAATTTTTGACTATGATGGCTAGAAAAAT    211

CaM1             210 GAAAGATACAGATAGTGAAGAAGAAATCCGTGAGGCATTCCGAGTCTTTG    259
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
CaM2             212 GAAAGATACAGATAGTGAAGAAGAAATCCGTGAGGCATTCCGAGTCTTTG    261

CaM1             260 ACAAGGATGGCAATGGTTATATCAGTGCAGCAGAACTACGTCACGTCATG    309
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
CaM2             262 ACAAGGATGGCAATGGTTATATCAGTGCAGCAGAACTACGTCACGTCATG    311

CaM1             310 ACAAACTTAGGAGAAAAACTAACAGATGAAGAAGTAGATGAAATGATCAG    359
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
CaM2             312 ACAAACTTAGGAGAAAAACTAACAGATGAAGAAGTAGATGAAATGATCAG    361

CaM1             360 AGAAGCAGATATTGATGGAGACGGACAAGTCAACTATGAAGAATTCGTAC    409
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
CaM2             362 AGAAGCAGATATTGATGGAGACGGACAAGTCAACTATGAAGAATTCGTAC    411

CaM1             410 AGATGATGACTGCAAAATAG    429
                     ||||||||||||||||||||
CaM2             412 AGATGATGACTGCAAAATAG    431

CaM1               1 ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQD     50
                     ||||||||||||||||||||||||||||||||||||||||||||||||||
CaM2               1 ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQD     50

CaM1              51 MINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYI    100
                     ||||||||||||||||::||||              ||||||||||||||
CaM2              51 MINEVDADGNGTIDFPDYLTMM--------------EAFRVFDKDGNGYI     86

CaM1             101 SAAELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYEEFVQMMTAK    148
                     ||||||||||||||||||||||||||||||||||||||::||||||||
CaM2              87 SAAELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYDDFVQMMTAK    134
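The start and end numbers on each pair line count residues only, so the end position is the start plus the number of non-gap characters minus one (for example 12 + 48 - 1 = 59 in the first DNA block). The sketch below formats one block that way, assuming the column widths seen in the examples; it marks identities only, because the colon for similar matches depends on a substitution matrix that is not modelled here. `pair_block` is a hypothetical helper.

```python
def residue_count(segment: str) -> int:
    """Count residues in an aligned segment, ignoring gap characters."""
    return sum(1 for ch in segment if ch != "-")


def pair_block(name1, start1, seg1, name2, start2, seg2):
    """Format one pair-style block (identities only; similarity needs a matrix)."""
    if len(seg1) != len(seg2):
        raise ValueError("aligned segments must have equal length")
    end1 = start1 + residue_count(seg1) - 1
    end2 = start2 + residue_count(seg2) - 1
    # '|' for identical residues; gaps and mismatches are left blank here
    markup = "".join("|" if a == b and a != "-" else " " for a, b in zip(seg1, seg2))
    return [
        f"{name1:<13}{start1:>7} {seg1}{end1:>7}",
        " " * 21 + markup,
        f"{name2:<13}{start2:>7} {seg2}{end2:>7}",
    ]


for line in pair_block("CaM1", 12, "GCTGA--TCA", "CaM2", 31, "GCTGAATTCA"):
    print(line)
```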

markx0 Each sequence line starts with the sequence name (at most six characters), separated from the sequence by whitespace. The markup line lies between the two sequence lines: an identical match is denoted by a colon (:) and a similar match by a dot (.), while gaps are shown as dashes (-) in the sequences. Position numbers are shown every ten residues above the first sequence and below the second; the numbering will not necessarily start at 1, depending on where the sequences align. Each sequence line holds 50 characters.
               10                             20         
  CaM1 ATGGCTGATCAGCTGA---------------------TCAAGGAAGCCTT
       ::::::::::::::::                     :::::::::::::
  CaM2 ATGGCTGATCAGCTGACCGAAGAACAGATTGCTGAATTCAAGGAAGCCTT
               10        20        30        40        50

      30        40        50        60        70         
  CaM1 CTCCCTATTTGATAAAGATGGCGATGGCACCATCACAACAAAGGAACTTG
       ::::::::::::::::::::::::::::::::                  
  CaM2 CTCCCTATTTGATAAAGATGGCGATGGCACCA------------------
               60        70        80                    

      80        90       100       110       120         
  CaM1 GAACTGTCATGAGGTCACTGGGTCAGAACCCAACAGAAGCTGAATTGCAG
        :::::::::::::::::::::::::::::::::::::::::::::::::
  CaM2 -AACTGTCATGAGGTCACTGGGTCAGAACCCAACAGAAGCTGAATTGCAG
              90       100       110       120       130 

     130       140       150       160       170         
  CaM1 GATATGATCAATGAAGTGGATGCTGATGGTAATGGCACCATTGACTTCCC
       ::::::::::::::::::::::::::::::::::::::::::::::::::
  CaM2 GATATGATCAATGAAGTGGATGCTGATGGTAATGGCACCATTGACTTCCC
             140       150       160       170       180 

     180       190       200       210       220         
  CaM1 CGAATTTTTGACTATGATGGCTAGAAAAATGAAAGATACAGATAGTGAAG
       ::::::::::::::::::::::::::::::::::::::::::::::::::
  CaM2 CGAATTTTTGACTATGATGGCTAGAAAAATGAAAGATACAGATAGTGAAG
             190       200       210       220       230 

     230       240       250       260       270         
  CaM1 AAGAAATCCGTGAGGCATTCCGAGTCTTTGACAAGGATGGCAATGGTTAT
       ::::::::::::::::::::::::::::::::::::::::::::::::::
  CaM2 AAGAAATCCGTGAGGCATTCCGAGTCTTTGACAAGGATGGCAATGGTTAT
             240       250       260       270       280 

     280       290       300       310       320         
  CaM1 ATCAGTGCAGCAGAACTACGTCACGTCATGACAAACTTAGGAGAAAAACT
       ::::::::::::::::::::::::::::::::::::::::::::::::::
  CaM2 ATCAGTGCAGCAGAACTACGTCACGTCATGACAAACTTAGGAGAAAAACT
             290       300       310       320       330 

     330       340       350       360       370         
  CaM1 AACAGATGAAGAAGTAGATGAAATGATCAGAGAAGCAGATATTGATGGAG
       ::::::::::::::::::::::::::::::::::::::::::::::::::
  CaM2 AACAGATGAAGAAGTAGATGAAATGATCAGAGAAGCAGATATTGATGGAG
             340       350       360       370       380 

     380       390       400       410       420         
  CaM1 ACGGACAAGTCAACTATGAAGAATTCGTACAGATGATGACTGCAAAATAG
       ::::::::::::::::::::::::::::::::::::::::::::::::::
  CaM2 ACGGACAAGTCAACTATGAAGAATTCGTACAGATGATGACTGCAAAATAG
             390       400       410       420       430 

               10        20        30        40        50
  CaM1 ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQD
       ::::::::::::::::::::::::::::::::::::::::::::::::::
  CaM2 ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQD
               10        20        30        40        50

               60        70        80        90       100
  CaM1 MINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYI
       ::::::::::::::::..::::              ::::::::::::::
  CaM2 MINEVDADGNGTIDFPDYLTMM--------------EAFRVFDKDGNGYI
               60        70                      80

              110       120       130       140
  CaM1 SAAELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYEEFVQMMTAK
       ::::::::::::::::::::::::::::::::::::::..::::::::
  CaM2 SAAELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYDDFVQMMTAK
         90       100       110       120       130

markx1 Similar to markx0 format, except that the markup line reports only similar (non-identical) matches, each marked with an x.
              20        30        40        50         
  CaM1 GCTGA--TCAAGGAAGCCTTCTCCCTATTTGATAAAGATGGCGATGGCAC
                                                         
  CaM2 GCTGAATTCAAGGAAGCCTTCTCCCTATTTGATAAAGATGGCGATGGCAC
               40        50        60        70        80

      60        70        80        90       100         
  CaM1 CATCACAACAAAGGAACTTGGAACTGTCATGAGGTCACTGGGTCAGAACC
                                                         
  CaM2 CA-------------------AACTGTCATGAGGTCACTGGGTCAGAACC
                                  90       100       110 

     110       120       130       140       150         
  CaM1 CAACAGAAGCTGAATTGCAGGATATGATCAATGAAGTGGATGCTGATGGT
                                                         
  CaM2 CAACAGAAGCTGAATTGCAGGATATGATCAATGAAGTGGATGCTGATGGT
             120       130       140       150       160 

     160       170       180       190       200         
  CaM1 AATGGCACCATTGACTTCCCCGAATTTTTGACTATGATGGCTAGAAAAAT
                                                         
  CaM2 AATGGCACCATTGACTTCCCCGAATTTTTGACTATGATGGCTAGAAAAAT
             170       180       190       200       210 

     210       220       230       240       250         
  CaM1 GAAAGATACAGATAGTGAAGAAGAAATCCGTGAGGCATTCCGAGTCTTTG
                                                         
  CaM2 GAAAGATACAGATAGTGAAGAAGAAATCCGTGAGGCATTCCGAGTCTTTG
             220       230       240       250       260 

     260       270       280       290       300         
  CaM1 ACAAGGATGGCAATGGTTATATCAGTGCAGCAGAACTACGTCACGTCATG
                                                         
  CaM2 ACAAGGATGGCAATGGTTATATCAGTGCAGCAGAACTACGTCACGTCATG
             270       280       290       300       310 

     310       320       330       340       350         
  CaM1 ACAAACTTAGGAGAAAAACTAACAGATGAAGAAGTAGATGAAATGATCAG
                                                         
  CaM2 ACAAACTTAGGAGAAAAACTAACAGATGAAGAAGTAGATGAAATGATCAG
             320       330       340       350       360 

     360       370       380       390       400         
  CaM1 AGAAGCAGATATTGATGGAGACGGACAAGTCAACTATGAAGAATTCGTAC
                                                         
  CaM2 AGAAGCAGATATTGATGGAGACGGACAAGTCAACTATGAAGAATTCGTAC
             370       380       390       400       410 

     410       420         
  CaM1 AGATGATGACTGCAAAATAG
                           
  CaM2 AGATGATGACTGCAAAATAG
             420       430 

               10        20        30        40        50
  CaM1 ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQD
       
  CaM2 ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQD
               10        20        30        40        50

               60        70        80        90       100
  CaM1 MINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYI
                       xx
  CaM2 MINEVDADGNGTIDFPDYLTMM--------------EAFRVFDKDGNGYI
               60        70                      80

              110       120       130       140
  CaM1 SAAELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYEEFVQMMTAK
                                             xx
  CaM2 SAAELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYDDFVQMMTAK
         90       100       110       120       130

markx2 Similar to markx0 format, but in the second sequence only residues that differ from the first sequence are shown; identical residues are replaced by a dot (.).
  CaM1 GCTGA--TCAAGGAAGCCTTCTCCCTATTTGATAAAGATGGCGATGGCAC
CaM2   .....AT...........................................

      60        70        80        90       100         
  CaM1 CATCACAACAAAGGAACTTGGAACTGTCATGAGGTCACTGGGTCAGAACC
CaM2   ..-------------------.............................

     110       120       130       140       150         
  CaM1 CAACAGAAGCTGAATTGCAGGATATGATCAATGAAGTGGATGCTGATGGT
CaM2   ..................................................

     160       170       180       190       200         
  CaM1 AATGGCACCATTGACTTCCCCGAATTTTTGACTATGATGGCTAGAAAAAT
CaM2   ..................................................

     210       220       230       240       250         
  CaM1 GAAAGATACAGATAGTGAAGAAGAAATCCGTGAGGCATTCCGAGTCTTTG
CaM2   ..................................................

     260       270       280       290       300         
  CaM1 ACAAGGATGGCAATGGTTATATCAGTGCAGCAGAACTACGTCACGTCATG
CaM2   ..................................................

     310       320       330       340       350         
  CaM1 ACAAACTTAGGAGAAAAACTAACAGATGAAGAAGTAGATGAAATGATCAG
CaM2   ..................................................

     360       370       380       390       400         
  CaM1 AGAAGCAGATATTGATGGAGACGGACAAGTCAACTATGAAGAATTCGTAC
CaM2   ..................................................

     410       420         
  CaM1 AGATGATGACTGCAAAATAG
CaM2   ....................

               10        20        30        40        50
  CaM1 ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQD
CaM2   ..................................................

               60        70        80        90       100
  CaM1 MINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYI
CaM2   ................DY....--------------..............

              110       120       130       140
  CaM1 SAAELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYEEFVQMMTAK
CaM2   ......................................DD........
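The markx2 second line can be derived column by column from two aligned sequences: a dot where the residues are identical, otherwise the second sequence's own character (including its gap dashes). A minimal sketch of that rule; `markx2_line` is a hypothetical helper and does not handle the numbering or name columns.

```python
def markx2_line(reference: str, other: str) -> str:
    """Show only residues of `other` that differ from `reference`; '.' marks identity."""
    if len(reference) != len(other):
        raise ValueError("aligned sequences must have equal length")
    return "".join(
        "." if r == o and o != "-" else o for r, o in zip(reference, other)
    )


print(markx2_line("GCTGA--TCAAGG", "GCTGAATTCAAGG"))  # .....AT......
```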

markx3 This gives the aligned sequences in fasta format, with a double dot (..) after the sequence name on each header line. Sequences are displayed in lines of 50 characters, with gaps denoted by a dash (-).
>CaM1 ..
GCTGA--TCAAGGAAGCCTTCTCCCTATTTGATAAAGATGGCGATGGCAC
CATCACAACAAAGGAACTTGGAACTGTCATGAGGTCACTGGGTCAGAACC
CAACAGAAGCTGAATTGCAGGATATGATCAATGAAGTGGATGCTGATGGT
AATGGCACCATTGACTTCCCCGAATTTTTGACTATGATGGCTAGAAAAAT
GAAAGATACAGATAGTGAAGAAGAAATCCGTGAGGCATTCCGAGTCTTTG
ACAAGGATGGCAATGGTTATATCAGTGCAGCAGAACTACGTCACGTCATG
ACAAACTTAGGAGAAAAACTAACAGATGAAGAAGTAGATGAAATGATCAG
AGAAGCAGATATTGATGGAGACGGACAAGTCAACTATGAAGAATTCGTAC
AGATGATGACTGCAAAATAG
>CaM2 ..
GCTGAATTCAAGGAAGCCTTCTCCCTATTTGATAAAGATGGCGATGGCAC
CA-------------------AACTGTCATGAGGTCACTGGGTCAGAACC
CAACAGAAGCTGAATTGCAGGATATGATCAATGAAGTGGATGCTGATGGT
AATGGCACCATTGACTTCCCCGAATTTTTGACTATGATGGCTAGAAAAAT
GAAAGATACAGATAGTGAAGAAGAAATCCGTGAGGCATTCCGAGTCTTTG
ACAAGGATGGCAATGGTTATATCAGTGCAGCAGAACTACGTCACGTCATG
ACAAACTTAGGAGAAAAACTAACAGATGAAGAAGTAGATGAAATGATCAG
AGAAGCAGATATTGATGGAGACGGACAAGTCAACTATGAAGAATTCGTAC
AGATGATGACTGCAAAATAG

>CaM1 ..
ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQD
MINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYI
SAAELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYEEFVQMMTAK
>CaM2 ..
ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQD
MINEVDADGNGTIDFPDYLTMM--------------EAFRVFDKDGNGYI
SAAELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYDDFVQMMTAK

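markx10 Markx10 format is markx3 format with added annotation. Annotation lines start with a semicolon (;) and give the sequence length (sq_len), the start and stop positions of the alignment (al_start, al_stop) and the position at which the display starts (al_display_start).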
>CaM1 ..
; sq_len: 420
; al_start: 1
; al_stop: 420
; al_display_start: 1
GCTGA--TCAAGGAAGCCTTCTCCCTATTTGATAAAGATGGCGATGGCAC
CATCACAACAAAGGAACTTGGAACTGTCATGAGGTCACTGGGTCAGAACC
CAACAGAAGCTGAATTGCAGGATATGATCAATGAAGTGGATGCTGATGGT
AATGGCACCATTGACTTCCCCGAATTTTTGACTATGATGGCTAGAAAAAT
GAAAGATACAGATAGTGAAGAAGAAATCCGTGAGGCATTCCGAGTCTTTG
ACAAGGATGGCAATGGTTATATCAGTGCAGCAGAACTACGTCACGTCATG
ACAAACTTAGGAGAAAAACTAACAGATGAAGAAGTAGATGAAATGATCAG
AGAAGCAGATATTGATGGAGACGGACAAGTCAACTATGAAGAATTCGTAC
AGATGATGACTGCAAAATAG
>CaM2 ..
; sq_len: 420
; al_start: 1
; al_stop: 420
; al_display_start: 1
GCTGAATTCAAGGAAGCCTTCTCCCTATTTGATAAAGATGGCGATGGCAC
CA-------------------AACTGTCATGAGGTCACTGGGTCAGAACC
CAACAGAAGCTGAATTGCAGGATATGATCAATGAAGTGGATGCTGATGGT
AATGGCACCATTGACTTCCCCGAATTTTTGACTATGATGGCTAGAAAAAT
GAAAGATACAGATAGTGAAGAAGAAATCCGTGAGGCATTCCGAGTCTTTG
ACAAGGATGGCAATGGTTATATCAGTGCAGCAGAACTACGTCACGTCATG
ACAAACTTAGGAGAAAAACTAACAGATGAAGAAGTAGATGAAATGATCAG
AGAAGCAGATATTGATGGAGACGGACAAGTCAACTATGAAGAATTCGTAC
AGATGATGACTGCAAAATAG

>CaM1 ..
; sq_len: 148
; al_start: 1
; al_stop: 148
; al_display_start: 1
ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQD
MINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYI
SAAELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYEEFVQMMTAK
>CaM2 ..
; sq_len: 148
; al_start: 1
; al_stop: 148
; al_display_start: 1
ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQD
MINEVDADGNGTIDFPDYLTMM--------------EAFRVFDKDGNGYI
SAAELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYDDFVQMMTAK

score This format has no header. Each line gives the names of the two sequences, the sequence length and, in brackets, the alignment score. The score depends on the scoring matrix used with the alignment algorithm.
CaM1 CaM2 420 (1887)

CaM1 CaM2 148 (607)
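
Because every score-format line has the same fixed shape, it can be parsed with a regular expression. This is a minimal sketch: the name parse_score_line is hypothetical, and accepting negative or decimal scores is an assumption beyond the integer scores shown above.

```python
import re

# Two names, a length, then a bracketed score, e.g. "CaM1 CaM2 420 (1887)"
SCORE_LINE = re.compile(r"^(\S+)\s+(\S+)\s+(\d+)\s+\((-?\d+(?:\.\d+)?)\)$")


def parse_score_line(line):
    """Split one score-format line into (name1, name2, length, score)."""
    match = SCORE_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a score-format line: {line!r}")
    name1, name2, length, score = match.groups()
    return name1, name2, int(length), float(score)


for line in ["CaM1 CaM2 420 (1887)", "CaM1 CaM2 148 (607)"]:
    print(parse_score_line(line))
# ('CaM1', 'CaM2', 420, 1887.0)
# ('CaM1', 'CaM2', 148, 607.0)
```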

srspair See pair format.

Multiple Alignment Formats

There are currently two multiple alignment formats incorporated into your BioBind software suite. The default setting is the fasta format.
fasta The multiple alignment form of fasta format, and the default output format. Gaps are denoted by a dash (-). This type of output is best viewed in your Jemboss Alignment Editor.
>CaM2
ATGGCTGATCAGCTGACCGAAGAACAGATTGCTGAATTCAAGGAAGCCTTCTCCCTATTT
GATAAAGATGGCGATGGCACCAAA-------------------CTGTCATGAGGTCACTG
GGTCAGAACCCAACAGAAGCTGAATTGCAGGATATGATCAATGAAGTGGATGCTGATGGT
AATGGCACCATTGACTTCCCCGAATTTTTGACTATGATGGCTAGAAAAATGAAAGATACA
GATAGTGAAGAAGAAATCCGTGAGGCATTCCGAGTCTTTGACAAGGATGGCAATGGTTAT
ATCAGTGCAGCAGAACTACGTCACGTCATGACAAACTTAGGAGAAAAACTAACAGATGAA
GAAGTAGATGAAATGATCAGAGAAGCAGATATTGATGGAGACGGACAAGTCAACTATGAA
GAATTCGTACAGATGATGACTGCAAAATAG
>CaM3
ATGGCTGATCAGCTGACCGAAGAACAGATTGCTGAATTCAAGGAAGCCTTCTCCCTATTT
GATAAAGATGGCGATGGCACCAT-------------------------------------
GGTCAGAACCCAACAGAAGCTGAATTGCAGGATATGATCAATGAAGTGGATGCTGATGGT
AATGGCACCATTGACTTCCCCGAATTTTTGACTATGATGGCTAGAAAAATGAAAGATACA
GATAGTGAAGAAGAAATCCGTGAGGCATTCCGAGTCTTTGACAAGGATGGCAATGGTTAT
ATCAGTGCAGCAGAACTACGTCACGTCATGACAAACTTAGGAGAAAAACTAACAGATGAA
GAAGTAGATGAAATGATCAGAGAAGCAGATATTGATGGAGACGGACAAGTCAACTATGAA
GAATTCGTACAGATGATGACTGCAAAATAG
>CaM1
ATGGCTGATCAGCTGA---------------------TCAAGGAAGCCTTCTCCCTATTT
GATAAAGATGGCGATGGCACCATCACAACAAAGGAACTTGGAACTGTCATGAGGTCACTG
GGTCAGAACCCAACAGAAGCTGAATTGCAGGATATGATCAATGAAGTGGATGCTGATGGT
AATGGCACCATTGACTTCCCCGAATTTTTGACTATGATGGCTAGAAAAATGAAAGATACA
GATAGTGAAGAAGAAATCCGTGAGGCATTCCGAGTCTTTGACAAGGATGGCAATGGTTAT
ATCAGTGCAGCAGAACTACGTCACGTCATGACAAACTTAGGAGAAAAACTAACAGATGAA
GAAGTAGATGAAATGATCAGAGAAGCAGATATTGATGGAGACGGACAAGTCAACTATGAA
GAATTCGTACAGATGATGACTGCAAAATAG

>CaM2
ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGN
--------------GTIDFPEFLTMMEAFRVFDKDGNGYISAAELRHVMTNLGEKLTDEE
VDEMIREADIDGDGQVNYEEFVQMMTAK
>CaM3
----------------------------------------------------------AD
--------------QLTEEQIAEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTDEE
VDEMIREADIDGDGQVNYEEFVQMMTAK
>CaM1
ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGN
GTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTDEE
VDEMIREADIDGDGQVNYEEFVQMMTAK
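
A fasta alignment can be read back by joining each entry's wrapped lines into a single row. The sketch below uses hypothetical helper names (read_fasta_alignment, ungap). It checks that all rows, gaps included, have the same length, and deletes the dashes to recover each ungapped sequence.

```python
def read_fasta_alignment(text):
    """Read a fasta-style multiple alignment into {name: aligned_row}.

    Rows may be wrapped over several lines; dashes (-) are kept as gaps.
    """
    chunks = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            chunks[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks[name].append(line)
    rows = {n: "".join(parts) for n, parts in chunks.items()}
    # In an alignment every row, gaps included, should have the same length
    lengths = {len(row) for row in rows.values()}
    if len(lengths) > 1:
        raise ValueError(f"aligned rows differ in length: {sorted(lengths)}")
    return rows


def ungap(row, gap="-"):
    """Recover the unaligned sequence by deleting gap characters."""
    return row.replace(gap, "")


alignment = read_fasta_alignment(">a\nAC-GT\n>b\nACAGT\n")
print({name: ungap(row) for name, row in alignment.items()})
# {'a': 'ACGT', 'b': 'ACAGT'}
```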

msf MSF was originally designed as a multiple alignment format, and its opening information is similar to gcg format. The entry begins with a double bang (!!) followed by the sequence type: nucleotide (NA) or protein (AA). The line below gives the alignment length (MSF), the sequence type, the creation date and a checksum for the whole alignment (CompCheck). Next comes one line per sequence giving its identifier (Name), its length (Len), a checksum based on that sequence only (Check) and a weight. Names may contain up to 14 characters and no blank spaces, and all lengths must be equal. Gaps are denoted by a tilde (~). This header is separated from the alignment by a double slash (//). Because each checksum is calculated from the sequence, manually altering a sequence makes its checksum no longer tally, and the file will be rejected as input. The alignment is displayed in blocks of 50 characters per line, with the aligned sequences placed directly beneath one another.
!!NA_MULTIPLE_ALIGNMENT 1.0

   MSF: 431 Type: N 28/08/04 CompCheck: 2996 ..

  Name: CaM1       Len: 431  Check: 2743 Weight: 1.00
  Name: CaM2       Len: 431  Check: 8399 Weight: 1.00
  Name: CaM3       Len: 431  Check: 1854 Weight: 1.00

//

           1                                               50
CaM1       atggctgatcagctgatcaaggaagccttctccctatttgataaagatgg
CaM2       atggctgatcagctgaccgaagaacagattgctgaattcaaggaagcctt
CaM3       atggctgatcagctgaccgaagaacagattgctgaattcaaggaagcctt

           51                                             100
CaM1       cgatggcaccatcacaacaaaggaacttggaactgtcatgaggtcactgg
CaM2       ctccctatttgataaagatggcgatggcaccaaactgtcatgaggtcact
CaM3       ctccctatttgataaagatggcgatggcaccatggtcagaacccaacaga

           101                                            150
CaM1       gtcagaacccaacagaagctgaattgcaggatatgatcaatgaagtggat
CaM2       gggtcagaacccaacagaagctgaattgcaggatatgatcaatgaagtgg
CaM3       agctgaattgcaggatatgatcaatgaagtggatgctgatggtaatggca

           151                                            200
CaM1       gctgatggtaatggcaccattgacttccccgaatttttgactatgatggc
CaM2       atgctgatggtaatggcaccattgacttccccgaatttttgactatgatg
CaM3       ccattgacttccccgaatttttgactatgatggctagaaaaatgaaagat

           201                                            250
CaM1       tagaaaaatgaaagatacagatagtgaagaagaaatccgtgaggcattcc
CaM2       gctagaaaaatgaaagatacagatagtgaagaagaaatccgtgaggcatt
CaM3       acagatagtgaagaagaaatccgtgaggcattccgagtctttgacaagga

           251                                            300
CaM1       gagtctttgacaaggatggcaatggttatatcagtgcagcagaactacgt
CaM2       ccgagtctttgacaaggatggcaatggttatatcagtgcagcagaactac
CaM3       tggcaatggttatatcagtgcagcagaactacgtcacgtcatgacaaact

           301                                            350
CaM1       cacgtcatgacaaacttaggagaaaaactaacagatgaagaagtagatga
CaM2       gtcacgtcatgacaaacttaggagaaaaactaacagatgaagaagtagat
CaM3       taggagaaaaactaacagatgaagaagtagatgaaatgatcagagaagca

           351                                            400
CaM1       aatgatcagagaagcagatattgatggagacggacaagtcaactatgaag
CaM2       gaaatgatcagagaagcagatattgatggagacggacaagtcaactatga
CaM3       gatattgatggagacggacaagtcaactatgaagaattcgtacagatgat

           401                         431
CaM1       aattcgtacagatgatgactgcaaaatag~~
CaM2       agaattcgtacagatgatgactgcaaaatag
CaM3       gactgcaaaatag~~~~~~~~~~~~~~~~~~

!!AA_MULTIPLE_ALIGNMENT 1.0

   MSF: 148 Type: P 28/08/04 CompCheck: 9511 ..

  Name: CaM1       Len: 148  Check: 2160 Weight: 1.00
  Name: CaM2       Len: 148  Check: 1261 Weight: 1.00
  Name: CaM3       Len: 148  Check: 6090 Weight: 1.00

//

           1                                               50
CaM1       ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQD
CaM2       ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQD
CaM3       ADQLTEEQIAEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTDEEVD

           51                                             100
CaM1       MINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYI
CaM2       MINEVDADGNGTIDFPEFLTMMEAFRVFDKDGNGYISAAELRHVMTNLGE
CaM3       EMIREADIDGDGQVNYEEFVQMMTAK~~~~~~~~~~~~~~~~~~~~~~~~

           101                                          148
CaM1       SAAELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYEEFVQMMTAK
CaM2       KLTDEEVDEMIREADIDGDGQVNYEEFVQMMTAK~~~~~~~~~~~~~~
CaM3       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
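
The two levels of MSF checksum can be sketched in Python. The composite rule (CompCheck is the sum of the per-sequence Check values modulo 10000) agrees with both examples above: 2743 + 8399 + 1854 = 12996, giving 2996, and 2160 + 1261 + 6090 = 9511. The per-sequence function follows the GCG checksum as it is commonly published (for instance in Biopython's Bio.SeqUtils.CheckSum.gcg). It is an assumption, not checked against the Check values shown, that EMBOSS applies it to each aligned row exactly as written, gap characters included.

```python
def gcg_checksum(seq):
    """GCG-style checksum of one sequence, as commonly published.

    Each upper-cased character code is weighted by its position, with the
    weights cycling 1..57, and the total is taken modulo 10000.
    Assumption: EMBOSS applies this to the aligned row as written.
    """
    total = 0
    for i, char in enumerate(seq):
        total += ((i % 57) + 1) * ord(char.upper())
    return total % 10000


def composite_checksum(checks):
    """CompCheck: the per-sequence Check values summed, modulo 10000."""
    return sum(checks) % 10000


print(composite_checksum([2743, 8399, 1854]))  # 2996, as in the nucleotide header
print(composite_checksum([2160, 1261, 6090]))  # 9511, as in the protein header
print(gcg_checksum("ac"))  # 1*ord('A') + 2*ord('C') = 65 + 134 = 199
```

Editing any character changes the weighted sum, which is why a hand-edited MSF file no longer tallies with its header.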

©2005 BioBind.com All rights reserved.