GCG to EMBOSS Conversion

EMBOSS was designed in 1999 by some of the EGCG developers and follows many of the same principles as the GCG package. The EMBOSS suite is a collection of programs that can be used together to create a flexible analysis pipeline. There are no arbitrary limits on sequence length and, because EMBOSS reads and writes 42 different formats, it is easy to read files created by other software packages and to write files for them.

For GCG users who have started using the BioBind product, this conversion chart should help you make the switch as easily and efficiently as possible. You will find many equivalences between the two packages, although there are currently also areas where an EMBOSS program has no GCG equivalent and vice versa.

Each EMBOSS program has a one-line description that identifies it; this description appears when the application starts. EMBOSS applications prompt for all the information they need to run. Additional prompts may be available by adding the -opt flag after the program name on the command line. The default for each prompt is displayed in square brackets ([ ]). To accept it, press <return>; otherwise type in your own value. Output file names often have the name of the relevant application as their suffix. To halt an EMBOSS program, press Ctrl-C. |
| assemble |
merger
Merges two overlapping sequences into one. Produces a merged file and an alignment file. Matrix options accessible using the -opt flag. |
$ merger
Merge two overlapping nucleic acid sequences
Input sequence: cam1.fasta
Second sequence: cam2.fasta
Output sequence [cam1.fasta]: cam_both.fasta
Output alignment [cam1.out2]: cam_both.aln
|
| backtranslate |
backtranseq
Translates protein back into a nucleotide sequence. Default codon usage table is the standard human one. To alter this use the -opt flag. |
$ backtranseq
Back translate a protein sequence
Input sequence: calm_human
Output sequence [calm_human.fasta]:
|
| bestfit |
water/matcher
Finds the best local alignment(s) between two sequences. matcher (Huang & Miller algorithm) provides a faster match and should be used for longer sequences. water (Smith-Waterman algorithm) is more accurate and should be used for shorter sequences. Matrix options for matcher are available using the -opt flag. |
$ matcher
Finds the best local alignments between two sequences
Input sequence: cam1_long.fasta
Second sequence: cam2_long.fasta
Output alignment [cam1_1-429.matcher]:
|
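To make the idea of a local alignment score more concrete, here is a minimal Python sketch of Smith-Waterman scoring. It is only an illustration and not the code used by water: it assumes a simple match/mismatch score and a linear gap penalty, whereas the real program works with substitution matrices and separate gap open and extension penalties. The function name and the default scores are hypothetical.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (linear gap penalty).

    The scoring values are illustrative assumptions, not water's defaults.
    """
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0]  # first column is always zero for local alignment
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero: a local alignment may restart anywhere.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best
```

Only the score is returned; recovering the aligned regions themselves would need a traceback through the full matrix.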
| breakup |
splitter
Takes a sequence and splits it into smaller overlapping sequences. Use the -opt flag to select the size of each fragment. |
$ splitter
Split a sequence into (overlapping) smaller sequences
Input sequence(s): cam1.fasta
Output sequence [cam1.fasta]:
|
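As a rough illustration of what splitting a sequence into overlapping fragments means, here is a minimal Python sketch. It is not splitter's implementation; the function name and the size and overlap parameters are hypothetical.

```python
def split_overlapping(seq, size, overlap):
    """Return fragments of at most `size` residues, each sharing
    `overlap` residues with the fragment before it.

    Hypothetical helper for illustration; not part of EMBOSS.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    fragments = []
    for start in range(0, len(seq), step):
        fragments.append(seq[start:start + size])
        if start + size >= len(seq):
            break  # this fragment already reaches the end of the sequence
    return fragments
```

The final fragment may be shorter than `size` when the sequence length is not an exact fit.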
| chopup | It is not necessary to have a separate program in EMBOSS for this, as all programs read and write a number of different file formats. | |
| codonfrequency |
chips/cusp/compseq
chips calculates the effective number of codons used (Wright Nc statistic). cusp creates a codon usage table from coding sequence (CDS). compseq counts the composition of user-specified words within the sequence. Use the -opt flag for further word specification. |
$ chips
Codon usage statistics
Input sequence(s): cam1.fasta
Output file [cam1_1-429.chips]:
|
| codonpreference |
syco/wobble
syco identifies coding sequence from codon frequency bias information (Gribskov statistic). Further options for plot specification can be retrieved using the -opt flag. wobble plots a graph of the third "wobble" codon in a sequence. Use the -opt flag to alter the window size. |
$ syco
Synonymous codon usage Gribskov statistic plot
Input sequence: cam1.fasta
Graph type [x11]: ps
Created syco.ps
|
| coilscan |
pepcoil
Identifies coiled coil regions in a protein sequence (Lupas, van Dyke & Stock algorithm). |
$ pepcoil
Predicts coiled coil regions
Input sequence(s): calm_human
Window size [28]:
Output file [calm_human.pepcoil]:
|
| compare |
dottup/dotmatcher
Compares similar regions across two sequences and displays them in graphical format. dottup is designed for identical matches, and dotmatcher for regions of similarity. Use the -opt flag to select matrix options. |
$ dottup
Displays a wordmatch dotplot of two sequences
Input sequence: cam1.fasta
Second sequence: cam2.fasta
Word size [10]:
Graph type [x11]: ps
Created dottup.ps
|
| composition |
compseq/pepstats
compseq counts the composition of user-specified words within the sequence. Use the -opt flag for further word specification. pepstats calculates peptide sequence composition. |
$ compseq
Counts the composition of dimer/trimer/etc words in a sequence
Input sequence(s): cam1.fasta
Word size to consider (e.g. 2=dimer) [2]:
Output file [cam1_1-429.composition]:
|
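The word counting described here can be sketched in a few lines of Python. This is a simplified illustration rather than compseq itself: it assumes overlapping words read one position at a time, and the function name is hypothetical.

```python
from collections import Counter


def count_words(seq, word_size=2):
    """Count every overlapping word of length `word_size` in `seq`.

    Assumption: the window advances one residue at a time.
    """
    seq = seq.upper()
    return Counter(seq[i:i + word_size] for i in range(len(seq) - word_size + 1))
```

For example, `count_words("ACGACG", 2)` counts the dimers AC and CG twice each and GA once.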
| consensus |
prophecy
Creates a matrix or profile from a multiple alignment. |
$ prophecy
Creates matrices/profiles from multiple alignments
Input sequence set: prot2.fasta
Profile type
F : Frequency
G : Gribskov
H : Henikoff
Select type [F]:
Enter a name for the profile [mymatrix]:
Enter threshold reporting percentage [75]:
Output file [prot2.prophecy]:
|
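For readers unfamiliar with frequency profiles, the following Python sketch shows the basic idea of counting residues per alignment column. It is an assumption-laden illustration, not prophecy's algorithm or file format; the function name is hypothetical and the input is taken to be a list of equal-length aligned strings.

```python
from collections import Counter


def frequency_profile(aligned):
    """Return one Counter of residue counts per alignment column.

    Hypothetical helper; expects equal-length, already aligned sequences.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("need at least one sequence, all of the same length")
    # zip(*aligned) walks the alignment column by column.
    return [Counter(column) for column in zip(*aligned)]
```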
| correspond |
codcmp
Compares codon frequency matrices. |
$ codcmp
Codon usage table comparison
Codon usage file [Ehum.cut]:
Second Codon usage file [Ehum.cut]: Eacc.cut
Output file [outfile.codcmp]:
|
| corrupt |
msbar
Randomly mutates a sequence. Use the -opt flag to mutate in frame. |
$ msbar
Mutate sequence beyond all recognition
Input sequence(s): cam1.fasta
Number of times to perform the mutation operations [1]:
Point mutation operations
0 : None
1 : Any of the following
2 : Insertions
3 : Deletions
4 : Changes
5 : Duplications
6 : Moves
Types of point mutations to perform [0]:
Block mutation operations
0 : None
1 : Any of the following
2 : Insertions
3 : Deletions
4 : Changes
5 : Duplications
6 : Moves
Types of block mutations to perform [0]:
Codon mutation operations
0 : None
1 : Any of the following
2 : Insertions
3 : Deletions
4 : Changes
5 : Duplications
6 : Moves
Types of codon mutations to perform [0]:
Output sequence [cam1_1-429.fasta]:
|
| dataset |
dbiblast/dbigcg/dbifasta/dbiflat
Indexes the relevant database for use with EMBOSS. |
|
| distances |
no direct equivalent
See the PHYLIP package offered as part of your BioBind software options. |
|
| diverge |
no direct equivalent
See the PHYLIP package offered as part of your BioBind software options. |
|
| dotplot |
dottup/dotmatcher
See compare |
|
| extractpeptide |
transeq
Translates a nucleotide sequence into protein. Use the -opt flag to specify information on the region, frame and genetic code. |
$ transeq
Translate nucleic acid sequences
Input sequence(s): cam1.fasta
Output sequence [cam1_1-429.pep]:
|
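The translation step itself can be sketched in Python as follows. This is a minimal illustration using the standard genetic code in a single forward frame, not transeq's implementation; the function name and the use of X for unrecognised codons are assumptions made for the example.

```python
from itertools import product

# Standard genetic code, with codons ordered by base in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=1):
    """Translate `dna` in forward frame 1, 2 or 3; stop codons become '*'."""
    if frame not in (1, 2, 3):
        raise ValueError("frame must be 1, 2 or 3")
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(frame - 1, len(dna) - 2, 3):
        # Codons containing ambiguity codes are reported as X (an assumption).
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)
```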
| fetch |
seqret/seqretsplit
seqret retrieves sequences from a database using the EMBOSS uniform sequence address (USA). It can also be used with an input file to change its format. seqretsplit splits a multi-sequence file into individual files, each containing a single sequence. Use the -opt flag to retrieve only the first sequence in a file. |
$ seqretsplit
Reads and writes (returns) sequences in individual files
Input sequence(s): prot2.fasta
Output sequence [calm_human.fasta]:
|
| findpatterns |
fuzznuc/fuzzpro
Fuzzy search of a pattern against a sequence or selection of sequences. The search allows mismatches. fuzznuc searches nucleotide sequences and fuzzpro searches protein sequences. |
$ fuzznuc
Nucleic acid pattern search
Input sequence(s): cam1.fasta
Search pattern: AGGT
Number of mismatches [0]: 1
Output report [cam1_1-429.fuzznuc]:
|
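A search that tolerates mismatches can be pictured with the Python sketch below. It handles only literal patterns compared position by position, so it is a simplification of what fuzznuc offers; the function name and the shape of the returned tuples are hypothetical.

```python
def fuzzy_find(seq, pattern, max_mismatches=0):
    """Return (1-based start, matched text, mismatch count) for each hit.

    Simplified sketch: literal characters only, no ambiguity codes.
    """
    seq, pattern = seq.upper(), pattern.upper()
    hits = []
    for i in range(len(seq) - len(pattern) + 1):
        window = seq[i:i + len(pattern)]
        mismatches = sum(a != b for a, b in zip(window, pattern))
        if mismatches <= max_mismatches:
            hits.append((i + 1, window, mismatches))
    return hits
```

Searching for AGGT with one mismatch allowed, as in the session above, would also report sites such as AGCT.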
| frames |
plotorf/showorf
Plots or displays open reading frames. plotorf uses ATG as a start and TAA, TAG, TGA as stop codons and displays the results as a graphic. showorf writes out the results of a frame translation as text. Use the -opt flag for more options. |
$ plotorf
Plot potential open reading frames
Input sequence: cam1.fasta
Graph type [x11]: ps
Created plotorf.ps
|
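The start and stop codon rule described for plotorf can be sketched in Python as below. This is an illustrative simplification that scans only the three forward frames and is not the program's own code; the function name, the min_codons parameter and the tuple layout are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=1):
    """Return (frame, start, end) for ORFs running from ATG to a stop codon.

    Positions are 1-based and `end` includes the stop codon.
    Assumption: forward strand only.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens the reading frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame + 1, start + 1, i + 3))
                start = None  # look for the next ATG in this frame
    return orfs
```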
|
fromEMBL fromFasta fromGenbank fromIG fromStaden fromtrace |
all
All EMBOSS applications read and write a variety of file formats, so an individual conversion program is not necessary. |
|
| gap |
stretcher/needle
Finds the best global alignment between two sequences. stretcher (Myers & Miller algorithm) provides a faster match and should be used for longer sequences. needle (Needleman-Wunsch algorithm) is more accurate and should be used for shorter sequences. Matrix options for stretcher are available using the -opt flag. |
$ stretcher
Finds the best global alignment between two sequences
Input sequence: cam1_long.fasta
Second sequence: cam2_long.fasta
Output alignment [cam1_1-429.stretcher]:
|
| gapshow |
plotcon
Plots the quality of alignment conservation across a sliding window. Use the -opt flag to alter the comparison matrix. |
$ plotcon
Plots the quality of conservation of a sequence alignment
Input sequence set: emma.aln
Window size [4]:
Graph type [x11]: ps
Created plotcon.ps
|
| getseq |
newseq
Enter a short sequence into the program for use as an input file in other applications. |
$ newseq
Type in a short new sequence.
Name of the sequence: Test
Description of the sequence: Test Protein Sequence
Type of sequence
N : Nucleic
P : Protein
Type of sequence [N]: P
Output sequence [outfile.fasta]: Test.fasta
Enter the sequence: wearethediddymenthediddymenthediddymen
|
| growtree |
no direct equivalent
Use emma as the interface to ClustalW or the PHYLIP option on your BioBind software. |
|
| helicalwheel |
pepwheel
Plots a protein sequence as a helix. Use the -opt flag to specify the output display. |
$ pepwheel
Shows protein sequences as helices
Input sequence: calm_human
Graph type [x11]: ps
Created pepwheel.ps
|
|
hmmerAlign hmmerBuild hmmerCalibrate hmmerFetch hmmerIndex hmmerPfam hmmerSearch |
no direct equivalent
The HMMER programs are available as an option with your BioBind software. |
|
| hthscan |
helixturnhelix
Searches for 22 residue helix turn helix motifs in a protein sequence (Dodd & Egan). Use the -opt flag to search using their 20 residue region and further specify calculation parameters. |
$ helixturnhelix
Report nucleic acid binding motifs
Input sequence(s): calm_human
Output report [calm_human.hth]:
|
| isoelectric |
iep
Calculates the isoelectric point of a protein. |
$ iep calm_human
Calculates the isoelectric point of a protein
Output file [calm_human.iep]:
|
| lookup |
whichdb
Does not offer all the parameters that lookup does, but will find identifiers or accession numbers in a database, and optionally retrieve the sequence. |
$ whichdb
Search all databases for an entry
ID or Accession number: p62158
Output file [outfile.whichdb]:
|
|
map / mapplot / mapsort |
restrict/remap/restover
Calculates restriction maps based on the entries in the REBASE restriction enzyme database. Displays peptide translation of open reading frame. remap is the most flexible of these applications. Use the -opt flag to force specific cutters. |
$ restrict cam1.fasta
Finds restriction enzyme cleavage sites
Minimum recognition site length [4]:
Comma separated enzyme list [all]:
Output report [cam1_1-429.restrict]:
|
| melttemp |
dan
Calculates the melting temperature of a DNA or RNA sequence (Breslauer and Baldino statistics). Use the -opt flag to further specify calculations. |
$ dan
Calculates DNA RNA/DNA melting temperature
Input sequence(s): cam1.fasta
Enter window size [20]:
Enter Shift Increment [1]:
Enter DNA concentration (nM) [50.]:
Enter salt concentration (mM) [50.]:
Output report [cam1_1-429.dan]:
|
| MEME |
no direct equivalent
See the MEME application included as an option in your BioBind software. |
|
| moment |
hmoment
Calculates the hydrophobic moment of protein. Use the -opt flag to specify the angle of rotation. |
$ hmoment
Hydrophobic moment calculation
Input sequence(s): calm_human
Output file [calm_human.hmoment]:
|
| motifs |
patmatmotifs/pscan
patmatmotifs searches the PROSITE database for patterns. Use the -opt flag to specify patterns. pscan searches the PRINTS database for fingerprint motifs. |
$ patmatmotifs
Search a PROSITE motif database with a protein sequence
Input sequence: calm_human
Output report [calm_human.patmatmotifs]:
|
| names |
infoseq
Describes sequence attributes such as name, length, GC content. |
$ infoseq
Displays some simple information about sequences
Input sequence(s): calm_human
# USA Name Accession Type Length Description
fasta::calm_human:CALM_HUMAN CALM_HUMAN P62158 P 148 Calmodulin (CaM).
|
| nooverlap |
diffseq
Finds differences between two sequences. Use the -opt flag to output the information in columns. |
$ diffseq
Find differences between nearly identical sequences
Input sequence: cam1.fasta
Second sequence: cam2.fasta
Word size [10]:
Output report [cam1_1-429.diffseq]:
Output features [CaM1_1-429.diffgff]:
Second output features [CaM2.diffgff]:
|
| pepdata |
getorf/sixpack
Translates all six open reading frames. getorf displays selected translations. sixpack displays DNA sequence and peptide translation. Use the -opt flag for either program to specify the codon usage information. |
$ getorf
Finds and extracts open reading frames (ORFs)
Input sequence(s): cam1.fasta
Output sequence [cam1_1-429.orf]:
|
| pepplot |
pepinfo + garnier
pepinfo displays biophysical properties of the protein sequence and plots hydrophobicity (Kyte & Doolittle, Sweet & Eisenberg, Eisenberg). Use the -opt flag to select parameters for the hydrophobicity plots. garnier displays a secondary structure plot (Garnier, Osguthorpe & Robson). |
$ pepinfo
Plots simple amino acid properties in parallel
Input sequence: calm_human
Graph type [x11]: ps
Output file [calm_human.pepinfo]:
Created pepinfo.ps
|
| peptidemap |
digest
Peptide full or partial digest of a protein sequence. |
$ digest
Protein proteolytic enzyme or reagent cleavage digest
Input sequence: calm_human
Enzymes and Reagents
1 : Trypsin
2 : Lys-C
3 : Arg-C
4 : Asp-N
5 : V8-bicarb
6 : V8-phosph
7 : Chymotrypsin
8 : CNBr
Select number [1]:
Output report [calm_human.digest]:
|
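To show what a complete digest produces, here is a small Python sketch for trypsin only. It assumes the commonly cited rule of cutting after K or R unless the next residue is P, which may not match the exact rule set digest applies; the function name is hypothetical.

```python
def trypsin_digest(protein):
    """Split `protein` after each K or R that is not followed by P.

    Assumed rule for illustration; other enzymes and partial digests
    are not covered.
    """
    protein = protein.upper()
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide without a cut site
    return peptides
```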
|
peptidestructure / plotstructure |
garnier
Displays a secondary structure plot (Garnier, Osguthorpe & Robson). |
$ garnier
Predicts protein secondary structure
Input sequence(s): calm_human
Output report [calm_human.garnier]:
|
| pileup |
emma
Wrapper to the ClustalW multiple sequence alignment program. Accepts all EMBOSS input formats. |
$ emma
Multiple alignment program - interface to ClustalW program
Input sequence(s): prot_all.fasta
Output sequence [cam2.aln]:
Dendogram output filename [cam2.dnd]:
CLUSTAL W (1.83) Multiple Sequence Alignments
Sequence type explicitly set to Protein
Sequence format is Pearson
Sequence 1: CaM2 148 aa
Sequence 2: CaM3 148 aa
Sequence 3: CaM1 148 aa
Start of Pairwise alignments
Aligning...
Sequences (1:2) Aligned. Score: 93
Sequences (1:3) Aligned. Score: 100
Sequences (2:3) Aligned. Score: 97
Guide tree file created: [00002524C]
Start of Multiple Alignment
There are 2 groups
Aligning...
Group 1: Sequences: 2 Score:2070
Group 2: Sequences: 3 Score:1098
Alignment Score 1439
GCG-Alignment file created [00002524B]
|
| plasmidmap |
lindna/cirdna
Display of linear and circular DNA.
Also, the Jemboss DNA Editor is a graphical interface to |
|
| plotsimilarity |
plotcon
See gapshow |
|
|
pretty / prettybox |
cons/prettyplot/showalign
cons calculates a consensus from a multiple alignment using specified parameters. prettyplot displays an alignment with specified colours and boxed in display. showalign displays the alignment in editable text format. Use the -opt flag for all three programs to set values. These programs are all incorporated in the Jemboss Alignment Editor together with additional capabilities. |
|
| prime |
eprimer3
Included in your BioBind software. Allows selection of a variety of different primers under several conditions. Use the -opt flag to alter parameters. |
$ eprimer3
Picks PCR primers and hybridization oligos
Input sequence(s): cam3.fasta
Output file [cam3.eprimer3]:
|
|
profilegap / profilemake |
prophet/prophecy
prophecy creates matrices or profiles from multiple alignments. prophet reads in these files to create a gapped alignment of proteins. |
$ prophecy
Creates matrices/profiles from multiple alignments
Input sequence set: emma.aln
Profile type
F : Frequency
G : Gribskov
H : Henikoff
Select type [F]:
Enter a name for the profile [mymatrix]:
Enter threshold reporting percentage [75]:
Output file [emma.prophecy]:
|
| profilescan |
patmatdb
Uses a motif to search a protein sequence. |
$ patmatdb
Search a protein sequence with a motif
Input sequence(s): emma.aln
Protein motif to search for: HATS
Output report [cam2.patmatdb]:
|
| profilesearch |
profit
Scans a sequence or database with a matrix or profile. Uses the matrix file created by prophecy. |
$ profit
Scan a sequence or database with a matrix or profile
Profile or matrix file: emma.prophecy
Input sequence(s): calm_human
Output file [emma.profit]:
|
| reformat |
seqret
Reformatting files is redundant in EMBOSS as each application reads and writes a variety of different formats. However, if anything needs converting, seqret will do it. |
$ seqret
Reads and writes (returns) sequences
Input sequence(s): calm.gcg
Output sequence [calm_human.fasta]:
|
| repeat |
equicktandem/etandem/einverted/palindrome
Searches for tandem repeats, inverted or palindromic sequences in a nucleotide input file. |
$ equicktandem
Finds tandem repeats
Input sequence: cam1.fasta
Maximum repeat size [600]:
Threshold score [20]:
Output report [cam1_1-429.qtan]:
|
| replace |
biosed/degapseq
biosed replaces specified characters in a text file. degapseq is specific for removing gaps. |
$ biosed
Replace or delete sequence sections
Input sequence(s): cam1.fasta
Sequence section to match [N]:
Replacement sequence section [A]:
Output sequence [cam1_1-429.fasta]:
|
| reverse |
revseq
Reverses and complements a sequence. Almost any program in the suite can reverse and complement a sequence using the -reverse option. Alternatively the [start:end:reverse] syntax will accomplish the same task. |
$ revseq
Reverse and complement a sequence
Input sequence(s): cam1.fasta
Output sequence [cam1_1-429.rev]:
|
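What reversing and complementing does to a nucleotide sequence can be shown with a short Python sketch. It is an illustration only, with a hypothetical function name, and it handles just the unambiguous bases A, C, G, T and U.

```python
# Map each base to its complement; U is treated like T and complements to A.
COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(COMPLEMENT)[::-1]
```

A palindromic site such as AAGCTT is returned unchanged, which is a quick sanity check.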
| sample |
extractseq
Extracts specific regions from a sequence. Use the -opt flag to save them to a separate file. |
$ extractseq
Extract regions from a sequence
Input sequence: cam1.fasta
Regions to extract (eg: 4-57,78-94) [1-429]: 1-25
Output sequence [cam1_1-429.fasta]:
|
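The region syntax shown at the prompt (for example 4-57,78-94) can be read as 1-based, inclusive ranges. The Python sketch below works under that assumption and joins the extracted regions into one sequence; it is not extractseq's code and the function name is hypothetical.

```python
def extract_regions(seq, regions):
    """Extract comma-separated 1-based inclusive ranges, e.g. "4-57,78-94",
    and join them into a single sequence."""
    pieces = []
    for part in regions.split(","):
        start, end = (int(value) for value in part.strip().split("-"))
        if not 1 <= start <= end <= len(seq):
            raise ValueError(f"region {part.strip()} is outside 1-{len(seq)}")
        pieces.append(seq[start - 1:end])  # convert to 0-based slicing
    return "".join(pieces)
```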
| seg |
maskseq
Masks low complexity regions within a sequence. Use the -opt flag to select a region to mask. |
$ maskfeat
Mask off features of a sequence.
Input sequence(s): cam1.fasta
Output sequence [cam1_1-429.fasta]:
|
| shuffle |
shuffleseq
Shuffles one or a set of sequences. |
$ shuffleseq
Shuffles a set of sequences maintaining composition
Input sequence(s): calm_human
Output sequence [calm_human.fasta]:
|
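Shuffling while maintaining composition simply means reordering the same residues. A minimal Python sketch of that idea follows; the function name and the optional seed parameter are hypothetical and are not shuffleseq options.

```python
import random


def shuffle_sequence(seq, seed=None):
    """Return a random permutation of `seq`; composition is unchanged.

    `seed` is only there to make the illustration repeatable.
    """
    rng = random.Random(seed)
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)
```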
| spscan |
sigcleave
Searches for signal sequences in proteins. Use the -opt flag to specify a prokaryotic sequence. |
$ sigcleave
Reports protein signal cleavage sites
Input sequence(s): calm_human
Minimum weight [3.5]:
Output report [calm_human.sig]:
|
| stemloop |
etandem/palindrome
See repeat |
|
| testcode |
wobble
See codonpreference |
|
|
toFASTA toPIR toIG toSTADEN |
seqret
See fromEMBL |
|
| translate |
transeq
See extractpeptide. |
|
| window + statplot |
freak
Calculates the base or residue frequency of a sequence. Use the -opt flag to select the window type for calculation of the plot. |
$ freak
Residue/base frequency table or plot
Input sequence(s): cam1.fasta
Residue letters [gc]:
Output file [cam1_1-429.freak]:
|
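A windowed frequency calculation of this kind can be sketched in Python as follows. The window and step defaults are placeholders chosen for the example rather than freak's own defaults, and the function name is hypothetical.

```python
def windowed_frequency(seq, letters="gc", window=30, step=1):
    """Return (1-based window start, fraction of `letters`) per window.

    Illustrative sketch; window and step values are assumptions.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    seq = seq.lower()
    wanted = set(letters.lower())
    result = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        result.append((start + 1, sum(c in wanted for c in chunk) / window))
    return result
```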
| gcghelp |
tfm
Stands for "the fine manual" and contains the individual program documentation. Type tfm followed by the program name. |
$ tfm stretcher
Displays a program's help documentation manual
|
| ©2005 BioBind.com All rights reserved. |