 Block Formatter Help
Block Formatter converts a multiple sequence alignment into the
block format used in the Blocks Database.
Multiple sequence alignments can be obtained by various methods, and a
number of programs for multiple alignment are available on the WWW.
One such program is Block Maker, which finds ungapped local multiple
alignments (blocks) in groups of related protein sequences. Block Maker
output is already in the block format.
The minimal input is the aligned sequence segments. All the other data
can be derived from the alignment or given arbitrary values.
The header fields
- The ID field is a short 
identifier for sequences in the alignment.
 Example - "Phosphorylases".
 
- The Accession field is for the block name. It should be 7 or 8
characters long. It is recommended that the first 7 characters designate
the block family or group and that the last character number the
different blocks in that group (using upper case letters A-Z).
 Examples - "TR1421_A", "TR1421_B", "Transpo".
 
- The minimal and maximal distances from the previous block (or from
the beginning of the sequences if this is the first block) are the
smallest and largest numbers of amino acids between this block and the
previous one (or the start of the sequences).
 Example - "5", "34".
 
- The Description field is for the 
description of the group of sequences from which the 
block was made.
 Example - "'Homeobox' domain proteins".
 
- The Alignment method should briefly describe
how the alignment was made or found.
 Examples - "Manual alignment", "MACAW", "ClustalW", "Prot. Sci. 12 p345, 
            87'".
 
- The alignment width specifies how
wide (or long) the alignment is.
 Example - "32".
 
- The number of sequences is how many 
sequences are in the alignment.
 Example - "11".
 
All the previous fields are optional. Values that are not entered are
supplied by the formatting program (either as defaults or from the
multiple alignment).
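
The header fields above correspond to the ID, AC, DE and BL lines seen in the sample output at the end of this page. The following Python sketch shows one way those lines could be assembled. The function name `format_header`, its parameters and the exact spacing are assumptions modelled on that sample output, not the formatter's actual code.

```python
def format_header(block_id, accession, min_dist, max_dist,
                  description, method, width, num_seqs):
    """Build the four header lines of a block (hypothetical helper).

    The layout imitates the sample output on this page; the real
    formatter may space or validate the fields differently.
    """
    # The help text says the accession should be 7 or 8 characters long.
    if not 7 <= len(accession) <= 8:
        raise ValueError("accession should be 7 or 8 characters long")
    if min_dist > max_dist:
        raise ValueError("minimal distance exceeds maximal distance")
    return [
        f"ID   {block_id}; BLOCK",
        f"AC   {accession}; distance from previous block = "
        f"({min_dist},{max_dist})",
        f"DE   {description}",
        f"BL   {method}; width={width}; seqs={num_seqs};",
    ]


header = format_header("Inteins", "vde_yea", 98, 190,
                       "Protein introns (inteins).", "gibbs", 19, 6)
print("\n".join(header))
```

Called with the values from the sample output, the sketch reproduces its four header lines.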
The multiple alignment fields
- The sequence names should be unique and
no longer than 10 characters.
 Examples - "RECA_ECOLI", "S67853_A", "PCR#543"
 
- The positions of the aligned sequence segments are their offsets
from the beginning of the sequences. Every position should be on a
separate line. Avoid empty lines.
 Example - "67"
The names and sequence positions of the aligned sequence segments in the
multiple alignment will be supplied by the program if nothing
is entered in these fields.
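
The rules for these two fields can be checked before submitting the form. The Python sketch below is only an illustration: `check_names` and `parse_positions` are hypothetical helpers, and the formatter itself may treat invalid input differently.

```python
def check_names(names):
    """Return a list of problems found in the sequence names."""
    problems = []
    seen = set()
    for name in names:
        if len(name) > 10:
            problems.append(f"{name!r} is longer than 10 characters")
        if name in seen:
            problems.append(f"{name!r} is not unique")
        seen.add(name)
    return problems


def parse_positions(text):
    """Read one position per line; empty lines are rejected."""
    positions = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        value = line.strip()
        if not value:
            raise ValueError(f"line {line_no} is empty")
        positions.append(int(value))  # raises ValueError if not a number
    return positions


name_problems = check_names(["RECA_ECOLI", "S67853_A", "PCR#543", "PCR#543"])
positions = parse_positions("67\n102\n5")
print(name_problems)
print(positions)
```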
Examples 
FASTA format -
>vde_yeast vacuolar ATPAse 19 aa
DYYGIT
LSDDSD
HQFLLAN
>vde_cantr another yeast
1  NYYGITLAE
10 ETDHQFLLS
19 N
>reci_myctu (sequence can be in lower or upper case)
r a r t f
d l e v e
e l h t l
v a e g 
>reci_mycle (sequence can be on one line or more)
SMNRFDIEVEGNHNYFVDG
>dpi1_theli (only first word in this line is read as sequence name)
EGYVYDLSV
EDNENFLVGF
>dpi2_theli (header lines MUST start with a '>')
EGYV
YDIE
VEET
HRFF
ANN
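
The FASTA example above tolerates position numbers, spaces, mixed case and line breaks inside a sequence. A reader along the following lines would accept such input. This is a sketch in Python under stated assumptions: the name `read_fasta_segments` is hypothetical, and stripping every non-letter character and converting to upper case is inferred from the example and the sample output, not taken from the formatter's code.

```python
def read_fasta_segments(text):
    """Parse FASTA-like input into a list of (name, segment) pairs."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Only the first word of the header line is the sequence name.
            words = line[1:].split()
            records.append([words[0] if words else "", []])
        elif records:
            # Drop digits and spaces, keep the letters, normalise the case.
            letters = "".join(ch for ch in line if ch.isalpha())
            records[-1][1].append(letters.upper())
        else:
            raise ValueError("sequence data found before a '>' header line")
    return [(name, "".join(parts)) for name, parts in records]


example = """>vde_cantr another yeast
1  NYYGITLAE
10 ETDHQFLLS
19 N
>reci_myctu (sequence can be in lower or upper case)
r a r t f
d l e v e
e l h t l
v a e g
"""
records = read_fasta_segments(example)
for name, segment in records:
    print(name, segment)

# All segments of one block must have the same width.
same_width = len({len(segment) for _, segment in records}) == 1
```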
basic format -
DYYGITLSDDSDHQFLLAN
NYYGITLAEETDHQFLLSN
RARTFDLEVEELHTLVAEG
SMNRFDIEVEGNHNYFVDG
EGYVYDLSVEDNENFLVGF
EGYVYDIEVEETHRFFANN
An example of a possible output -
ID   Inteins; BLOCK
AC   vde_yea; distance from previous block = (98,190)
DE   Protein introns (inteins).
BL   gibbs; width=19; seqs=6;
 vde_yeast ( 430) DYYGITLSDDSDHQFLLAN  90
 vde_cantr ( 447) NYYGITLAEETDHQFLLSN  84
reci_myctu ( 417) RARTFDLEVEELHTLVAEG 100
reci_mycle ( 342) SMNRFDIEVEGNHNYFVDG  95
dpi1_theli ( 513) EGYVYDLSVEDNENFLVGF  83
dpi2_theli ( 367) EGYVYDIEVEETHRFFANN  74
//
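
Output of this shape can be read back for a quick consistency check. The Python sketch below parses the sample block above and compares the width and sequence count on the BL line with the alignment rows. `parse_block` and the regular expressions are assumptions based only on this one sample. This page does not describe the number that ends each alignment row, so the sketch stores it without interpreting it.

```python
import re

# name, (position), segment, trailing number
SEQ_LINE = re.compile(r"^\s*(\S+)\s+\(\s*(\d+)\)\s+(\S+)\s+(\d+)\s*$")


def parse_block(text):
    """Parse one block record into a dictionary (hypothetical helper)."""
    block = {"sequences": []}
    for line in text.splitlines():
        if line.strip() == "//":  # end-of-block marker
            break
        if line[:2] in ("ID", "AC", "DE", "BL") and line[2:3].isspace():
            block[line[:2]] = line[2:].strip()
            continue
        match = SEQ_LINE.match(line)
        if match:
            name, pos, segment, last = match.groups()
            block["sequences"].append((name, int(pos), segment, int(last)))
    return block


sample = """ID   Inteins; BLOCK
AC   vde_yea; distance from previous block = (98,190)
DE   Protein introns (inteins).
BL   gibbs; width=19; seqs=6;
 vde_yeast ( 430) DYYGITLSDDSDHQFLLAN  90
 vde_cantr ( 447) NYYGITLAEETDHQFLLSN  84
reci_myctu ( 417) RARTFDLEVEELHTLVAEG 100
reci_mycle ( 342) SMNRFDIEVEGNHNYFVDG  95
dpi1_theli ( 513) EGYVYDLSVEDNENFLVGF  83
dpi2_theli ( 367) EGYVYDIEVEETHRFFANN  74
//
"""
block = parse_block(sample)
width = int(re.search(r"width=(\d+)", block["BL"]).group(1))
seqs = int(re.search(r"seqs=(\d+)", block["BL"]).group(1))

# The BL line should agree with the alignment rows that follow it.
consistent = (len(block["sequences"]) == seqs and
              all(len(row[2]) == width for row in block["sequences"]))
print(consistent)
```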
Page last modified August 1996