- Data format
Data should be in fasta format. If more than two sequences are submitted, only
the first two sequences will be processed.
The only nucleotides allowed are A, C, G, T, U, N, -.
The N and - are treated as gaps. In the output T is printed as U.
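The input rules above can be sketched in a few lines of Python. This is only an illustration and not the server's own code: the function name read_first_two, the upper-casing of the input, and the choice to represent N as "-" are assumptions made for the example.

```python
ALLOWED = set("ACGTUN-")


def read_first_two(fasta_text):
    """Return at most two (name, sequence) records from FASTA text."""
    records = []
    name, chunks = None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is not None:
            # Assumption: lower case input is accepted and upper-cased.
            chunks.append(line.upper())
    if name is not None:
        records.append((name, "".join(chunks)))

    cleaned = []
    for rec_name, seq in records[:2]:  # only the first two are processed
        bad = set(seq) - ALLOWED
        if bad:
            raise ValueError(f"{rec_name}: characters not allowed: {sorted(bad)}")
        # T is printed as U; N and - are both treated as gaps.
        cleaned.append((rec_name, seq.replace("T", "U").replace("N", "-")))
    return cleaned
```

A third record in the input is silently ignored, which mirrors the behaviour described for the server.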
Fasta format for scanning
Search for multiple local hits. The maximum sequence length is 500 and the
maximum motif length (lambda) is 200.
>AE000516/4118797-4118870
GGCUCGAUCCGGCCGGGCCGGCUGCGGCGCUGGUGGAAUUCGAGCGCUCCUGUCGAUGGA
CGUGACGGUCGGACCUGCGGUUUGGCUAGUCAACGGUCCGGUGCGAUAGGCUGUCGUGGC
UUCAAGCGGGGUGUGGCGCAGCUUGGUAGCGCGCUUCGUUCGGGACGAAGAGGCCGUGGG
UUCAAAUCCCGCCACCCCGACCGAGAGAUCGCUGACGACAGCCUUACCCGGCGCAGCGUG
GUAGCUUGCUGCAGUCUGCUCGGGCGGCAGCGCCACCCUGACGGUGCUGGUUGACCAUGC
CGGACAGCACGUCAACGCACAGGCAUUUCCAACGGAAGUUGUAGGUUACCGGCCGCCCUA
AAACACGGUGCACUUUUCGUUAAAGGUUGUGGGUGUGGAUCCAACGAAAUUCGUUGCCCC
GGCGUGGGCAGCGCCGUGUCCACAGGGGGACCCGCCGCGCAUUACGCCUAUGGGCCCACC
CCCGUACCGCGGGAGUUGGC
>AE000051/8942-9018
UAAAUUUAAGUAUUAAGUAUUAAUUAAACGGGAAAGGAAUAGGACGAUUUACCAGGGGUU
AAACCUAAUCGGAACCGACCGGUACUAAAGUAGUCUUUUUGUGAAUUCCUUCUUUUAUAG
CUAUAACUGCUGCCAUUGCCGUUGGCAUUAUUAGCGUUAGCAGCUUCGCUGAUGUCUAAC
GCGGUAUUAAUUAACGUACUAAAGGCGUUUACAAUCAUAAUUACUCCCGCACUGAAGCUU
CAAAAUGAUAAGCCUCCAACUACUUGUUUAGCUUCUGAUUUUUUUAAUCUCUGCAUUAUU
GUGUCUCCAUUUAAAUGAAUUACAUAACUAAAGUAAAAUGCCAAAGUUUUAUAAUUAACU
UAACUGUCAUCAUAGCUCAAUAGGACAGAGUAUCAGCUUGCGGAGCUGAGGGUUACAGGU
UCGAUUCCUGUUGGUGACGCCAUAAUACUUUCUAACCUACCAGUGUUACCCUGGUAGGUU
UUUUAUUUGCUCCGUUGGCU
Fasta format for local and global alignment
Local or global alignment of the sequences.
Max sequence length is 200.
The motif can be as long as the sequences.
For global alignment the delta parameter has to be larger than or equal to the
length difference between the sequences.
>AE000516/4118797-4118870
CGGGGUGUGGCGCAGCUUGGUAGCGCGCUUCGUUCGGGACGAAGAGGCCGUGGGUUCAAA
UCCCGCCACCCCGA
>AE000051/8942-9018
GUCAUCAUAGCUCAAUAGGACAGAGUAUCAGCUUGCGGAGCUGAGGGUUACAGGUUCGAU
UCCUGUUGGUGACGCCA
- Parameters
Type of comparison. Choices: Scan, Local, or Global
These are the three different server types.
Scan makes a local foldalignment between the two input sequences and
reports a ranked list of local foldalignment hits. The input sequences can be
long, up to 500 nucleotides, but the length of the foldalignments is limited to
Maximum motif length lambda nucleotides. A scan can take several hours to
complete depending on the length of the sequences and the parameters used.
An example of the output can be seen here.
Local makes a local foldalignment. It can be as long as the input
sequences. Only one foldalignment is reported. The input sequences are limited
to 200 nucleotides. For short sequences a local foldalignment takes seconds or
minutes. For long sequences it can take up to an hour.
An example of the output can be seen here.
Global makes a global foldalignment. The sequences are foldaligned from
end to end. The length of the input sequences is limited to 200 nucleotides.
The maximum length difference delta parameter must be larger than or
equal to the length difference between the sequences. For short sequences a
global foldalignment takes seconds or minutes. For long sequences it can take up
to an hour. An example of the output can be seen here.
Email
If an email address is specified, an email will be sent to the address when the
alignment is done. The email will contain a link to the results page. The email
address will not be used for anything else.
Comment/ID
This field can be used for marking and/or tracking different submissions.
Maximum length difference (delta). Ranges: Scan 1-15. Local and global
1-25.
An unconstrained version of FOLDALIGN takes a very long time to run. It also
demands a lot of memory. To lower the time and memory requirements of the
algorithm, delta limits the length difference between subsequences being
compared. For example, this means that the alignment of a ten nucleotide
subsequence and a 60 nucleotide subsequence will not be considered. For
structures that are similar this is not a problem, but if there are long
inserts in one of the sequences, then the correct structure might not be
found.
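The effect of delta can be illustrated with a small sketch. The function names are hypothetical, and the test "difference is at most delta" is an assumption based on the wording used for global alignment; the real program may apply the limit differently inside its recursion.

```python
def within_delta(len_a, len_b, delta):
    """True if two subsequence lengths differ by at most delta."""
    return abs(len_a - len_b) <= delta


def count_length_pairs(max_len, delta):
    """Count the (len_a, len_b) combinations that remain with a delta limit."""
    return sum(
        1
        for len_a in range(1, max_len + 1)
        for len_b in range(1, max_len + 1)
        if within_delta(len_a, len_b, delta)
    )


# The example from the text: a 10 nt against a 60 nt subsequence.
print(within_delta(10, 60, 25))

# Length combinations left for subsequences of up to 200 nt with delta 15,
# compared with the 200 * 200 combinations of an unconstrained comparison.
print(count_length_pairs(200, 15), 200 * 200)
```

The count only shows how strongly delta prunes the combinations of subsequence lengths. It is not a measure of the actual running time.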
Gap opening cost. Range: -1000 -> 0
This is the cost of opening a new gap. We have found that a gap opening of
around -50 gives good results when scanning. For local and global
foldalignments the gap opening cost is dependent on the RNA type. If it is not
known what kind of RNA the sequences contain, then try several values. The
range we have found useful is -100 -> -10.
Gap elongation cost. Range: -1000 -> 0
This is the cost of elongating an already started gap. We usually fix this
value at half the gap opening cost.
Maximum motif length (lambda). Range: 1 - 200. Only affects scan type
comparisons.
This is the maximum length of a foldalignment not counting gaps. The use of
this parameter makes it in theory possible to scan sequences of any
biologically relevant length on an ordinary desktop computer. But to limit the
resources needed by the server the sequence length has been limited. The time
needed to run FOLDALIGN is greatly affected by the value of lambda.
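Unfortunately lambda and Lambda are two completely different things: lambda is
the maximum motif length described here, while Lambda is a parameter of the
extreme value distribution used for the P-value. They should not be mistaken
for each other.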
Maximum number of structures. Range: 1 - 10. Only affects scan type
comparisons.
This is the maximum number of structures reported by the server. For each
structure FOLDALIGN has to realign and backtrack the foldalignment. The
positions of these alignments are drawn as bars on the Z-score plot.
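The ranges listed in this section can be collected in a small check that runs before submitting. This is a sketch only: check_submission and the LIMITS table are hypothetical names, and the server may report problems differently.

```python
# Limits copied from the parameter descriptions above.
LIMITS = {
    "scan":   {"max_seq_len": 500, "delta": (1, 15)},
    "local":  {"max_seq_len": 200, "delta": (1, 25)},
    "global": {"max_seq_len": 200, "delta": (1, 25)},
}


def check_submission(mode, len1, len2, delta, gap_open, gap_elongation,
                     lam=None, max_structures=None):
    """Return a list of problems; an empty list means the values are in range."""
    if mode not in LIMITS:
        return [f"unknown comparison type: {mode!r}"]
    problems = []
    limit = LIMITS[mode]
    for label, length in (("sequence 1", len1), ("sequence 2", len2)):
        if length > limit["max_seq_len"]:
            problems.append(f"{label} is longer than {limit['max_seq_len']}")
    low, high = limit["delta"]
    if not low <= delta <= high:
        problems.append(f"delta must be in {low}-{high}")
    for label, cost in (("gap opening", gap_open),
                        ("gap elongation", gap_elongation)):
        if not -1000 <= cost <= 0:
            problems.append(f"{label} cost must be in -1000 -> 0")
    if mode == "scan":
        # lambda and the number of structures only affect scans.
        if lam is None or not 1 <= lam <= 200:
            problems.append("lambda must be in 1-200")
        if max_structures is None or not 1 <= max_structures <= 10:
            problems.append("maximum number of structures must be in 1-10")
    if mode == "global" and abs(len1 - len2) > delta:
        problems.append("delta is smaller than the length difference")
    return problems
```

For example, a global comparison of a 74 nt and a 100 nt sequence is rejected even with delta at its maximum of 25, because the length difference is 26.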
- Web-server output examples
Examples of the output from the three different types of comparisons. The
sequences used are those from the Data format
section. The parameters used are the default parameters. The archive file
available for download on the result page contains among other files a file
named index.html which is a copy of the result page.
- Scan
- Local
- Global
- Z-score plot

A Z-score plot shows the score of the best scoring alignment starting at
position (i,k). The Z-score is calculated from the local alignment scores
produced during the alignment. True alignments will often show up as big
blotches on the plot. Along the sides of the plot, bars indicating the best
alignments can be seen. Bars are drawn for the alignment hits for which the
structure predictions are also printed. This is controlled by the Maximum
number of structures (nh) parameter. The bar for the highest scoring alignment
is colored dark blue. All other alignment bars have a light blue color. Since
the alignments can overlap on one of the sequences,
start (yellow color) and end (red color) positions have been indicated. The
coordinates are listed in the list of hits below the figure in the web server
output.
- Z-score
The Z-score is defined as:
Z = (FA - mFA)/sFA
FA is the FOLDALIGN score.
mFA is the mean FOLDALIGN score.
sFA is the spread of the FOLDALIGN score.
The scores are found in the last column of the LS lines of the output
files. When running FOLDALIGN on the command-line, option -plot_score has
to be used to get the LS lines.
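As a minimal sketch, the definition above can be applied to a list of scores taken from the LS lines. It is assumed here that the "spread" is the population standard deviation; FOLDALIGN may use a different estimator, so the numbers need not match the web server exactly.

```python
from statistics import mean, pstdev


def z_scores(scores):
    """Convert FOLDALIGN scores to Z-scores, Z = (FA - mFA) / sFA."""
    m_fa = mean(scores)
    s_fa = pstdev(scores)  # assumption: spread = population standard deviation
    if s_fa == 0:
        raise ValueError("all scores are identical; the Z-score is undefined")
    return [(fa - m_fa) / s_fa for fa in scores]


# Three scores only, to keep the example short. A real run has one score
# for each pair of start positions.
for score, z in zip([362, 408, 561], z_scores([362, 408, 561])):
    print(score, round(z, 2))
```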
- P-value
The P-value is an estimate of the probability that a foldalignment with a given
score would be found by chance.
The P-value depends on Lambda and K, which are the parameters
of the extreme value distribution. Please note that this Lambda is
not the same as the FOLDALIGN lambda.
Lambda and K are reported on the web page.
To estimate Lambda and K the sum of the scores
(Island sum) of all non-overlapping hits with a score above a threshold
C is used.
Island count shows how many scores have been used to estimate the
value of Lambda and K.
Do not trust the P-value too much. Non-random foldalignments will bias the
estimate. The number of scores used in the estimates is most often very low.
Border effects have also not been taken into account.
- NS score
The No single strand Substitution score is the standard FOLDALIGN score minus
the sequence similarity score for the single stranded regions of the structure.
- FOLDALIGN output file format
Contents
- Header section
- Sequence section
- Local alignment scores
FOLDALIGN produces output in col format. The format aims at keeping
information about data,
parameters, and results in one file. At the same time the data has to be in a
format which is easy to work with.
The default FOLDALIGN output has three sections. The first part, the header,
holds some general information and the structure of the best foldalignment
found. The second section holds general information and information about the
first sequence. The third section is similar to the second, but contains
information about the second sequence.
- The header section
A typical standard header looks like this:
; FOLDALIGN 2.0.1
; REFERENCE JH. Havgaard, R. Lyngsø, GD. Stormo, J. Gorodkin
; REFERENCE Pairwise local structural alignment of RNA sequences
; REFERENCE with sequence similarity less than 40%
; REFERENCE In press Bioinformatics 2005
; ALIGNMENT_ID Structure 1
; ALIGNING V00158/694-623 against AC069454/82192-82263
; ALIGN V00158/694-62
; ALIGN AC069454/8219
; ALIGN Score: 561
; ALIGN Identity: 39 % ( 28 / 72 )
; ALIGN Begin
; ALIGN
; ALIGN V00158/694-62 GCAGAUGUAG CUCAGUGG-U AGAGCGCAAC CUUGCCAAGG
; ALIGN Structure (((((((..( (((....... .)))).(((( (.......))
; ALIGN AC069454/8219 GGUCCCAUGG UGUAAUGGUU AGCACUCUGG ACUUUGAAUC
; ALIGN
; ALIGN V00158/694-62 UUGAUGCCAU GGGUUCGAGU CCCAUUAUCU GC
; ALIGN Structure ))).....(( (((....... )))))))))) ))
; ALIGN AC069454/8219 CAG-CGAUCC GAGUUCAAAU CUCGGUGGGA CC
; ALIGN
; ALIGN End
; ==============================================================================
; FOLDALIGN 2.0.1
This field indicates which version
of FOLDALIGN was used to produce the file.
; REFERENCE
These fields contain citation information.
; ALIGNMENT_ID
Holds a comment either set by the
user or the web server. The default value is n.a.
; ALIGNING V00158/694-623 against AC069454/82192-82263
The name of sequence one against the name of sequence two.
; ALIGN V00158/694-62
Name and comment for the
first sequence. The comment field is not available in the web server.
; ALIGN AC069454/8219
Name and comment for the
second sequence. The comment field is not available in the web server.
; ALIGN Score: 561
The score of the best
foldalignment.
; ALIGN Identity: 39 % ( 28 / 72 )
The sequence identity of the alignment. There are 28 identical nucleotides out
of 72.
; ALIGN Begin
; ALIGN
; ALIGN V00158/694-62 GCAGAUGUAG CUCAGUGG-U AGAGCGCAAC CUUGCCAAGG
; ALIGN Structure (((((((..( (((....... .)))).(((( (.......))
; ALIGN AC069454/8219 GGUCCCAUGG UGUAAUGGUU AGCACUCUGG ACUUUGAAUC
; ALIGN
; ALIGN V00158/694-62 UUGAUGCCAU GGGUUCGAGU CCCAUUAUCU GC
; ALIGN Structure ))).....(( (((....... )))))))))) ))
; ALIGN AC069454/8219 CAG-CGAUCC GAGUUCAAAU CUCGGUGGGA CC
; ALIGN
; ALIGN End
This shows the alignment and structure of the best local alignment between the
two sequences.
; ==============================================================================
A separation line.
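The Identity and Structure lines of the example header can be checked with a short sketch. The function names are hypothetical. The identity is assumed to be the number of identical, non-gap columns divided by the alignment length, which reproduces the 28 / 72 of the example. The structure is read as ordinary dot-bracket notation.

```python
# The ALIGN lines from the example header, with the block spacing removed.
SEQ1 = ("GCAGAUGUAG CUCAGUGG-U AGAGCGCAAC CUUGCCAAGG "
        "UUGAUGCCAU GGGUUCGAGU CCCAUUAUCU GC").replace(" ", "")
STRUCT = ("(((((((..( (((....... .)))).(((( (.......)) "
          "))).....(( (((....... )))))))))) ))").replace(" ", "")
SEQ2 = ("GGUCCCAUGG UGUAAUGGUU AGCACUCUGG ACUUUGAAUC "
        "CAG-CGAUCC GAGUUCAAAU CUCGGUGGGA CC").replace(" ", "")


def identity(aligned_a, aligned_b):
    """Return (identical columns, alignment length) for two aligned rows."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return same, len(aligned_a)


def base_pairs(structure):
    """Map each paired alignment position (1-based) to its partner."""
    stack, pairs = [], {}
    for pos, char in enumerate(structure, start=1):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            left = stack.pop()
            pairs[left], pairs[pos] = pos, left
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


same, length = identity(SEQ1, SEQ2)
print(f"Identity: {round(100 * same / length)} % ( {same} / {length} )")
print(base_pairs(STRUCT)[1])  # partner of the first alignment position
```

The partner positions returned by base_pairs correspond to the align_bp column of the sequence section.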
- A sequence section
The sequence section has two parts. The information part and the sequence part.
A typical sequence section:
; TYPE RNA
; COL 1 label
; COL 2 residue
; COL 3 seqpos
; COL 4 alignpos
; COL 5 align_bp
; COL 6 seqpos_bp
; ENTRY V00158/694-623
; ALIGNMENT_ID Structure 1
; ALIGNMENT_LIST V00158/694-623 AC069454/82192-82263
; FOLDALIGN_SCORE 561
; GROUP 1
; FILENAME data.fasta
; START_POSITION 1
; END_POSITION 71
; ALIGNMENT_SIZE 2
; ALIGNMENT_LENGTH 72
; SEQUENCE_LENGTH 72
; PARAMETER max_length=71
; PARAMETER max_diff=15
; PARAMETER min_loop=3
; PARAMETER score_matrix=sm.gap_open_-50.gap_elongation_-25.fmat
; PARAMETER nobranching=<false>
; PARAMETER global=<false>
; ------------------------------------------------------------------------------
N G 1 1 72 71
N C 2 2 71 70
N A 3 3 70 69
.
.
G - . 19 . .
.
.
N U 69 70 3 3
N G 70 71 2 2
N C 71 72 1 1
; ******************************************************************************
; TYPE RNA
Different types of data can be stored in col format. The type field indicates
what type this is.
; COL 1 label
; COL 2 residue
; COL 3 seqpos
; COL 4 alignpos
; COL 5 align_bp
; COL 6 seqpos_bp
These fields are the headers of the columns in the sequence part of this
section. label has two values: N for nucleotide or G for
gap. residue is a nucleotide or a gap. seqpos is the position in
the original sequence. alignpos is the position in the foldalignment.
align_bp indicates which position in the foldalignment this position is
base-paired with. "." indicates no base-pairing. seqpos_bp is the
base-pair position in the original sequence coordinates.
; ENTRY V00158/694-623
The name of the sequence.
; ALIGNMENT_ID Structure 1
A comment field. Set either by the user or the web server.
; ALIGNMENT_LIST V00158/694-623 AC069454/82192-82263
The sequences in this alignment.
; FOLDALIGN_SCORE 561
The score of this alignment.
; GROUP 1
Currently not used.
; FILENAME data.fasta
Name of the sequence input file.
; START_POSITION 1
The start position of the foldalignment.
; END_POSITION 71
The end position of the foldalignment.
; ALIGNMENT_SIZE 2
Currently always two.
; ALIGNMENT_LENGTH 72
The length of the foldalignment.
; SEQUENCE_LENGTH 72
The length of the input sequence.
; PARAMETER max_length=71
The lambda value.
; PARAMETER max_diff=15
The delta value.
; PARAMETER min_loop=3
The minimum number of nucleotides between two nucleotides base-paired to each
other.
; PARAMETER score_matrix=sm.gap_open_-50.gap_elongation_-25.fmat
The score matrix used.
; PARAMETER nobranching=<false>
If false, branching structures are allowed. If true, only stem loops are allowed.
Default value false. Always false for web server runs.
; PARAMETER global=<false>
The foldalignment is global if this parameter is true. Default value is
false.
; ------------------------------------------------------------------------------
This separation line separates the information and the sequence parts of
the sequence section.
N G 1 1 72 71
N C 2 2 71 70
N A 3 3 70 69
.
.
G - . 19 . .
.
.
N U 69 70 3 3
N G 70 71 2 2
N C 71 72 1 1
Each row is a position in the alignment. The columns are explained at the
; COL lines.
; ******************************************************************************
This indicates the end of the sequence section. This line can occur multiple
times in the normal output. A FOLDALIGN output file should end with one of these
lines.
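A single sequence section can be read with a small parser like the sketch below. It is not part of the FOLDALIGN package; the function name and the returned data layout are assumptions. The input is expected to be the lines of one sequence section, so a complete output file has to be split at the ; **** lines first.

```python
def parse_sequence_section(lines):
    """Parse one sequence section into (fields, columns, rows)."""
    fields, columns, rows = {}, {}, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(";"):
            body = line[1:].strip()
            if not body or set(body) <= set("-*="):
                continue  # separator lines carry no information
            key, _, value = body.partition(" ")
            value = value.strip()
            if key == "COL":
                number, _, col_name = value.partition(" ")
                columns[int(number)] = col_name.strip()
            elif key == "PARAMETER":
                p_name, _, p_value = value.partition("=")
                fields.setdefault("PARAMETER", {})[p_name] = p_value
            else:
                fields[key] = value
            continue
        parts = line.split()
        if len(parts) != len(columns):
            continue  # e.g. the lone "." lines marking omitted rows above
        names = [columns[i] for i in sorted(columns)]
        row = {}
        for col_name, token in zip(names, parts):
            if token == ".":
                row[col_name] = None  # "." means no position / no base-pair
            elif token.isdigit():
                row[col_name] = int(token)
            else:
                row[col_name] = token
        rows.append(row)
    return fields, columns, rows
```

Each row becomes a dictionary keyed by the names on the ; COL lines, so the gap row of the example has seqpos set to None and alignpos set to 19.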
- Local alignment scores. The -plot_score option
FOLDALIGN can be used to scan for more than the best scoring RNA structures.
When the -plot_score option is used, then FOLDALIGN prints coordinates and
the score of the best local alignment starting at any pair of positions along
the two sequences. The extra output produced has two parts. A header and a
series of local score (LS) lines.
Many of the local alignment header fields have been explained above.
A typical local alignment score header looks like this:
; FOLDALIGN 2.0.1
; REFERENCE JH. Havgaard, R. Lyngsø, GD. Stormo, J. Gorodkin
; REFERENCE Pairwise local structural alignment of RNA sequences
; REFERENCE with sequence similarity less than 40%
; REFERENCE In press Bioinformatics 2005
; ALIGNMENT_ID Structure 1
; ALIGNING V00158/694-623 against AC069454/82192-82263
; SEQUENCE_1_COMMENT
; SEQUENCE_2_COMMENT
; LENGTH_SEQUENCE_1 72
; LENGTH_SEQUENCE_2 72
; FILENAME data.fasta
; PARAMETER max_length=72
; PARAMETER max_diff=15
; PARAMETER min_loop=3
; PARAMETER score_matrix=sm.gap_open_-50.gap_elongation_-25.fmat
; PARAMETER nobranching=<false>
; PARAMETER global=<false>
; TYPE Foldalign_local_scores
; COL 1 label
; COL 2 Alignment_start_position_sequence_1
; COL 3 Alignment_end_position_sequence_1
; COL 4 Alignment_start_position_sequence_2
; COL 5 Alignment_end_position_sequence_2
; COL 6 Alignment_score
; ------------------------------------------------------------------------------
.
.
.
LS 1 71 3 69 362
LS 1 70 2 70 408
LS 1 71 1 71 561
; SEQUENCE_1_COMMENT
; SEQUENCE_2_COMMENT
These are the comments of the sequences. These fields are empty when the web
server is used.
LS 1 71 3 69 362
These are the local score lines. There is one for each pair of positions along
the two sequences.
In the Z-score plot this would be position (1, 3). The score
362 is converted to a Z-score.
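The LS lines are easy to pick out of an output file. The sketch below is an illustration with hypothetical names: it reads the six columns described on the ; COL lines and keeps the start positions, which give the plot position.

```python
from collections import namedtuple

LocalScore = namedtuple("LocalScore", "start1 end1 start2 end2 score")


def parse_ls_lines(lines):
    """Collect the local score (LS) lines; all other lines are skipped."""
    hits = []
    for line in lines:
        parts = line.split()
        if len(parts) == 6 and parts[0] == "LS":
            hits.append(LocalScore(*map(int, parts[1:])))
    return hits


example = [
    "; TYPE Foldalign_local_scores",
    "LS 1 71 3 69 362",
    "LS 1 70 2 70 408",
    "LS 1 71 1 71 561",
]
hits = parse_ls_lines(example)
first = hits[0]
print((first.start1, first.start2), first.score)  # plot position and score
print(max(hits, key=lambda hit: hit.score))       # best local alignment
```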
- Score matrix file format
A FOLDALIGN score matrix contains several elements. The elements are separated
by empty lines. The elements can with one exception be placed in any order. The
exception is the Alfabet: element which must be the first element in a
score matrix file, if it is present. A score matrix does not have to have all
elements present. Anything missing will be given a default value. See below for
examples. A score matrix holding the default values is distributed with the
FOLDALIGN package. The current default energy parameters are taken from mfold.
The default substitution matrices are a variation of the Ribosum matrices,
but produced in a fashion more similar to the BLOSUM matrices.
Comment and empty lines can be placed between the elements. Comment lines start
with a #.
The score matrix elements are:
Alfabet: This is the alphabet of the sequences. The first character is
also the gap/unknown character. The last field on the line is the size of the
alphabet. There is no difference between upper and lower case letters. T's are
read as U's.
Alfabet:
- A C G U 5
Stacking: This is the cost for stacking one base-pair onto another in a
stem. Stackings which promote stem structures are positive.
Stacking:
A A A A C C C C G G G G U U U U
A C G U A C G U A C G U A C G U
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 A A
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 A C
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 A G
0 0 0 9 0 0 21 0 0 24 0 13 13 0 10 0 A U
.
.
Hairpin Close: This is the cost for stacking the last unpaired pair of
nucleotides in a hairpin loop on to the first base-pair of the closing
stem.
Hairpin Close:
A A A A C C C C G G G G U U U U
A C G U A C G U A C G U A C G U
0 0 0 3 0 0 15 0 0 11 0 -2 5 0 5 0 A A
0 0 0 5 0 0 15 0 0 15 0 5 3 0 3 0 A C
0 0 0 3 0 0 14 0 0 13 0 3 6 0 6 0 A G
0 0 0 3 0 0 18 0 0 21 0 3 5 0 5 0 A U
.
.
Internal loop: This is the cost of stacking the first / last unpaired
pair of nucleotides in an internal loop onto the last / first base-pair of the
surrounding stems.
Internal loop:
A A A A C C C C G G G G U U U U
A C G U A C G U A C G U A C G U
0 0 0 -7 0 0 0 0 0 0 0 -7 -7 0 -7 0 A A
0 0 0 -7 0 0 0 0 0 0 0 -7 -7 0 -7 0 A C
0 0 0 4 0 0 11 0 0 11 0 4 4 0 4 0 A G
0 0 0 -7 0 0 0 0 0 0 0 -7 -7 0 -7 0 A U
.
.
5' Dangle: The cost of adding a 5' dangle nucleotide to a stem. Only
used in multibranched loops.
5' Dangle:
A A A A C C C C G G G G U U U U
A C G U A C G U A C G U A C G U
0 0 0 3 0 0 2 0 0 5 0 3 3 0 3 0 A
0 0 0 1 0 0 3 0 0 3 0 1 3 0 3 0 C
0 0 0 2 0 0 0 0 0 2 0 2 4 0 4 0 G
0 0 0 2 0 0 0 0 0 1 0 2 2 0 2 0 U
3' Dangle: The cost of adding a 3' dangle nucleotide to a stem. Used in
multibranched loops.
3' Dangle:
A A A A C C C C G G G G U U U U
A C G U A C G U A C G U A C G U
0 0 0 8 0 0 17 0 0 11 0 8 7 0 7 0 A
0 0 0 5 0 0 8 0 0 4 0 5 1 0 1 0 C
0 0 0 8 0 0 17 0 0 13 0 8 7 0 7 0 G
0 0 0 6 0 0 12 0 0 6 0 6 1 0 1 0 U
Loop length costs: This table holds the length-dependent cost for hairpin,
bulge, and internal loops. The header line must be followed by a line telling how
many lines should be read.
Loop length costs:
30 lines
# Size Hairpin Bulge Internal
1 -57 -38 -17
2 -57 -28 -17
3 -57 -32 -17
4 -56 -36 -17
.
.
Miscellaneous: This element holds a list of parameters.
- Gap_open. This is the gap opening cost.
- Elongation_bonus. This is the cost of continuing a gap.
- Multibranchloop. This is the cost of closing a multibranched loop
with a base-pair.
- Multibranchloop_helix. The cost of adding an extra stem to a
multibranched loop.
- Multibranchloop_nucleotide. The cost of adding an extra unpaired
nucleotide in a multibranched loop.
- Multibranchloop_non_GC_stem_end. An extra cost added when a stem ends
with a base-pair which is not GC. The values in the Hairpin close and
Internal loop matrices are assumed to have already been corrected. This
value is therefore only used in multibranched loops and bulges with a length
longer than one nucleotide.
- Asymmetric_cost. The cost of asymmetric internal loops.
- Asymmetric_cost_limit. The maximum asymmetric cost.
- Long_hairpin_loop_factor. Used to estimate hairpin Loop length
costs not included in the table. Only used for lengths above 30.
- Long_bulge_loop_factor. Used to estimate bulge Loop length
costs not included in the table. Only used for lengths above 30.
- Long_Internal_loop_factor. Used to estimate internal loop Loop
length costs not included in the table. Only used for lengths above 30.
Base-pair: This matrix indicates which nucleotides base-pair.
- A C G U
0 0 0 0 0 -
0 0 0 0 1 A
0 0 0 1 0 C
0 0 1 0 1 G
0 1 0 1 0 U
Base-pair substitution: The cost of substituting a base-pair in one
sequence with a base-pair in the other sequence.
Base-pair substitution:
A A A A C C C C G G G G U U U U
A C G U A C G U A C G U A C G U
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 A A
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 A C
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 A G
0 0 0 11 0 0 3 0 0 5 0 0 2 0 -3 0 A U
.
.
Single strand substitution: The cost of substituting unpaired
nucleotides.
Single strand substitution:
A C G U
19 -22 -18 -19 A
-22 11 -25 -15 C
-18 -25 9 -20 G
-19 -15 -20 13 U
- Score matrix examples
A score matrix which only changes the gap penalties, would look like this:
# Changing the gap penalties.
Miscellaneous:
Gap_open: -80
Elongation_bonus: -40
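A file like the one above can be generated for a series of gap penalties, for instance when trying several values for an unknown RNA type. The helper below is hypothetical. It copies the layout of the example, and when no elongation cost is given it uses half the gap opening cost, as suggested in the parameter description. Truncating odd values toward zero is an assumption.

```python
def gap_penalty_matrix(gap_open, gap_elongation=None):
    """Return the text of a score matrix that only sets the gap penalties."""
    if gap_open > 0:
        raise ValueError("the gap opening cost must not be positive")
    if gap_elongation is None:
        # Assumption: half the opening cost, truncated toward zero.
        gap_elongation = int(gap_open / 2)
    return (
        "# Changing the gap penalties.\n"
        "Miscellaneous:\n"
        f"Gap_open: {gap_open}\n"
        f"Elongation_bonus: {gap_elongation}\n"
    )


print(gap_penalty_matrix(-80), end="")
```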
A score matrix which has no single strand substitution cost, would look like
this:
# No single strand substitution cost.
Single strand substitution:
A C G U
0 0 0 0 A
0 0 0 0 C
0 0 0 0 G
0 0 0 0 U
A score matrix combining several elements would look like this (no sequence
similarity cost):
# Scorematrix format
# X-axis: i & j. Y-axis k & l
Base-pair substitution:
A A A A C C C C G G G G U U U U
A C G U A C G U A C G U A C G U
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 A A
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 A C
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 A G
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 A U
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 C A
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 C C
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 C G
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 C U
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 G A
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 G C
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 G G
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 G U
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 U A
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 U C
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 U G
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 U U
# Initmatrix format
# X-axis i. Y-axis j
Single strand substitution:
A C G U
0 0 0 0 A
0 0 0 0 C
0 0 0 0 G
0 0 0 0 U
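To inspect a score matrix file, the elements can be split apart with a tolerant reader such as the sketch below. It is not the parser used by FOLDALIGN. It assumes that an element starts at a line ending with a colon, that no data line ends with a colon, and that the Miscellaneous values are numbers. Comment and empty lines are skipped wherever they occur.

```python
def split_elements(text):
    """Split score matrix text into {element name: list of token lists}."""
    elements = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # comment and empty lines carry no data
        if line.endswith(":"):
            current = line[:-1]  # e.g. "Miscellaneous"
            elements[current] = []
        elif current is None:
            raise ValueError(f"data before any element header: {line!r}")
        else:
            elements[current].append(line.split())
    return elements


def miscellaneous(elements):
    """Return the Miscellaneous parameters as a {name: value} dictionary."""
    return {
        tokens[0].rstrip(":"): float(tokens[1])
        for tokens in elements.get("Miscellaneous", [])
    }


example = """# Changing the gap penalties.
Miscellaneous:
Gap_open: -80
Elongation_bonus: -40

# No single strand substitution cost.
Single strand substitution:
A C G U
0 0 0 0 A
0 0 0 0 C
0 0 0 0 G
0 0 0 0 U
"""
elements = split_elements(example)
print(sorted(elements))
print(miscellaneous(elements))
```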