Contents


    We are currently upgrading the web server from FOLDALIGN version 2.0.3 to the new version 2.1.0. The documentation is therefore not completely up to date.

    Web-server specific

  1. Data format Information about maximum sequence lengths, and nucleotide types.
  2. Parameters Description of the web server parameters
  3. Web-server output examples
    1. Scan
    2. Local
    3. Global

    Web-server and Command-line

  4. Z-score plot A description of the Z-score plot.
  5. Z-score Definition of the Z-score.
  6. P-value Description of the P-value and its parameters.
  7. NS score Description of the No single strand Substitution score.
  8. FOLDALIGN output file format Description of the FOLDALIGN output file format.
    1. Header section
    2. Sequence section
    3. Local alignment scores
  9. Score matrix file format Description of the FOLDALIGN score matrix file format.
    1. Score matrix examples


  1. Data format
    Data should be in fasta format. If more than two sequences are submitted, only the first two sequences will be processed.

    The only nucleotides allowed are A, C, G, T, U, N, -.
    The N and - are treated as gaps. In the output T is printed as U.

    Fasta format for scanning

    Search for multiple local hits. The maximum sequence length is 500 and the maximum motif length (lambda) is 200.
    >AE000516/4118797-4118870
    GGCUCGAUCCGGCCGGGCCGGCUGCGGCGCUGGUGGAAUUCGAGCGCUCCUGUCGAUGGA
    CGUGACGGUCGGACCUGCGGUUUGGCUAGUCAACGGUCCGGUGCGAUAGGCUGUCGUGGC
    UUCAAGCGGGGUGUGGCGCAGCUUGGUAGCGCGCUUCGUUCGGGACGAAGAGGCCGUGGG
    UUCAAAUCCCGCCACCCCGACCGAGAGAUCGCUGACGACAGCCUUACCCGGCGCAGCGUG
    GUAGCUUGCUGCAGUCUGCUCGGGCGGCAGCGCCACCCUGACGGUGCUGGUUGACCAUGC
    CGGACAGCACGUCAACGCACAGGCAUUUCCAACGGAAGUUGUAGGUUACCGGCCGCCCUA
    AAACACGGUGCACUUUUCGUUAAAGGUUGUGGGUGUGGAUCCAACGAAAUUCGUUGCCCC
    GGCGUGGGCAGCGCCGUGUCCACAGGGGGACCCGCCGCGCAUUACGCCUAUGGGCCCACC
    CCCGUACCGCGGGAGUUGGC
    >AE000051/8942-9018
    UAAAUUUAAGUAUUAAGUAUUAAUUAAACGGGAAAGGAAUAGGACGAUUUACCAGGGGUU
    AAACCUAAUCGGAACCGACCGGUACUAAAGUAGUCUUUUUGUGAAUUCCUUCUUUUAUAG
    CUAUAACUGCUGCCAUUGCCGUUGGCAUUAUUAGCGUUAGCAGCUUCGCUGAUGUCUAAC
    GCGGUAUUAAUUAACGUACUAAAGGCGUUUACAAUCAUAAUUACUCCCGCACUGAAGCUU
    CAAAAUGAUAAGCCUCCAACUACUUGUUUAGCUUCUGAUUUUUUUAAUCUCUGCAUUAUU
    GUGUCUCCAUUUAAAUGAAUUACAUAACUAAAGUAAAAUGCCAAAGUUUUAUAAUUAACU
    UAACUGUCAUCAUAGCUCAAUAGGACAGAGUAUCAGCUUGCGGAGCUGAGGGUUACAGGU
    UCGAUUCCUGUUGGUGACGCCAUAAUACUUUCUAACCUACCAGUGUUACCCUGGUAGGUU
    UUUUAUUUGCUCCGUUGGCU
    

    Fasta format for local and global alignment

    Local or global alignment of the sequences. The maximum sequence length is 200. The motif can be as long as the sequences. For global alignment, the delta parameter has to be larger than or equal to the length difference between the sequences.
    >AE000516/4118797-4118870
    CGGGGUGUGGCGCAGCUUGGUAGCGCGCUUCGUUCGGGACGAAGAGGCCGUGGGUUCAAA
    UCCCGCCACCCCGA
    >AE000051/8942-9018
    GUCAUCAUAGCUCAAUAGGACAGAGUAUCAGCUUGCGGAGCUGAGGGUUACAGGUUCGAU
    UCCUGUUGGUGACGCCA
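
    The input rules above can be checked before submitting. The following is a minimal sketch, not part of FOLDALIGN or the web server: the function names are hypothetical, and it assumes that the length limit counts every character of a sequence (including N and -) and that lower case input is acceptable.

```python
ALLOWED = set("ACGTUN-")
# Documented web-server limits on the sequence length per comparison type.
MAX_LENGTH = {"scan": 500, "local": 200, "global": 200}


def read_fasta(text):
    """Return a list of (name, sequence) pairs from fasta-formatted text."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:], ""])
        elif records:
            records[-1][1] += line
    return [(name, seq) for name, seq in records]


def prepare_input(text, mode):
    """Check the first two sequences against the documented limits."""
    records = read_fasta(text)[:2]  # only the first two sequences are processed
    if len(records) < 2:
        raise ValueError("two sequences are required")
    prepared = []
    for name, seq in records:
        seq = seq.upper()  # assumption: case does not matter
        bad = set(seq) - ALLOWED
        if bad:
            raise ValueError(f"{name}: characters not allowed: {sorted(bad)}")
        if len(seq) > MAX_LENGTH[mode]:
            raise ValueError(f"{name}: longer than {MAX_LENGTH[mode]} nucleotides")
        prepared.append((name, seq.replace("T", "U")))  # T is printed as U
    return prepared
```

    Calling prepare_input(text, "scan") applies the 500 nucleotide limit, while "local" and "global" apply the 200 nucleotide limit.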
    


  2. Parameters
    Type of comparison. Choices: Scan, Local, or Global
    These are the three different server types.
    Scan makes a local foldalignment between the two input sequences and reports a ranked list of local foldalignment hits. The input sequences can be long, up to 500 nucleotides, but the length of each foldalignment is limited to the Maximum motif length (lambda). A scan can take several hours to complete, depending on the length of the sequences and the parameters used. An example of the output can be seen here.
    Local makes a local foldalignment. It can be as long as the input sequences. Only one foldalignment is reported. The input sequences are limited to 200 nucleotides. For short sequences a local foldalignment takes seconds or minutes. For long sequences it can take up to an hour. An example of the output can be seen here.
    Global makes a global foldalignment. The sequences are foldaligned from end to end. The length of the input sequences is limited to 200 nucleotides. The maximum length difference delta parameter must be larger than or equal to the length difference between the sequences. For short sequences a global foldalignment takes seconds or minutes. For long sequences it can take up to an hour. An example of the output can be seen here.

    Email
    If an email address is specified, an email will be sent to the address when the alignment is done. The email will contain a link to the results page. The email address will not be used for anything else.

    Comment/ID
    This field can be used for marking and/or tracking different submissions.

    Maximum length difference (delta). Ranges: Scan 1-15. Local and global 1-25.
    An unconstrained version of FOLDALIGN takes a very long time to run. It also demands a lot of memory. To lower the time and memory requirements of the algorithm, delta limits the length difference between the subsequences being compared. This means, for example, that the alignment of a ten nucleotide subsequence and a 60 nucleotide subsequence will not be considered. For structures that are similar this is not a problem, but if there are long inserts in one of the sequences, then the correct structure might not be found.
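
    The effect of delta can be illustrated with a small check. This is only a sketch with hypothetical function names. It assumes that a pair of subsequences is considered when their lengths differ by at most delta, which matches the requirement stated for global alignments in the Type of comparison description.

```python
def within_delta(length_1, length_2, delta):
    """True if two subsequence lengths differ by at most delta."""
    return abs(length_1 - length_2) <= delta


def smallest_global_delta(length_1, length_2):
    """Smallest delta that still permits an end-to-end (global) alignment.

    The web server does not accept values below 1, hence the max().
    """
    return max(1, abs(length_1 - length_2))


# The example from the text: 10 against 60 nucleotides is never considered,
# not even with the largest delta the server accepts (25).
print(within_delta(10, 60, 25))
# The local/global example sequences are 74 and 77 nucleotides long.
print(smallest_global_delta(74, 77))
```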

    Gap opening cost. Range: -1000 -> 0
    This is the cost of opening a new gap. We have found that a gap opening cost of around -50 gives good results when scanning. For local and global foldalignments the gap opening cost is dependent on the RNA type. If it is not known what kind of RNA the sequences contain, then try several values. The range we have found useful is -10 -> -100.

    Gap elongation cost. Range: -1000 -> 0
    This is the cost of elongating an already started gap. We usually fix this value at half the gap opening cost.

    Maximum motif length (lambda). Range: 1 - 200. Only affects scan type comparisons.
    This is the maximum length of a foldalignment not counting gaps. The use of this parameter makes it in theory possible to scan sequences of any biologically relevant length on an ordinary desktop computer. But to limit the resources needed by the server, the sequence length has been limited. The time needed to run FOLDALIGN is greatly affected by the value of lambda. Unfortunately, lambda and Lambda are two completely different things: lambda is the maximum motif length described here, while Lambda is a parameter of the extreme value distribution used for the P-value. They should not be mistaken for each other.

    Maximum number of structures. Range: 1 - 10. Only affects scan type comparisons.
    This is the maximum number of structures reported by the server. For each structure FOLDALIGN has to realign and backtrack the foldalignment. The positions of these alignments are drawn as bars on the Z-score plot.


  3. Web-server output examples
    Examples of the output from the three different types of comparisons. The sequences used are those from the Data format section. The parameters used are the default parameters. The archive file available for download on the result page contains, among other files, a file named index.html, which is a copy of the result page.
    1. Scan
    2. Local
    3. Global

  4. Z-score plot
    A Z-score plot shows the score of the best scoring alignment starting at position (i,k). The Z-score is calculated from the local alignment scores produced during the alignment. True alignments will often show up as big blotches on the plot. Along the sides of the plot, bars indicate the best alignments. Bars are drawn for the alignment hits for which the structure predictions are also printed. This is controlled by the Maximum number of structures (nh) parameter. The bar for the highest scoring alignment is colored dark blue. All other alignment bars are light blue. Since the alignments can overlap on one of the sequences, the start (yellow) and end (red) positions are marked. The coordinates are given in the list of hits below the figure in the web server output.


  5. Z-score
    The Z-score is defined as:

    Z = (FA - mFA)/sFA

    FA is the FOLDALIGN score.
    mFA is the mean FOLDALIGN score.
    sFA is the spread of the FOLDALIGN score.
    The scores are found in the last column of the LS lines of the output files. When running FOLDALIGN on the command-line, option -plot_score has to be used to get the LS lines.
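
    The definition above can be applied directly to the LS lines. The sketch below is illustrative only: it reads "spread" as the population standard deviation and takes the mean and spread over all LS lines passed in, both of which are assumptions, and the function name is hypothetical.

```python
import statistics


def z_scores(ls_lines):
    """Convert the scores in the last column of LS lines to Z-scores."""
    scores = [float(line.split()[-1]) for line in ls_lines
              if line.split() and line.split()[0] == "LS"]
    mean = statistics.mean(scores)        # mFA
    spread = statistics.pstdev(scores)    # sFA, assumed to be the standard deviation
    if spread == 0:
        raise ValueError("all scores are identical, the Z-score is undefined")
    return [(score - mean) / spread for score in scores]


# LS lines taken from the local alignment score example in this document.
lines = ["LS 1 71 3 69 362", "LS 1 70 2 70 408", "LS 1 71 1 71 561"]
for line, z in zip(lines, z_scores(lines)):
    print(line, round(z, 2))
```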


  6. P-value
    The P-value is an estimate of the probability that a foldalignment with a given score would be found by chance.
    The P-value depends on Lambda and K, which are the parameters of the extreme value distribution. Please note that this Lambda is not the same as the FOLDALIGN lambda. Lambda and K are reported on the web page. To estimate Lambda and K, the sum of the scores (Island sum) of all non-overlapping hits with a score above a threshold C is used. Island count shows how many scores have been used to estimate the values of Lambda and K.

    Do not trust the P-value too much. Non-random foldalignments will bias the estimate. The number of scores used in the estimates is most often very low. Border effects have also not been taken into account.
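
    How Lambda and K enter a P-value can be sketched with the standard extreme value (Gumbel) form used in local alignment statistics, P = 1 - exp(-K * N * exp(-Lambda * S)). This is an assumption for illustration: the exact expression used by the server, and how it defines the search space size N, is not stated in this documentation. The function name and the numbers in the example are hypothetical.

```python
import math


def p_value(score, lambda_evd, k, search_space):
    """Extreme value tail probability 1 - exp(-K * N * exp(-Lambda * score)).

    lambda_evd is the extreme value distribution Lambda, not the FOLDALIGN
    maximum motif length lambda.
    """
    expected_hits = k * search_space * math.exp(-lambda_evd * score)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-expected_hits)


# Hypothetical parameter values: a higher score gives a smaller P-value.
print(p_value(300, 0.02, 0.1, 72 * 72))
print(p_value(600, 0.02, 0.1, 72 * 72))
```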


  7. NS score
    The No single strand Substitution score is the standard FOLDALIGN score minus the sequence similarity score for the single stranded regions of the structure.
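
    As a rough illustration of the definition above, the sketch below subtracts the single strand substitution scores of all unpaired alignment columns from a FOLDALIGN score. It is not the FOLDALIGN implementation. It assumes that unpaired columns are the ones marked "." in the Structure line of the output, and that columns containing a gap carry no substitution score. The matrix values are copied from the Single strand substitution example in the score matrix section, and the short alignment is made up.

```python
# Values copied from the Single strand substitution example in this document.
SINGLE_STRAND = {
    "A": {"A": 19, "C": -22, "G": -18, "U": -19},
    "C": {"A": -22, "C": 11, "G": -25, "U": -15},
    "G": {"A": -18, "C": -25, "G": 9, "U": -20},
    "U": {"A": -19, "C": -15, "G": -20, "U": 13},
}


def ns_score(fa_score, aligned_1, aligned_2, structure, substitution=SINGLE_STRAND):
    """FOLDALIGN score minus the similarity score of the unpaired columns."""
    single_strand = 0
    for a, b, s in zip(aligned_1, aligned_2, structure):
        # Assumption: only ungapped, unpaired columns contribute.
        if s == "." and a != "-" and b != "-":
            single_strand += substitution[a][b]
    return fa_score - single_strand


# Made-up alignment: unpaired columns are A/A (19), A/U (-19), and A/A (19).
print(ns_score(100, "GAAAC", "GAUAC", "(...)"))
```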

  8. FOLDALIGN output file format
    Contents

    1. Header section
    2. Sequence section
    3. Local alignment scores

    FOLDALIGN produces output in col format. The format aims at keeping information about data, parameters, and results in one file. At the same time the data has to be in a format which is easy to work with.

    The default FOLDALIGN output has three sections. The first part, the header, holds some general information and the structure of the best foldalignment found. The second section holds general information and information about the first sequence. The third section is similar to the second, but contains information about the second sequence.

    1. The header section
      A typical standard header looks like this:
      ; FOLDALIGN           2.0.1
      ; REFERENCE           JH. Havgaard, R. Lyngsø, GD. Stormo, J. Gorodkin
      ; REFERENCE           Pairwise local structural alignment of RNA sequences
      ; REFERENCE           with sequence similarity less than 40%
      ; REFERENCE           In press Bioinformatics 2005
      ; ALIGNMENT_ID         Structure 1
      ; ALIGNING            V00158/694-623 against AC069454/82192-82263
      ; ALIGN               V00158/694-62
      ; ALIGN               AC069454/8219
      ; ALIGN               Score: 561
      ; ALIGN               Identity: 39 % ( 28 / 72 )
      ; ALIGN               Begin
      ; ALIGN
      ; ALIGN               V00158/694-62 GCAGAUGUAG CUCAGUGG-U AGAGCGCAAC CUUGCCAAGG
      ; ALIGN               Structure     (((((((..( (((....... .)))).(((( (.......))
      ; ALIGN               AC069454/8219 GGUCCCAUGG UGUAAUGGUU AGCACUCUGG ACUUUGAAUC
      ; ALIGN
      ; ALIGN               V00158/694-62 UUGAUGCCAU GGGUUCGAGU CCCAUUAUCU GC
      ; ALIGN               Structure     ))).....(( (((....... )))))))))) ))
      ; ALIGN               AC069454/8219 CAG-CGAUCC GAGUUCAAAU CUCGGUGGGA CC
      ; ALIGN
      ; ALIGN               End
      ; ==============================================================================
      

      ; FOLDALIGN           2.0.1
      This field indicates which version of FOLDALIGN was used to produce the file.
      ; REFERENCE
      These fields contain citation information.
      ; ALIGNMENT_ID
      Holds a comment either set by the user or the web server. The default value is n.a.
      ; ALIGNING            V00158/694-623 against AC069454/82192-82263
      The name of sequence one against the name of sequence two.
      ; ALIGN               V00158/694-62
      Name and comment for the first sequence. The comment field is not available in the web server.
      ; ALIGN               AC069454/8219
      Name and comment for the second sequence. The comment field is not available in the web server.
      ; ALIGN               Score: 561
      The score of the best foldalignment.
      ; ALIGN               Identity: 39 % ( 28 / 72 )
      The sequence identity of the alignment. There are 28 identical nucleotides out of 72.
      ; ALIGN               Begin
      ; ALIGN
      ; ALIGN               V00158/694-62 GCAGAUGUAG CUCAGUGG-U AGAGCGCAAC CUUGCCAAGG
      ; ALIGN               Structure     (((((((..( (((....... .)))).(((( (.......))
      ; ALIGN               AC069454/8219 GGUCCCAUGG UGUAAUGGUU AGCACUCUGG ACUUUGAAUC
      ; ALIGN
      ; ALIGN               V00158/694-62 UUGAUGCCAU GGGUUCGAGU CCCAUUAUCU GC
      ; ALIGN               Structure     ))).....(( (((....... )))))))))) ))
      ; ALIGN               AC069454/8219 CAG-CGAUCC GAGUUCAAAU CUCGGUGGGA CC
      ; ALIGN
      ; ALIGN               End
      
      This shows the alignment and structure of the best local alignment between the two sequences.
      ; ==============================================================================
      A separation line.
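
      The Identity field can be reproduced from the ALIGN lines of the example header. The sketch below counts the alignment columns in which both sequences carry the same nucleotide and divides by the number of alignment columns. The function name is hypothetical, and treating the alignment length (gaps included) as the denominator is an assumption that happens to agree with the example.

```python
def identity(aligned_1, aligned_2):
    """Return (identical columns, alignment length) for two aligned sequences."""
    if len(aligned_1) != len(aligned_2):
        raise ValueError("aligned sequences must have the same length")
    identical = sum(1 for a, b in zip(aligned_1, aligned_2)
                    if a == b and a != "-")
    return identical, len(aligned_1)


# The two aligned sequences from the example header, blocks joined.
SEQ_1 = ("GCAGAUGUAG CUCAGUGG-U AGAGCGCAAC CUUGCCAAGG "
         "UUGAUGCCAU GGGUUCGAGU CCCAUUAUCU GC").replace(" ", "")
SEQ_2 = ("GGUCCCAUGG UGUAAUGGUU AGCACUCUGG ACUUUGAAUC "
         "CAG-CGAUCC GAGUUCAAAU CUCGGUGGGA CC").replace(" ", "")

identical, length = identity(SEQ_1, SEQ_2)
print(f"Identity: {round(100 * identical / length)} % ( {identical} / {length} )")
```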

    2. A sequence section
      The sequence section has two parts: the information part and the sequence part. A typical sequence section:

      ; TYPE                RNA
      ; COL 1               label
      ; COL 2               residue
      ; COL 3               seqpos
      ; COL 4               alignpos
      ; COL 5               align_bp
      ; COL 6               seqpos_bp
      ; ENTRY               V00158/694-623
      ; ALIGNMENT_ID         Structure 1
      ; ALIGNMENT_LIST      V00158/694-623 AC069454/82192-82263
      ; FOLDALIGN_SCORE     561
      ; GROUP               1
      ; FILENAME            data.fasta
      ; START_POSITION      1
      ; END_POSITION        71
      ; ALIGNMENT_SIZE      2
      ; ALIGNMENT_LENGTH    72
      ; SEQUENCE_LENGTH     72
      ; PARAMETER           max_length=71
      ; PARAMETER           max_diff=15
      ; PARAMETER           min_loop=3
      ; PARAMETER           score_matrix=sm.gap_open_-50.gap_elongation_-25.fmat
      ; PARAMETER           nobranching=<false>
      ; PARAMETER           global=<false>
      ; ------------------------------------------------------------------------------
      N         G          1         1        72        71
      N         C          2         2        71        70
      N         A          3         3        70        69
      .
      .
      G         -          .        19         .         .
      .
      .
      N         U         69        70         3         3
      N         G         70        71         2         2
      N         C         71        72         1         1
      ; ******************************************************************************
      

      ; TYPE                RNA
      Different types of data can be stored in col format. The type field indicates what type this is.
      ; COL 1               label
      ; COL 2               residue
      ; COL 3               seqpos
      ; COL 4               alignpos
      ; COL 5               align_bp
      ; COL 6               seqpos_bp
      
      These fields are the headers of the columns in the sequence part of this section. label has two values: N for nucleotide or G for gap. residue is a nucleotide or a gap. seqpos is the position in the original sequence. alignpos is the position in the foldalignment. align_bp indicates which position in the foldalignment this position is base-paired with. "." indicates no base-pairing. seqpos_bp is the base-pair position in the original sequence coordinates.
      ; ENTRY               V00158/694-623
      The name of the sequence.
      ; ALIGNMENT_ID        Structure 1
      A comment field. Set either by the user or the web server.
      ; ALIGNMENT_LIST      V00158/694-623 AC069454/82192-82263
      The sequences in this alignment.
      ; FOLDALIGN_SCORE     561
      The score of this alignment.
      ; GROUP               1
      Currently not used.
      ; FILENAME            data.fasta
      Name of the sequence input file.
      ; START_POSITION      1
      The start position of the foldalignment.
      ; END_POSITION        71
      The end position of the foldalignment.
      ; ALIGNMENT_SIZE      2
      Currently always two.
      ; ALIGNMENT_LENGTH    72
      The length of the foldalignment.
      ; SEQUENCE_LENGTH     72
      The length of the input sequence.
      ; PARAMETER           max_length=71
      The lambda value.
      ; PARAMETER           max_diff=15
      The delta value.
      ; PARAMETER           min_loop=3
      The minimum number of nucleotides between two nucleotides base-paired to each other.
      ; PARAMETER           score_matrix=sm.gap_open_-50.gap_elongation_-25.fmat
      The score matrix used.
      ; PARAMETER           nobranching=<false>
      If false, branching structures are allowed. If true, only stem loops are allowed. Default value false. Always false for web server runs.
      ; PARAMETER           global=<false>
      The foldalignment is global if this parameter is true. Default value is false.
      ; ------------------------------------------------------------------------------
      This separation line separates the information and the sequence parts of the sequence section.
      N         G          1         1        72        71
      N         C          2         2        71        70
      N         A          3         3        70        69
      .
      .
      G         -          .        19         .         .
      .
      .
      N         U         69        70         3         3
      N         G         70        71         2         2
      N         C         71        72         1         1
      
      Each row is a position in the alignment. The columns are explained at the ; COL lines.
      ; ******************************************************************************
      This indicates the end of the sequence section. This line can occur multiple times in the normal output. A FOLDALIGN output file should end with one of these lines.
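
      A sequence section can be read with a few lines of code. The following is a minimal sketch based only on the layout shown above, with a hypothetical function name: lines starting with ";" are treated as fields, "; COL" lines name the columns, and all other lines are rows. Values are kept as strings because "." marks a missing position. Rows that do not have one value per column, such as the "." lines used in the example to shorten the listing, are skipped.

```python
def parse_sequence_section(lines):
    """Split one sequence section into header fields and per-position rows."""
    fields = {}    # e.g. "ENTRY" -> ["V00158/694-623"]
    columns = {}   # column number -> column name
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(";"):
            body = line[1:].strip()
            if not body or set(body) <= set("-*="):
                continue  # separator lines
            parts = body.split(None, 1)
            key = parts[0]
            value = parts[1].strip() if len(parts) > 1 else ""
            if key == "COL":
                number, name = value.split()
                columns[int(number)] = name
            else:
                fields.setdefault(key, []).append(value)
        else:
            values = line.split()
            if len(values) != len(columns):
                continue  # not a complete row
            names = [columns[i] for i in sorted(columns)]
            rows.append(dict(zip(names, values)))
    return fields, rows
```

      Fields that occur several times, such as PARAMETER, are collected in a list in the order in which they appear.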

    3. Local alignment scores. The -plot_score option
      FOLDALIGN can be used to scan for more than the best scoring RNA structures. When the -plot_score option is used, FOLDALIGN prints the coordinates and the score of the best local alignment starting at any pair of positions along the two sequences. The extra output produced has two parts: a header and a series of local score (LS) lines.

      Many of the local alignment header fields have been explained above. A typical local alignment score header looks like this:


      ; FOLDALIGN           2.0.1
      ; REFERENCE           JH. Havgaard, R. Lyngsø, GD. Stormo, J. Gorodkin
      ; REFERENCE           Pairwise local structural alignment of RNA sequences
      ; REFERENCE           with sequence similarity less than 40%
      ; REFERENCE           In press Bioinformatics 2005
      ; ALIGNMENT_ID        Structure 1
      ; ALIGNING            V00158/694-623 against AC069454/82192-82263
      ; SEQUENCE_1_COMMENT
      ; SEQUENCE_2_COMMENT
      ; LENGTH_SEQUENCE_1   72
      ; LENGTH_SEQUENCE_2   72
      ; FILENAME            data.fasta
      ; PARAMETER           max_length=72
      ; PARAMETER           max_diff=15
      ; PARAMETER           min_loop=3
      ; PARAMETER           score_matrix=sm.gap_open_-50.gap_elongation_-25.fmat
      ; PARAMETER           nobranching=<false>
      ; PARAMETER           global=<false>
      ; TYPE                Foldalign_local_scores
      ; COL 1               label
      ; COL 2               Alignment_start_position_sequence_1
      ; COL 3               Alignment_end_position_sequence_1
      ; COL 4               Alignment_start_position_sequence_2
      ; COL 5               Alignment_end_position_sequence_2
      ; COL 6               Alignment_score
      ; ------------------------------------------------------------------------------
      .
      .
      .
      LS 1 71 3 69 362
      LS 1 70 2 70 408
      LS 1 71 1 71 561
      

      ; SEQUENCE_1_COMMENT
      ; SEQUENCE_2_COMMENT
      These are the comments of the two sequences. These fields are empty when the web server is used.
      LS 1 71 3 69 362
      These are the local score lines. There is one for each pair of positions along the two sequences. In the Z-score plot this line would be at position (1, 3). The score 362 is converted to a Z-score.
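
      The LS lines map directly onto plot coordinates. The sketch below, with a hypothetical function name, reads LS lines into a dictionary keyed by the pair of start positions, following the column order given in the COL lines of the header. Lines that are not complete LS lines are ignored.

```python
def read_local_scores(lines):
    """Map (start in sequence 1, start in sequence 2) to (end 1, end 2, score)."""
    scores = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 6 and parts[0] == "LS":
            start_1, end_1, start_2, end_2, score = map(int, parts[1:])
            scores[(start_1, start_2)] = (end_1, end_2, score)
    return scores


lines = ["LS 1 71 3 69 362", "LS 1 70 2 70 408", "LS 1 71 1 71 561"]
scores = read_local_scores(lines)
best = max(scores, key=lambda position: scores[position][2])
print(best, scores[best])
```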

  9. Score matrix file format
    A FOLDALIGN score matrix contains several elements. The elements are separated by empty lines. With one exception, the elements can be placed in any order. The exception is the Alfabet: element, which must be the first element in a score matrix file if it is present. A score matrix does not have to have all elements present. Anything missing will be given a default value. See below for examples. A score matrix holding the default values is distributed with the FOLDALIGN package. The current default energy parameters are taken from mfold. The default substitution matrices are a variation of the Ribosum matrices, but produced in a fashion more similar to the BLOSUM matrices.

    Comment and empty lines can be placed between the elements. Comment lines start with a #.
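
    The overall layout can be read with a small amount of code. The sketch below is not the FOLDALIGN reader and its function name is hypothetical. It skips comment and empty lines and assumes that an element starts at a line that ends with a colon, such as Miscellaneous: in the examples at the end of this section.

```python
def split_elements(text):
    """Group the lines of a score matrix file by element name."""
    elements = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # empty and comment lines carry no data
        if line.endswith(":"):
            current = line[:-1]       # assumption: a trailing colon marks a headline
            elements[current] = []
        elif current is not None:
            elements[current].append(line)
    return elements
```

    Each element is returned as a list of raw lines, so the different element types can be interpreted separately.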

    The score matrix elements are:

    Alfabet: This is the alphabet of the sequences. The first character is also the gap/unknown character. The last field on the line is the size of the alphabet. There is no difference between upper and lower case letters. T's are read as U's.

    Alfabet:
    - A C G U 5
    

    Stacking: This is the cost for stacking one base-pair on to another in a stem. Stackings which promote stem structures are positive.

    Stacking:
    A    A    A    A    C    C    C    C    G    G    G    G    U    U    U    U    
    A    C    G    U    A    C    G    U    A    C    G    U    A    C    G    U    
    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 A A
    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 A C
    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 A G
    0    0    0    9    0    0   21    0    0   24    0   13   13    0   10    0 A U
    .
    .
    

    Hairpin Close: This is the cost for stacking the last unpaired pair of nucleotides in a hairpin loop on to the first base-pair of the closing stem.

    Hairpin Close:
    A    A    A    A    C    C    C    C    G    G    G    G    U    U    U    U    
    A    C    G    U    A    C    G    U    A    C    G    U    A    C    G    U    
    0    0    0    3    0    0   15    0    0   11    0   -2    5    0    5    0 A A
    0    0    0    5    0    0   15    0    0   15    0    5    3    0    3    0 A C
    0    0    0    3    0    0   14    0    0   13    0    3    6    0    6    0 A G
    0    0    0    3    0    0   18    0    0   21    0    3    5    0    5    0 A U
    .
    .
    

    Internal loop: This is the cost of stacking the first / last unpaired pair of nucleotides in an internal loop on to the last / first base-pair of the surrounding stems.

    Internal loop:
    A    A    A    A    C    C    C    C    G    G    G    G    U    U    U    U    
    A    C    G    U    A    C    G    U    A    C    G    U    A    C    G    U    
    0    0    0   -7    0    0    0    0    0    0    0   -7   -7    0   -7    0 A A
    0    0    0   -7    0    0    0    0    0    0    0   -7   -7    0   -7    0 A C
    0    0    0    4    0    0   11    0    0   11    0    4    4    0    4    0 A G
    0    0    0   -7    0    0    0    0    0    0    0   -7   -7    0   -7    0 A U
    .
    .
    

    5' Dangle: The cost of adding a 5' dangle nucleotide to a stem. Only used in multibranched loops.

    5' Dangle:
    A    A    A    A    C    C    C    C    G    G    G    G    U    U    U    U 
    A    C    G    U    A    C    G    U    A    C    G    U    A    C    G    U 
    0    0    0    3    0    0    2    0    0    5    0    3    3    0    3    0 A
    0    0    0    1    0    0    3    0    0    3    0    1    3    0    3    0 C
    0    0    0    2    0    0    0    0    0    2    0    2    4    0    4    0 G
    0    0    0    2    0    0    0    0    0    1    0    2    2    0    2    0 U
    

    3' Dangle: The cost of adding a 3' dangle nucleotide to a stem. Used in multibranched loops.

    3' Dangle:
    A    A    A    A    C    C    C    C    G    G    G    G    U    U    U    U 
    A    C    G    U    A    C    G    U    A    C    G    U    A    C    G    U 
    0    0    0    8    0    0   17    0    0   11    0    8    7    0    7    0 A
    0    0    0    5    0    0    8    0    0    4    0    5    1    0    1    0 C
    0    0    0    8    0    0   17    0    0   13    0    8    7    0    7    0 G
    0    0    0    6    0    0   12    0    0    6    0    6    1    0    1    0 U
    

    Loop length costs: This table holds the length-dependent cost for hairpin, bulge, and internal loops. The headline must be followed by a line stating how many lines should be read.

    Loop length costs:
    30 lines
    # Size Hairpin Bulge Internal
      1     -57     -38    -17
      2     -57     -28    -17
      3     -57     -32    -17
      4     -56     -36    -17
    .
    .
    

    Miscellaneous: This element holds a list of parameters.

    • Gap_open. This is the gap opening cost.
    • Elongation_bonus. This is the cost of continuing a gap.
    • Multibranchloop. This is the cost of closing a multibranched loop with a base-pair.
    • Multibranchloop_helix. The cost of adding an extra stem to a multibranched loop.
    • Multibranchloop_nucleotide. The cost of adding an extra unpaired nucleotide in a multibranched loop.
    • Multibranchloop_non_GC_stem_end. An extra cost added when a stem ends with a base-pair which is not GC. The values in the Hairpin close and Internal loop matrices are assumed to have already been corrected. This value is therefore only used in multibranched loops and bulges with a length longer than one nucleotide.
    • Asymmetric_cost. The cost of asymmetric internal loops.
    • Asymmetric_cost_limit. The maximum asymmetric cost.
    • Long_hairpin_loop_factor. Used to estimate hairpin Loop length costs not included in the table. Only used for lengths above 30.
    • Long_bulge_loop_factor. Used to estimate bulge Loop length costs not included in the table. Only used for lengths above 30.
    • Long_Internal_loop_factor. Used to estimate internal Loop length costs not included in the table. Only used for lengths above 30.

    Base-pair: This matrix indicates which nucleotides base-pair.

    - A C G U 
    0 0 0 0 0 -
    0 0 0 0 1 A
    0 0 0 1 0 C
    0 0 1 0 1 G
    0 1 0 1 0 U
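
    The matrix above can be turned into a list of allowed base-pairs. The sketch below uses a hypothetical function name and assumes that the label at the end of each row names the row nucleotide and that a 1 marks a pair that is allowed.

```python
BASE_PAIR = """\
- A C G U
0 0 0 0 0 -
0 0 0 0 1 A
0 0 0 1 0 C
0 0 1 0 1 G
0 1 0 1 0 U
"""


def allowed_pairs(matrix_text):
    """Return the set of (row nucleotide, column nucleotide) pairs marked 1."""
    lines = [line.split() for line in matrix_text.splitlines() if line.strip()]
    columns = lines[0]
    pairs = set()
    for row in lines[1:]:
        *flags, row_label = row
        for column_label, flag in zip(columns, flags):
            if flag == "1":
                pairs.add((row_label, column_label))
    return pairs


print(sorted(allowed_pairs(BASE_PAIR)))
```

    For the matrix shown, this gives the Watson-Crick pairs A-U and C-G in both orientations together with the G-U wobble pair.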
    

    Base-pair substitution: The cost of substituting a base-pair in one sequence with a base-pair in the other sequence.

    Base-pair substitution:
    A    A    A    A    C    C    C    C    G    G    G    G    U    U    U    U
    A    C    G    U    A    C    G    U    A    C    G    U    A    C    G    U
    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 A A
    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 A C
    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 A G
    0    0    0   11    0    0    3    0    0    5    0    0    2    0   -3    0 A U
    .
    .
    

    Single strand substitution: The cost of substituting unpaired nucleotides.

    Single strand substitution:
        A     C     G     U 
       19   -22   -18   -19 A
      -22    11   -25   -15 C
      -18   -25     9   -20 G
      -19   -15   -20    13 U
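
    A matrix in this layout can be read into a lookup table. The sketch below uses a hypothetical function name and assumes, as the example suggests, that the first line names the columns and that each following line ends with the label of its row.

```python
SINGLE_STRAND = """\
    A     C     G     U
   19   -22   -18   -19 A
  -22    11   -25   -15 C
  -18   -25     9   -20 G
  -19   -15   -20    13 U
"""


def read_substitution(matrix_text):
    """Return matrix[row nucleotide][column nucleotide] as integers."""
    lines = [line.split() for line in matrix_text.splitlines() if line.strip()]
    columns = lines[0]
    matrix = {}
    for row in lines[1:]:
        *values, label = row
        matrix[label] = dict(zip(columns, map(int, values)))
    return matrix


matrix = read_substitution(SINGLE_STRAND)
print(matrix["A"]["A"], matrix["C"]["U"])
```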
    

    1. Score matrix examples
      A score matrix which only changes the gap penalties would look like this:
      # Changing the gap penalties.
      
      Miscellaneous:
      Gap_open:                        -80
      Elongation_bonus:                -40
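
      A file like the one above can be generated for a series of gap penalties, for example when trying several gap opening costs. The sketch below uses a hypothetical function name. It defaults the elongation cost to half the gap opening cost, as suggested in the Parameters section, and checks the values against the range accepted by the web server. Whether the command-line program enforces the same range, and whether the amount of white space matters, is not stated here.

```python
def gap_penalty_element(gap_open, elongation=None):
    """Return a Miscellaneous element that only sets the gap penalties."""
    if elongation is None:
        elongation = int(gap_open / 2)  # usual choice: half the gap opening cost
    for value in (gap_open, elongation):
        if not -1000 <= value <= 0:     # range accepted by the web server
            raise ValueError("gap costs must lie between -1000 and 0")
    rows = [("Gap_open:", gap_open), ("Elongation_bonus:", elongation)]
    body = "\n".join(f"{name:<33}{value}" for name, value in rows)
    return "Miscellaneous:\n" + body + "\n"


print("# Changing the gap penalties.\n")
print(gap_penalty_element(-80))
```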
      

      A score matrix which has no single strand substitution cost would look like this:

      # No single strand substitution cost.
      
      Single strand substitution:
          A     C     G     U 
          0     0     0     0 A
          0     0     0     0 C
          0     0     0     0 G
          0     0     0     0 U
      

      A score matrix combining several elements would look like this (no sequence similarity cost):

      # Scorematrix format
      # X-axis: i & j. Y-axis k & l
      Base-pair substitution:
      A    A    A    A    C    C    C    C    G    G    G    G    U    U    U    U
      A    C    G    U    A    C    G    U    A    C    G    U    A    C    G    U
      0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 A A
      0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 A C
      0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 A G
      0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 A U
      0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 C A
      0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 C C
      0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 C G
      0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 C U
      0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 G A
      0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 G C
      0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 G G
      0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 G U
      0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 U A
      0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 U C
      0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 U G
      0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 U U
      
      # Initmatrix format
      # X-axis i. Y-axis j
      Single strand substitution:
          A     C     G     U
          0     0     0     0 A
          0     0     0     0 C
          0     0     0     0 G
          0     0     0     0 U
      



Comments, questions, etc., email webmaster@foldalign.kvl.dk.

Last updated September 27th, 2005 by Jakob Hull Havgaard