Getting Started

The ppalign package consists of the following components:

ppalign
ppblast
The ppalign library

ppalign

ppalign computes the positionwise posterior probabilities of a user-supplied alignment. Alternatively, the user may provide a pair of unaligned sequences (option --optimize [algo]). In this case ppalign first determines the optimal alignment and then computes the positionwise posterior probabilities for this alignment.
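In the standard pair-HMM formulation, the posterior probability that residue $x_i$ of the first sequence is aligned to residue $y_j$ of the second sequence comes from the forward and backward variables of the match state $M$. We assume that ppalign's positionwise posterior probabilities follow this definition. The specific pair HMM that ppalign derives from the scoring scheme is not shown here.

```latex
P\bigl(x_i \diamond y_j \mid x, y\bigr) = \frac{f^{M}(i,j)\, b^{M}(i,j)}{P(x,y)}
```

The terms in the formula are:

- $f^{M}(i,j)$ is the forward probability of emitting $x_1 \ldots x_i$ and $y_1 \ldots y_j$ and ending in $M$ with $x_i$ aligned to $y_j$.
- $b^{M}(i,j)$ is the backward probability of emitting the remaining residues from that state.
- $P(x,y)$ is the total probability summed over all alignments.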

Three algorithms are available:

Global alignment (option --algo global)
Global alignment restricted to the aligned part of the alignment (option --algo global_bound). Here the padding gaps at the beginning and the end of the alignment are ignored. This option can be combined, for example, with the option --optimize local.
Probabilities for the start and end points of local alignments (option --algo local_start). In a second step, the user may then realign the sequences on a range chosen according to these distributions (options --restrict_start [i1 j1] and --restrict_end [i2 j2]).
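This C++ sketch shows which columns remain once the padding (terminal) gaps that global_bound ignores are dropped. The function alignedCore is hypothetical and not part of the ppalign library. It assumes rows of equal length, '-' as the gap character and at least one residue per row.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper, not part of the ppalign API.
// Returns the inclusive, 0-based column range [first, last] of a pairwise
// alignment after removing the padding gaps at its beginning and end,
// i.e. the columns before both sequences have started or after either
// sequence has ended. Internal gaps inside the range are kept.
// Assumes equal-length rows, '-' as gap character, >= 1 residue per row.
std::pair<std::size_t, std::size_t> alignedCore(const std::string& rowA,
                                                const std::string& rowB) {
    const std::size_t firstA = rowA.find_first_not_of('-');
    const std::size_t firstB = rowB.find_first_not_of('-');
    const std::size_t lastA = rowA.find_last_not_of('-');
    const std::size_t lastB = rowB.find_last_not_of('-');
    // The core starts where the later of the two sequences starts
    // and ends where the earlier of the two sequences ends.
    return {std::max(firstA, firstB), std::min(lastA, lastB)};
}
```

For the rows "--ACGTA-" and "AAACG-TT" this gives columns 2 to 6. The internal gap in the second row stays inside the range.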

Example usage

Use ppalign to compute the posterior probabilities for a given alignment. The program can be used interactively. For instance, the following command computes the posterior probabilities for a protein alignment using the BLOSUM62 matrix:

```
ppalign -s blosum62 -a aa -f text
```

Then paste or type the alignment in FASTA format. When you have finished the second sequence, press Ctrl+D to signal the end of the input. Alternatively, you may read the alignment from a file via -i filename.fasta.

```
>Query Sequence 
G-YATTIIPRIYTYYVSTALFAIFGIRML----REGLKMSPDEGQEELEEVQAEIKKKDEELQRSKLANGAADVEAG
>Subject Sequence 
GRIVPNLISRKHTNSAATVLYAFFGLRLLYIAWRSDSKVSQKKEMEEVEE----------------------KLESG
>
```

If you do not have an alignment yet, ppalign can compute an optimal one before the actual computation. Just use

```
ppalign -s blosum62 -a aa -f text --optimize global
```

and provide a pair of unaligned sequences. You will get the following result:

```
         PpAlign_Program: ppalign
         PpAlign_Version: 1.0
                Alphabet: protein
             Scorematrix: blosum62
                Gap_Open: 11
           Gap_Extension: 1
-------------
                QueryDef: Query Sequence 
              SubjectDef: Subject Sequence 
               Align-len: 77
                Identity: 20
                    Gaps: 27
                   Score: 38
           AvgPosterProb: 0.516278
          ---------------------------------------- 
          #     #######################            
          #    ########################            
          #    ########################            
          #   #########################            
          #   #########################            
          #  ##########################            
          #  ##########################            
          #############################            
          #############################            
          ######################################## 
       1  G-YATTIIPRIYTYYVSTALFAIFGIRML----REGLKMS 
       1  GRIVPNLISRKHTNSAATVLYAFFGLRLLYIAWRSDSKVS 

          ------------------------------------- 
                                          ##### 
                                          ##### 
                    #                     ##### 
                    ###                   ##### 
                    #######               ##### 
                    #########             ##### 
                    #############         ##### 
                    ###################   ##### 
                    ########################### 
          ##################################### 
      36  PDEGQEELEEVQAEIKKKDEELQRSKLANGAADVEAG 
      41  QKKEMEEVEE----------------------KLESG
```

In this output, the confidence (posterior probability) of each alignment column is shown as a vertical bar of "#" symbols in 10% bins. ppalign also reports the usual alignment characteristics: length, number of identical positions, number of gaps and score. It additionally reports the average posterior probability.
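This C++ sketch shows one way to turn per-column posterior probabilities into such "#" bars and into an average like AvgPosterProb. Two choices here are assumptions for illustration, not ppalign's documented behavior:

- the binning rule, where a column with probability p gets ceil(10p) symbols;
- averaging over all alignment columns.

The functions renderBars and averagePosterior are hypothetical names.

```cpp
#include <cmath>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical sketch, not the ppalign implementation.
// Draws per-column posterior probabilities as vertical bars of '#' in 10%
// bins; rows[0] is the top row, rows[9] the bottom row.
// Assumed binning rule: height = ceil(10 * p), clamped to [0, 10].
std::vector<std::string> renderBars(const std::vector<double>& post) {
    const int bins = 10;
    std::vector<std::string> rows(bins, std::string(post.size(), ' '));
    for (std::size_t col = 0; col < post.size(); ++col) {
        // Subtract a tiny epsilon so that, e.g., p = 0.3 gives 3 and not 4
        // because of floating-point rounding in 0.3 * 10.
        int height = static_cast<int>(std::ceil(post[col] * bins - 1e-9));
        if (height < 0) height = 0;
        if (height > bins) height = bins;
        for (int h = 0; h < height; ++h)
            rows[bins - 1 - h][col] = '#';   // fill from the bottom upwards
    }
    return rows;
}

// Mean posterior probability over all columns (assumed AvgPosterProb rule).
double averagePosterior(const std::vector<double>& post) {
    if (post.empty()) return 0.0;
    return std::accumulate(post.begin(), post.end(), 0.0) / post.size();
}
```

For the probabilities {1.0, 0.55, 0.05, 0.0} the bars have heights 10, 6, 1 and 0, and the average is 0.4.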

With the option -f xml you obtain a structured, machine-readable XML document.
To produce a human-readable HTML page, supply -f html -o example.html. In this example, we also used the options --sampling 10 --marg_decode to determine alternative alignments.
You may also try our ppALIGN web server to test some features of ppalign, including local alignment.

ppblast

Let us start with a simple example of a nucleotide sequence similarity search using nucleotide BLAST (blastn), available on the NCBI web server.

First, search a DNA sequence against a DNA database. In our example, we searched the human beta globin sequence (gi|455025) against the mouse genome database.
If available, you may instead use the command-line version of BLAST (blastn for this nucleotide search) with XML output (-outfmt 5 in BLAST+, -m 7 in the legacy blastall). Redirect the XML output into a file (e.g. nblast.xml).
If you have used the web server, download the result in XML format (on the result page, choose Download -> XML) and save it, for example, as nblast.xml.
Run ppblast on the BLAST output:
```
ppblast -i nblast.xml -o ppblast.xml
```
This creates the extended BLAST output ppblast.xml, which contains the posterior probabilities. Note: if this step fails with the error message
```
error in constructing pair hmm:
1-2 * nu < 0
choose larger gap costs!
```
you probably used the default gap costs (0 for opening and 0 for extension), which lie deep in the so-called linear regime. We recommend overriding these values with the command-line arguments --open and --ext, for example:
```
ppblast -i nblast.xml --open 2 --ext 1 -o ppblast.xml
```
To produce a human-readable HTML page and use further options, you may also try
```
ppblast -i nblast.xml --open 2 --ext 1 -f html \
--sampling 10 --expected --marg_decode -o ppblast.html
```
In addition to the posterior probabilities, this command does three things:

- It samples alternative alignments from the posterior distribution (--sampling 10).
- It computes the expected score with respect to the pair HMM (--expected).
- It obtains the alignments with maximal averaged marginalized posterior probability (--marg_decode).
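This C++ sketch shows how sampled alignments relate to posterior probabilities. If alignments are drawn from the posterior distribution, the fraction of samples that align x_i with y_j estimates the posterior probability of that pair.

This is an illustration only, not how ppblast computes its values. The exact posteriors are presumably computed without sampling, and with 10 samples such an estimate is coarse. The representation of an alignment as a list of aligned index pairs, and the names below, are assumptions.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical types: an alignment is the list of aligned residue index
// pairs (i, j); each pair occurs at most once per alignment.
using Pair = std::pair<std::size_t, std::size_t>;
using Alignment = std::vector<Pair>;

// Monte Carlo estimate of P(x_i aligned to y_j | x, y): the fraction of
// sampled alignments that contain the pair (i, j). Pairs that never occur
// in any sample are absent from the map (estimated probability 0).
std::map<Pair, double> estimatePosteriors(const std::vector<Alignment>& samples) {
    std::map<Pair, double> freq;
    if (samples.empty()) return freq;
    for (const Alignment& aln : samples)
        for (const Pair& p : aln) freq[p] += 1.0;   // count occurrences
    for (auto& entry : freq)
        entry.second /= static_cast<double>(samples.size());  // normalize
    return freq;
}
```

Consider two samples that both align residue 0 with residue 0 but disagree on the partner of residue 1. They give an estimate of 1.0 for the pair (0, 0) and 0.5 for each of the two competing pairs.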

The ppalign library

To use the library, we recommend looking at the examples in the source distribution and at the API documentation. If you are using the GNU Compiler Collection, you can link your own programs against the ppALIGN library as follows:

```
g++ -L/path/to/libppalign -I/path/to/include/ppalign mysrc.cpp -lppalign
```

You may try one of the examples.

ppALIGN

posterior probabilities for score-based sequence alignments
