
                                     
PAIRPred

by

Fayyaz ul Amir Afsar Minhas and Asa Ben-Hur
Department of Computer Science
Colorado State University
Fort Collins, Colorado, USA.


Description
===========
PAIRPred is a partner-specific protein-protein interaction site predictor: given two proteins, it predicts whether a pair of residues, one from each protein, interacts. Unlike most existing interaction site predictors, which produce partner-independent predictions, PAIRPred uses information about a protein's interaction partner when making its predictions. It employs a Support Vector Machine (SVM) to generate interaction propensity scores for pairs of residues from sequence information alone or in conjunction with structure-based features. PAIRPred offers state-of-the-art prediction accuracy.
For more details about PAIRPred, or if you need help installing it or obtaining predictions for your proteins, please read our paper or send us an email.
Fayyaz Minhas: fayyazafsar <at> gmail_[dot]_com
Asa Ben-Hur: asa <at> cs [dot] colostate [dot] edu
Date: Feb 26, 2012
Version: 1.0

Installation and Usage
======================
DEPENDENCIES
------------
PAIRPred depends upon the following:
-Python 2.7
-Python modules: Numpy, Scipy, Matplotlib, Biopython, PyMol, PyML, mpi4py, OptionsParser, glob, copy, re, sys, os, cPickle, tempfile, time, random, pdb, itertools, pickle, __main__
-Tools: PSAIA, SPINEX, STRIDE, MSMS, PyMOL, PSI-BLAST
Please install these dependencies before using PAIRPred. The simplest way to do so is described below.
Python and Python dependencies: We strongly encourage users to run PAIRPred with Python Anaconda (https://store.continuum.io/), as it already contains most of the required Python modules. Otherwise, you can use easy_install or pip to install them into your Python installation.
You will still need to install:
 PyML (http://pyml.sourceforge.net/) -- PyML runs only on Linux and Mac, so PAIRPred is likewise limited to Linux and Mac.
 PSAIA (http://complex.zesoi.fer.hr/PSAIA.html)
 SPINEX (http://sparks.informatics.iupui.edu/SPINE-X/)
 STRIDE (http://webclu.bio.wzw.tum.de/stride/)
 PSI-BLAST (as part of BLAST+, PSIBLAST2.2.27) (ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/)
 PyMOL (http://www.pymol.org/) [optional, required only for visualization]
 MSMS (http://www.scripps.edu/~sanner/)
Once you have installed these tools, add them to the system path. Alternatively, specify their installation folders in BISEPutils.py.
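Before running PAIRPred it can help to confirm that the external tools are visible on the system path. The following is a minimal sketch and is not part of PAIRPred; the helper names find_on_path and missing_tools are hypothetical, and the executable names you pass in depend on how each tool was installed.

```python
import os


def find_on_path(name):
    """Return the full path of an executable found on PATH, or None."""
    for folder in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(folder, name)
        # Accept only regular files that the current user may execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def missing_tools(names):
    """Return the names that could not be located on PATH."""
    return [name for name in names if find_on_path(name) is None]
```

For example, missing_tools(["psiblast", "stride", "msms"]) would list whichever of those executables could not be found. The names "stride" and "msms" are assumptions here; the actual binary names may differ on your system.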

USAGE
=====
Testing
--------
With Structure Files:
^^^^^^^^^^^^^^^^^^^^
You must have both the ligand and the receptor PDB files, named [CIDX]_l_u.pdb and [CIDX]_r_u.pdb, where CIDX is the four-character name of the complex, 'l' stands for the ligand and 'r' for the receptor. 'u' indicates that the files correspond to the unbound structures. This naming format is the same as the one used in DBD 3.0 and 4.0.
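The naming convention can be sketched in a few lines of Python. This helper is illustrative only and not part of PAIRPred; the function name complex_files and the complex ID "1ABC" used in the note below are hypothetical.

```python
import os


def complex_files(cidx, ext="pdb", folder="."):
    """Build the expected ligand and receptor file paths for a complex ID."""
    if len(cidx) != 4:
        raise ValueError("complex ID must be exactly 4 characters: %r" % cidx)
    # 'l' = ligand, 'r' = receptor, 'u' = unbound
    ligand = os.path.join(folder, "%s_l_u.%s" % (cidx, ext))
    receptor = os.path.join(folder, "%s_r_u.%s" % (cidx, ext))
    return ligand, receptor
```

Calling complex_files("1ABC", folder="") gives the pair 1ABC_l_u.pdb and 1ABC_r_u.pdb; passing ext="fasta" gives the corresponding FASTA names.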
The steps for testing these two proteins are detailed below:
1. Feature Extraction:
	Rename protKernel_str.py to protKernel.py
	Use the Python script myPDB.py to extract the features. Running myPDB.py produces two feature files, [CIDX]_l_u.pdb.pkl and [CIDX]_r_u.pdb.pkl, which you can copy into the directory containing the feature files used for training. The training feature files are available on the PAIRPred website. Please read the help comments in myPDB.py for usage details.
2. Testing
	Rename protKernel_seq.py to protKernel.py
	Use the Python script testSingleComplex_par.py (after changing the kernel file variable to point to the structure kernel, e.g. the provided 'dbdKernel_dbd34_str.dbk.pkl') to obtain a PAIRPred prediction file for the pair of proteins in [CIDX]. Please read the help comments in testSingleComplex_par.py for usage details. You can use the structure kernel file downloaded with this installation, or train a kernel on your own training data as well. Please see the section on training for this purpose.
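Copying the two feature files into the training features directory, as described in step 1, could look like the sketch below. It is not part of PAIRPred; the function name copy_feature_files and both directory arguments are hypothetical and should be replaced by the folders you actually use.

```python
import glob
import os
import shutil


def copy_feature_files(cidx, src_dir, features_dir):
    """Copy [CIDX]_l_u.pdb.pkl and [CIDX]_r_u.pdb.pkl into features_dir."""
    copied = []
    # [lr] matches the ligand ('l') and receptor ('r') files of this complex.
    pattern = os.path.join(src_dir, cidx + "_[lr]_u.pdb.pkl")
    for path in sorted(glob.glob(pattern)):
        target = os.path.join(features_dir, os.path.basename(path))
        shutil.copy(path, target)
        copied.append(target)
    return copied
```

The function returns the list of copied paths, so an empty or one-element result indicates that a feature file is missing from src_dir.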
With Sequence Files:
^^^^^^^^^^^^^^^^^^^^
You must have both the ligand and the receptor FASTA files, named [CIDX]_l_u.fasta and [CIDX]_r_u.fasta, where CIDX is the four-character name of the complex, 'l' stands for the ligand and 'r' for the receptor. 'u' indicates that the files correspond to the unbound structures. This naming format is the same as the one used in DBD 3.0 and 4.0.
The steps for testing these two proteins are detailed below:
1. Feature Extraction:
	Use the Python script myPDB.py to extract the features. Running myPDB.py produces two feature files, [CIDX]_l_u.pdb.pkl and [CIDX]_r_u.pdb.pkl, which you can copy into the directory containing the feature files used for training. The training feature files are available on the PAIRPred website. Please read the help comments in myPDB.py for usage details.
2. Testing
	Use the Python script testSingleComplex_par.py to obtain a PAIRPred prediction file for the pair of proteins in [CIDX]. Please read the help comments in testSingleComplex_par.py for usage details. You can use the sequence kernel file downloaded with this installation, or train a kernel on your own training data as well. Please see the section on training for this purpose.
Training
--------
Training consists of the following steps:
1. Put the training PDB or FASTA files in a folder
2. Run getPDBPKL.py to get the feature files in a features directory
3. Create a label file using getExamplesDBD.py and put the label file in the features directory
4. Create a kernel file using dbdKernels.py. Before running dbdKernels.py, rename protKernel_seq.py to protKernel.py if you want to use FASTA files; otherwise, rename protKernel_str.py to protKernel.py
5. Use the generated kernel files for testing
6. You can also obtain cross-validation results for the generated kernel using dbdscrpp3.py. Nested cross-validation can be performed using nested_cv.py. If you want to shuffle the folds in cross-validation, run cv_shuffle_folds.py.
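The kernel selection in step 4 can be scripted. The sketch below is not part of PAIRPred and the function name select_kernel_module is hypothetical. It copies the chosen file to protKernel.py instead of renaming it, on the assumption that a copy serves the same purpose while keeping both variants available for later switching.

```python
import os
import shutil


def select_kernel_module(mode, code_dir="."):
    """Copy protKernel_seq.py or protKernel_str.py to protKernel.py."""
    sources = {"seq": "protKernel_seq.py", "str": "protKernel_str.py"}
    if mode not in sources:
        raise ValueError("mode must be 'seq' or 'str', got %r" % mode)
    target = os.path.join(code_dir, "protKernel.py")
    # Overwrites any existing protKernel.py with the selected variant.
    shutil.copyfile(os.path.join(code_dir, sources[mode]), target)
    return target
```

Use mode "seq" when working with FASTA files and "str" when working with PDB files, then run dbdKernels.py as described above.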

Visualization
-------------
The prediction file can be visualized using analyzePredFile.py.
To display the predictions on a 3D structure in PyMOL, please see makeDistanceLinks.py.

File Formats
============
Prediction file format
----------------------
Please see testSingleComplex_par.py for the file format. To read or visualize these files, please see analyzePredFile.py.
