PyDock3 Tutorial

Introduction

By reading this tutorial you will learn how to use the pyDock software [Cheng_2007] to perform protein-protein docking on a real case from a CAPRI experiment [Janin_2003], Target 26 [Grosdidier_2007] (see the Target26 experimental information section for further details).

The workflow you will follow, which mirrors a real data pipeline, consists of:

  1. Obtaining the top 100 docking solutions using the pyDock scoring energy.
  2. Applying experimental data restraints to refine the previous complexes.
  3. Predicting the interface based on desolvation energy (Optimal Docking Area [Fernandez_2005]) to characterize different properties of the complex.
  4. Choosing the best model from the pool of docking solutions, taking into account the results of the previous steps.

Target26 experimental information

Target 26 [Ray_2000] consists of predicting the complex formed by two unbound subunits, TolB and Pal. These E. coli proteins form a complex involved in maintaining the stability of the bacterial outer membrane.

Experimental data available from:

Ray MC, Germon P, Vianney A, Portalier R, Lazzaroni JC: Identification by genetic suppression of Escherichia coli TolB residues important for TolB-Pal interaction. J Bacteriol 2000; 182: 821-824.

“(...)

The Tol-Pal system of Escherichia coli is involved in maintaining outer membrane stability acting as a barrier to the entry of macromolecules into the bacteria, thus providing protection against deleterious actions of bacteriocins and digestive enzymes. The periplasmic protein TolB was shown to interact with the outer membrane, peptidoglycan-associated proteins OmpA, Lpp, and Pal (4, 7). Thus, TolB and Pal could be part of a multicomponent system linking the outer membrane to peptidoglycan. The aim of this study was to determine the regions of TolB involved in the interaction of the protein with Pal. To this end, we used suppressor genetic techniques which had previously allowed us to characterize the regions of interaction between TolQ, TolR, and TolA (10, 18). pal point mutations were identified, and some of them involved residues important for interaction with TolB (7). These mutations induce sensitivity to sodium cholate and release of periplasmic proteins in the medium. We used these pal mutants to search for suppressors in tolB.

(...)

Isolation of extragenic suppressor mutations of pal A88V in tolB. Twelve mutations affecting 11 different residues of tolB were isolated as suppressor mutations of pal A88V (Table 2). They enabled the pal A88V mutant to grow on plates containing sodium cholate and lowered its excretion of periplasmic enzymes, some mutants being more efficient than others in suppressing the pal A88V phenotype. In most cases, the tolB mutations could not suppress the phenotypes of tolerance to colicins A and E2 of mutant pal A88V. Three tolB point mutations (H246Y, A249V, and T292I) affected the activity of TolB, whereas the others had phenotypes similar to the wild type. All the extragenic suppressor mutations of pal A88V are located in the C-terminal region of TolB. This suggests that this region of TolB is important for its interaction with Pal.

(...)

Isolation of intragenic suppressor mutations of pal A88V. Mutations pal S99F and pal E102K were both isolated as intragenic suppressor mutations of pal A88V. The pal E102K mutations was previously described as a pal-defective mutant (7). Both pal S99F and pal E102K mutations enabled the pal A88V mutant to grow in the presence of sodium cholate and lowered its excretion of periplasmic enzymes, mutant pal E102K being more efficient than mutant pal S99F as a suppressor mutation (Table 1). Thus, the conformation of the region from residues 88 to 102 appeared to be important for Pal function.

(...)”

PyDock general syntax

The program can be called in two ways, to distinguish direct use of the source code (used internally in our group) from the distributed binary: pydock3 refers to the source code and pyDock3 to the distributed binary. If you are using the binary, first create an alias so that you can follow this tutorial as written:

alias pydock3=PATH/WHERE/PYDOCK_BIN/pyDock3

All pyDock jobs are launched as follows:

pydock3 docking_name module_name

The docking_name is chosen freely by the user; in our example we will use T26 (as this was Target 26 in the CAPRI competition).
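Because every job follows the same pattern, the command lines can also be generated from a script. The following is a minimal sketch using Python's standard subprocess module; the helper names are hypothetical and not part of pyDock, and the module order simply follows the workflow of this tutorial. Note that a shell alias such as the one above is not visible to subprocess, so when using the binary, pass its full path as `executable`.

```python
import shlex
import subprocess

# Hypothetical helper: module order follows this tutorial's workflow.
# Use "zdock"/"rotzdock" instead of "ftdock"/"rotftdock" for ZDock sampling.
PIPELINE = ["setup", "ftdock", "rotftdock", "dockser", "dockrst", "patch"]


def pydock_commands(docking_name, modules=PIPELINE, executable="pydock3"):
    """Build one 'pydock3 docking_name module_name' command per module."""
    return [[executable, docking_name, module] for module in modules]


def run_pipeline(docking_name, modules=PIPELINE, executable="pydock3", dry_run=True):
    """Print each command and, unless dry_run is True, execute it in order."""
    for cmd in pydock_commands(docking_name, modules, executable):
        print(shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if a module fails,
            # so later modules never run on missing input files.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: only prints the commands for the T26 example.
    run_pipeline("T26")
```

For instance, `run_pipeline("T26", modules=["setup"], executable="PATH/WHERE/PYDOCK_BIN/pyDock3", dry_run=False)` would run only the setup step. The dry run is the default so that nothing is executed by accident.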

The modules available in pyDock are listed in the following table:

| Category | module_name | Input files | Output files |
|---|---|---|---|
| Docking | setup | docking_name.ini | docking_name_rec.pdb, docking_name_lig.pdb, docking_name_rec.pdb.H [1], docking_name_lig.pdb.H, docking_name_rec.pdb.amber [2], docking_name_lig.pdb.amber |
| Docking | ftdock (or zdock) | docking_name_rec.pdb, docking_name_lig.pdb | docking_name.ftdock (or docking_name.zdock) |
| Docking | rotftdock (or rotzdock) | docking_name.ftdock (or docking_name.zdock) | docking_name.rot |
| Docking | dockser | docking_name_rec.pdb, docking_name_lig.pdb, docking_name.rot | docking_name.ene |
| Complementary tools | dockrst | docking_name.ini, docking_name_rec.pdb, docking_name_lig.pdb, docking_name.rot, docking_name.ene | docking_name.eneRST, docking_name.rst |
| Complementary tools | patch | docking_name_rec.pdb, docking_name_lig.pdb, docking_name.rot, docking_name.ene | docking_name.recNIP, docking_name_rec.pdb.nip, docking_name.ligNIP, docking_name_lig.pdb.nip |
| Complementary tools | oda | subunitName.pdb | subunitName.pdb.oda, subunitName.pdb.oda.H, subunitName.pdb.oda.ODAtab, subunitName.oda.amber |

[1] .H files contain hydrogen atoms.
[2] .amber files include the AMBER force-field information for each atom.
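The output columns of the table can also serve as a checklist after each run. The sketch below is an illustrative helper, not a pyDock feature: it encodes the docking_name-based output names from the table (for patch, the T26_rec.pdb.nip-style names used later in this tutorial) and reports which of them are missing. The oda module is left out because its outputs are named after the input pdb file rather than docking_name.

```python
from pathlib import Path

# Output name templates per module, taken from the table; "{n}" = docking_name.
OUTPUTS = {
    "setup": ["{n}_rec.pdb", "{n}_lig.pdb", "{n}_rec.pdb.H", "{n}_lig.pdb.H",
              "{n}_rec.pdb.amber", "{n}_lig.pdb.amber"],
    "ftdock": ["{n}.ftdock"],
    "zdock": ["{n}.zdock"],
    "rotftdock": ["{n}.rot"],
    "rotzdock": ["{n}.rot"],
    "dockser": ["{n}.ene"],
    "dockrst": ["{n}.eneRST", "{n}.rst"],
    "patch": ["{n}.recNIP", "{n}_rec.pdb.nip", "{n}.ligNIP", "{n}_lig.pdb.nip"],
}


def missing_outputs(docking_name, module, directory="."):
    """Return the expected output files of `module` not found in `directory`."""
    folder = Path(directory)
    expected = [template.format(n=docking_name) for template in OUTPUTS[module]]
    return [name for name in expected if not (folder / name).exists()]
```

After a successful setup, `missing_outputs("T26", "setup")` should return an empty list when called from the working directory.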

Practical guide

Setup process

The setup is the first step, as it generates the pdb files that pyDock will use for the docking job. At this point, you must define a receptor and a ligand in your complex. In general, the larger of the two partners is defined as the receptor and kept static, whereas the ligand is rotated and translated around it.

First of all, your starting directory must contain:

  • 1C5K.pdb (TolB or the receptor protein)
  • 1OAP.pdb (Pal or the ligand protein)
  • T26.ini which is the input file you must edit to run the setup process

You may get the pdb files from the PDB site: http://www.pdb.org

The “.ini” file specifies which chains to dock from each pdb file, so that a new pair of parsed pdb files suitable for pyDock can be created.

The “mol” field is the original chain name in the corresponding pdb file, whereas “newmol” is the new chain name in the pyDock output files “T26_rec.pdb” and “T26_lig.pdb”. The “newmol” chain names must be different for the receptor and the ligand.

You can download an incomplete T26.ini text file from the following link.

T26.ini will contain the following information:

[receptor]
pdb     = 1C5K.pdb
mol     = A
newmol  =

[ligand]
pdb     = 1OAP.pdb
mol     =
newmol  =

Now fill in the mol and newmol fields according to the chain names in your original pdb files. Do not forget that the new chain names (“newmol”) must be different for the receptor and the ligand!

Remarks: If a pdb does not contain any chain name, use “-” in the “mol” field of your .ini file. If it contains several copies of the same protein, select only one copy by its chain name.
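A quick consistency check of the .ini file before running the setup can save a failed run. The following is a minimal sketch using Python's standard configparser module; the check_ini helper is hypothetical and only tests the rules stated above (pdb, mol and newmol filled in, newmol different for receptor and ligand). It is not pyDock's own parser, which may accept or reject files differently.

```python
import configparser


def check_ini(text):
    """Return a list of problems found in a pyDock .ini text (empty if none)."""
    config = configparser.ConfigParser()
    config.read_string(text)
    problems = []
    for section in ("receptor", "ligand"):
        if not config.has_section(section):
            problems.append(f"missing [{section}] section")
            continue
        for key in ("pdb", "mol", "newmol"):
            # An empty field such as "newmol  =" is read as "".
            # "-" (pdb without chain names) counts as filled in.
            if not config.get(section, key, fallback="").strip():
                problems.append(f"[{section}] {key} is empty")
    if config.has_section("receptor") and config.has_section("ligand"):
        rec = config.get("receptor", "newmol", fallback="").strip()
        lig = config.get("ligand", "newmol", fallback="").strip()
        if rec and rec == lig:
            problems.append("receptor and ligand newmol must differ")
    return problems
```

For example, `check_ini(open("T26.ini").read())` on the incomplete file above reports the empty newmol of the receptor and the empty mol and newmol of the ligand.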

Once you have a complete T26.ini file, run the pyDock setup by typing the following command in your console:

pydock3 T26 setup

Now you can check that the different output files have been correctly created.

Sampling using Fast Fourier Transform (FFT) methods

pyDock can be applied to score rigid-body docking orientations generated by a variety of methods. We use ZDock or FTDock (two FFT-based methods) to generate docking poses from the T26_rec.pdb and T26_lig.pdb files.

Because of time constraints, we have precomputed the sampling output files so that you can skip this step. The commands are included here for completeness:

cp T26_rec.pdb T26_rec.parsed    # step needed in version 3.2.3
cp T26_lig.pdb T26_lig.parsed    # step needed in version 3.2.3
pydock3 T26 ftdock               # to use FTDock
pydock3 T26 zdock                # to use ZDock

Remarks: pyDock runs the external FTDock and ZDock binaries through the pydock.conf configuration file in the etc/ folder. Remember to update both paths in this file to match your installation.

The ZDock or FTDock output files are then used to build a rotation and translation matrix (the .rot file), which pyDock uses internally to generate the different complex conformations.

The number of rotations and translations in this matrix depends on the number of solutions generated by ZDock or FTDock, but because of time constraints we will use only the first 100 in this tutorial.

The precomputed FTDock output file is available at this link. It corresponds to the top 100 FTDock solutions (of the 10000 poses generated by FTDock, we kept only the top 100 because of time constraints).

Then, type the following command to generate the matrix:

pydock3 T26 rotftdock

This calculation is fast and creates a “T26.rot” file containing the transformation matrix described above for the 100 different poses.
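Each pose in the .rot file corresponds to a rigid-body transformation of the ligand, a rotation followed by a translation. The column layout of the .rot file is not described in this tutorial, so the sketch below takes a 3×3 rotation matrix R and a translation vector t directly and applies x' = R·x + t to a list of ligand coordinates. The rotation centre that pyDock uses internally is not specified here; this sketch simply rotates about the coordinate origin.

```python
def transform(coords, rotation, translation):
    """Apply x' = R x + t to each (x, y, z) point.

    rotation: 3x3 matrix given as three rows; translation: (tx, ty, tz).
    Rotation is about the coordinate origin (an assumption of this sketch).
    """
    return [
        tuple(
            sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
            for i in range(3)
        )
        for point in coords
    ]


# Example: rotate 90 degrees about z, then shift by 1 along x.
R_Z90 = [[0.0, -1.0, 0.0],
         [1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0]]
print(transform([(1.0, 0.0, 0.0)], R_Z90, (1.0, 0.0, 0.0)))  # [(1.0, 1.0, 0.0)]
```

Since the transformation is rigid, interatomic distances within the ligand are preserved; only its position and orientation relative to the static receptor change from pose to pose.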

Scoring using the pydock energy function

The next stage is to score and rank all poses with the pyDock energy function by running the dockser module with the following command:

pydock3 T26 dockser

With the default single-core mode, this step typically takes several hours; here we are using a .rot file with only a selection of docking poses to speed it up.

Since the computation will take several minutes, you can meanwhile try one of our in-house tools, ODA (Optimal Docking Area), which finds potential binding sites: go directly to the “Optimal Docking Area (ODA) analysis or Interface prediction from protein surface desolvation energy” section.

When dockser finishes, take a look at the output file “T26.ene”, which will look like the following example (with different values):

Conf(1)          Ele(2)         Desolv(3)       VDW(4)          Total(5)        RANK(6)
---------------------------------------------------------------------------------------
8726            -28.979         -9.712          130.111         -38.691         1
4538            -28.001         -8.980           38.482         -36.981         2
6446            -29.716         -4.215           96.438         -33.931         3
1590            -32.394          0.109           28.699         -32.285         4

  • Conf(1): conformation number of the docking pose, as given in the last column of the .rot file.
  • Ele(2): electrostatic energy component.
  • Desolv(3): desolvation energy component.
  • VDW(4): van der Waals energy component (weighted by 0.1 by default).
  • Total(5): total binding energy, i.e. the sum of the three previous terms with VDW weighted as above; a total excluding the VDW term can also be computed. In the example values above, Total equals Ele + Desolv.
  • RANK(6): rank of the conformation according to its computed binding energy.
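To inspect or re-rank the results, the .ene file can be read as whitespace-separated columns. The sketch below assumes the layout shown above (a header line, a dashed separator, then six columns per pose) and recomputes Total = Ele + Desolv + w·VDW for a chosen weight w. Rescoring the four example rows with `vdw_weight=0.0` reproduces their Total column; with a weight of 0.1 the same four rows would be ranked differently.

```python
EXAMPLE = """\
Conf(1)          Ele(2)         Desolv(3)       VDW(4)          Total(5)        RANK(6)
---------------------------------------------------------------------------------------
8726            -28.979         -9.712          130.111         -38.691         1
4538            -28.001         -8.980           38.482         -36.981         2
6446            -29.716         -4.215           96.438         -33.931         3
1590            -32.394          0.109           28.699         -32.285         4
"""


def parse_ene(text):
    """Return one dict per data row of a .ene file, skipping header lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have six fields and start with an integer conformation number.
        if len(fields) != 6 or not fields[0].isdigit():
            continue
        conf, ele, desolv, vdw, total, rank = fields
        rows.append({"conf": int(conf), "ele": float(ele), "desolv": float(desolv),
                     "vdw": float(vdw), "total": float(total), "rank": int(rank)})
    return rows


def rescore(rows, vdw_weight=0.1):
    """Recompute Total = Ele + Desolv + vdw_weight * VDW and re-rank, lowest first."""
    scored = [dict(row, total=row["ele"] + row["desolv"] + vdw_weight * row["vdw"])
              for row in rows]
    scored.sort(key=lambda row: row["total"])
    for rank, row in enumerate(scored, start=1):
        row["rank"] = rank
    return scored
```

With a real run, `rescore(parse_ene(open("T26.ene").read()))` lists the poses from lowest (best) to highest energy.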

Addition of experimental data restraints to guide docking

To help you choose the best models from the starting pool of solutions, you can add experimental restraints from the literature to the pyDock energy ranking. Go back to the Target26 experimental information section and select, from the mutational analysis of Ray et al., the residues you consider relevant for applying restraints, as we did in the real CAPRI competition.

Remarks: A restraint defined from a given putative interface residue is considered satisfied when the center of coordinates of its side chain lies within 6 Å of any non-hydrogen atom of the partner molecule.
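The satisfaction criterion can be written as a short geometric test. In this sketch, atoms are (name, (x, y, z)) pairs. Several details are assumptions, not necessarily pyDock's exact rules: every atom except N, CA, C and O is treated as side chain, CA is used for glycine (which has no side-chain heavy atoms), and hydrogens are recognized by an atom name starting with H, possibly after a leading digit.

```python
import math

BACKBONE = {"N", "CA", "C", "O"}


def side_chain_center(atoms):
    """Center of coordinates of side-chain atoms; CA is used if there are none (Gly)."""
    side = [xyz for name, xyz in atoms if name not in BACKBONE]
    if not side:
        side = [xyz for name, xyz in atoms if name == "CA"]
    return tuple(sum(xyz[i] for xyz in side) / len(side) for i in range(3))


def is_hydrogen(name):
    """Assumed naming rule: hydrogen names start with H, possibly after a digit (1HB)."""
    return name.lstrip("0123456789").startswith("H")


def restraint_satisfied(residue_atoms, partner_atoms, cutoff=6.0):
    """True if the side-chain center lies within cutoff (Å) of any partner heavy atom."""
    center = side_chain_center(residue_atoms)
    return any(math.dist(center, xyz) <= cutoff
               for name, xyz in partner_atoms if not is_hydrogen(name))
```

Applying this test to every restraint residue of a pose gives the number of satisfied restraints used in the next step.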

For each docking solution, the percentage of satisfied restraints is converted to a pseudo-energy (simply by multiplying it by -1.0) and added to the final scoring function in the “.eneRST” file.
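Combining the two terms is simple arithmetic. A minimal sketch, assuming the percentage is on a 0-100 scale and is added to the pyDock total energy without extra weighting (pyDockRST's exact scaling is not given here): a pose with Total = -32.285 that satisfies all 3 restraints scores -32.285 - 100 = -132.285, and so ranks above a pose with Total = -38.691 that satisfies none.

```python
def rst_score(total_energy, satisfied, n_restraints):
    """Total energy plus the restraint pseudo-energy (-1.0 x % satisfied restraints)."""
    percentage = 100.0 * satisfied / n_restraints  # assumed 0-100 scale
    return total_energy + -1.0 * percentage


def rank_with_restraints(poses):
    """poses: (conf, total_energy, satisfied, n_restraints) tuples.

    Returns (conf, score) pairs sorted from lowest (best) to highest score.
    """
    scored = [(conf, rst_score(total, sat, n)) for conf, total, sat, n in poses]
    return sorted(scored, key=lambda pair: pair[1])


print(rank_with_restraints([(8726, -38.691, 0, 3), (1590, -32.285, 3, 3)]))
```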

Experimental data restraints must be included on a new line of the “.ini” file as follows:

restr   = A.Arg.45

The “restr” keyword tells pyDock which distance restraint(s) to use. Each restraint is defined by three fields separated by dots: the new chain name (“newmol”), the three-letter amino-acid code (note that only the first letter is uppercase) and the residue number as it appears in the original pdb file. When more than one residue is selected, the restraints must be separated by commas without spaces, as in the following example:

[receptor]
pdb     = 1C5K.pdb
mol     = A
newmol  = A
restr   = A.His.246,A.Ala.249

[ligand]
pdb     = 1OAP.pdb
mol     = A
newmol  = B
restr   = B.Ala.88
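Because the restraint syntax is strict, a small validator can catch typos before running dockrst. This regular-expression sketch (a hypothetical helper, not part of pyDock) accepts exactly the format described above: a one-character chain name, a three-letter code with only the first letter uppercase and a residue number, with items separated by commas and no spaces. It does not handle insertion codes, and it does not check that the chain matches newmol.

```python
import re

# CHAIN.Xxx.NUMBER, e.g. "A.His.246": one-character chain, three-letter code with
# only the first letter uppercase, residue number as in the original pdb file.
# Insertion codes (e.g. 52A) are not handled by this sketch.
RESTRAINT = re.compile(r"([A-Za-z0-9])\.([A-Z][a-z]{2})\.(-?\d+)")


def parse_restr(value):
    """Split a restr value such as "A.His.246,A.Ala.249" into tuples."""
    restraints = []
    for item in value.split(","):
        match = RESTRAINT.fullmatch(item)  # a stray space makes this fail
        if match is None:
            raise ValueError(f"malformed restraint: {item!r}")
        chain, residue, number = match.groups()
        restraints.append((chain, residue, int(number)))
    return restraints
```

For the example above, `parse_restr("A.His.246,A.Ala.249")` returns `[("A", "His", 246), ("A", "Ala", 249)]`, while `"A.HIS.246"` or a value containing a space raises ValueError.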

Note that this example only illustrates how restraints must be included before running the pyDockRST module; enter the experimental restraints of your choice in the same way.

To run the corresponding module, type:

pydock3 T26 dockrst

pyDockRST should take several minutes. Once you have your “.eneRST” and “.rst” files, take a look at them and see how the experimental restraints have changed the ranking of the top solutions. The “.eneRST” file contains the “.ene” energies combined with the percentage of satisfied restraints multiplied by -1, ranked according to this new score.

Interface prediction from docking results

The interface prediction from docking results [Fernandez_2004], [Grosdidier] can be done with the patch module. This tool estimates the location of the binding interface from the top 100 solutions of the pyDock ranking. It is most informative when working with hundreds or thousands of different poses, as it reflects the convergence of the top 100 solutions. Here, we will use it on our pool of 100 solutions to determine whether our poses have converged around what we consider the correct interface, as defined by the mutational data of Ray et al.

Use the following command to run the “patch” module, which takes only a few minutes:

pydock3 T26 patch

Take a look at the output files: T26.recNIP, T26_rec.pdb.nip, T26.ligNIP and T26_lig.pdb.nip. The “.recNIP” and “.ligNIP” files contain the lists of receptor and ligand residues with their corresponding NIP (Normalized Interface Propensity) values. The “_rec.pdb.nip” and “_lig.pdb.nip” files are pdb files in which the B-factor column is filled with NIP values, allowing easy visualization of the results with molecular graphics programs.

Remarks:

  • The NIP (Normalized Interface Propensity) value represents how frequently a given residue is located at the interface among the 100 lowest-energy docking solutions.
  • If NIP = 0, the residue appears at the interface in the top 100 solutions as often as expected at random.
  • If NIP < 0, the residue appears at the interface in the top 100 solutions less often than expected at random.
  • If NIP > 0.4, the residue is considered predicted to belong to the interface, as it appears significantly more often than expected at random.

Visualizing patch results:

NOTE: Depending on the PyMOL version, files ending in .nip may fail to open. To avoid this problem, copy them to files with a .pdb extension:

cp T26_rec.pdb.nip T26_rec.nip.pdb
cp T26_lig.pdb.nip T26_lig.nip.pdb

You can then visualize the patch module results (stored in the B-factor column) in PyMOL:

PyMOL> load T26_rec.nip.pdb
PyMOL> spectrum b, blue_white_red, minimum=0.0, maximum=0.2

Residues with NIP values ≤ 0.0 will appear in blue, whereas those with NIP values ≥ 0.2 will appear in red. You can do the same with the ligand molecule:

PyMOL> load T26_lig.nip.pdb
PyMOL> spectrum b, blue_white_red, minimum=0.0, maximum=0.2

You can also sort the T26.recNIP and T26.ligNIP files by the NIP column and compare the location of the predicted residues (NIP ≥ 0.2) with the mutational data of Ray et al. To analyze your residues of interest, highlight (e.g. display as spheres) the residues known from mutational experiments to be involved in complex formation (e.g. the ones you selected as restraint residues) and visually compare them with the NIP-based predicted interface residues (in red). For example, the following command highlights H147 and R36 of TolB (receptor):

PyMOL> show spheres, /T26_rec.nip///147+36

Similarly, this command shows F52 of Pal (ligand):

PyMOL> show spheres, /T26_lig.nip///52

WARNING: These residues are shown here only as an example; you will need to indicate your own residues.
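Because the _rec.pdb.nip and _lig.pdb.nip files store NIP values in the B-factor column, the sorting step can also be done directly from them. This sketch reads the standard fixed PDB columns (residue name 18-20, chain 22, residue number 23-26, B-factor 61-66) and assumes that every atom of a residue carries the same NIP value, so it keeps the value of the first atom. The default threshold of 0.2 follows the visualization above; pass 0.4 to apply the stricter cutoff from the NIP remarks.

```python
def residue_bfactors(pdb_text):
    """Map (chain, resSeq, resName) to the B-factor of the residue's first atom."""
    values = {}
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            # 0-based slices of the fixed PDB columns (1-based 18-20, 22, 23-26, 61-66).
            key = (line[21], int(line[22:26]), line[17:20].strip())
            values.setdefault(key, float(line[60:66]))
    return values


def top_nip(pdb_text, threshold=0.2):
    """Residues with NIP >= threshold, sorted from highest to lowest NIP."""
    hits = [(key, nip) for key, nip in residue_bfactors(pdb_text).items()
            if nip >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

For example, `top_nip(open("T26_rec.pdb.nip").read())` lists the predicted receptor interface residues, ready to be compared with the Ray et al. mutations.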

Optimal Docking Area (ODA) analysis or Interface prediction from protein surface desolvation energy

This tool analyzes the optimal desolvation patches on a protein surface to predict potential binding sites. Since Pal and TolB are part of a much larger multi-protein complex, each of these proteins very likely has more than one binding region. Consequently, ODA results must be interpreted with care, as predicted residues may belong to any of the existing binding interfaces of either protein.

Run the ODA module for the TolB protein as follows:

pydock3 1C5K.pdb oda

and then, once you obtain your 1C5K output files, you can run ODA for the Pal protein:

pydock3 1OAP.pdb oda

Take a look at the output files 1C5K.pdb.oda and 1C5K.pdb.oda.ODAtab. The “.pdb.oda.ODAtab” file contains the computed desolvation energy, ODA radius and ODA value for each residue of the subunit. The “.pdb.oda” file is a pdb file in which the B-factor column is filled with ODA values, allowing easy visualization of the results with molecular graphics programs.

Remark: Regions predicted to belong to an interface have an ODA value below -10.
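The same cutoff can be applied programmatically to the .pdb.oda file, whose B-factor column holds the ODA values. As a sketch, assuming the standard fixed PDB columns and one ODA value per residue (the value of the first atom is kept), the residues below -10 can be listed from most to least favorable:

```python
def residue_oda(pdb_text):
    """Map (chain, resSeq, resName) to the ODA value stored in the B-factor column."""
    values = {}
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            # 0-based slices of the fixed PDB columns (1-based 18-20, 22, 23-26, 61-66).
            key = (line[21], int(line[22:26]), line[17:20].strip())
            values.setdefault(key, float(line[60:66]))
    return values


def oda_interface(pdb_text, cutoff=-10.0):
    """Residues with ODA below cutoff, most favorable (most negative) first."""
    hits = [(key, oda) for key, oda in residue_oda(pdb_text).items() if oda < cutoff]
    return sorted(hits, key=lambda item: item[1])
```

For example, `oda_interface(open("1C5K.pdb.oda").read())` lists the TolB residues predicted by ODA; keep in mind the caveat above about multiple binding regions.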

We can also use PyMOL to color each residue of the TolB protein according to its computed ODA value, in a gradient from red (ODA values ≤ -10.0) to blue (ODA values ≥ 0.0):

PyMOL> load 1C5K.pdb.oda
PyMOL> spectrum b, red_white_blue, minimum=-10.0, maximum=0.0

Repeat the same process with the Pal protein:

PyMOL> load 1OAP.pdb.oda
PyMOL> spectrum b, red_white_blue, minimum=-10.0, maximum=0.0

Bibliography

[Cheng_2007] Cheng TM, Blundell TL, Fernández-Recio J. pyDock: electrostatics and desolvation for effective scoring of rigid-body protein-protein docking. Proteins. 2007 Aug 1;68(2):503-15.
[Janin_2003] Janin J, Henrick K, Moult J, Eyck LT, Sternberg MJE, Vajda S, Vakser I, Wodak SJ. CAPRI: a Critical Assessment of PRedicted Interactions. Proteins. 2003 Jul;(1):2-9.
[Grosdidier_2007] Grosdidier S, Pons C, Solernou A, Fernández-Recio J. Prediction and scoring of docking poses with pyDock. Proteins. 2007 Dec 1;69(4):852-8.
[Fernandez_2005] Fernández-Recio J, Totrov M, Skorodumov C, Abagyan R. Optimal docking area: a new method for predicting protein-protein interaction sites. Proteins. 2005 Jan 1;58(1):134-43.
[Ray_2000] Ray MC, Germon P, Vianney A, Portalier R, Lazzaroni JC. Identification by genetic suppression of Escherichia coli TolB residues important for TolB-Pal interaction. J Bacteriol. 2000;182:821-824.
[Fernandez_2004] Fernández-Recio J, Totrov M, Abagyan R. Identification of protein-protein interaction sites from docking energy landscapes. J Mol Biol. 2004 Jan 16;335(3):843-65.
[Grosdidier] Grosdidier S, Fernández-Recio J. Identification of hot-spot residues in protein-protein interactions by computational docking. BMC Bioinformatics. In press.