PyDock Tutorial
===============

Introduction
------------

In this tutorial you will learn how to use the pyDock software [Cheng_2007]_ to perform protein-protein docking on a real case from the CAPRI experiment [Janin_2003]_: Target 26 [Grosdidier_2007]_ (further information in the :ref:`T26-info-section` section).

The workflow you will follow, which mirrors a real data pipeline, consists of:

1. Obtaining the top 100 docking solutions using the pyDock scoring energy.
2. Applying experimental data restraints to refine the previous complexes.
3. Predicting the interface from desolvation energy (Optimal Docking Area [Fernandez_2005]_) to characterize different properties of the complex.
4. Choosing the best model from the pool of docking solutions, taking into account the results from the previous steps.

.. _T26-info-section:

Target26 experimental information
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Target 26 [Ray_2000]_ consists of predicting a complex from two unbound subunits: TolB and Pal. These proteins from *E. coli* form a complex involved in maintaining the stability of the bacterial outer membrane.

**Experimental data available from:**

**Ray MC, Germon P, Vianney A, Portalier R, Lazzaroni JC:**
**Identification by genetic suppression of Escherichia coli TolB residues important for TolB-Pal interaction.**
**J Bacteriol 2000; 182: 821-824.**

“(...) The Tol-Pal system of Escherichia coli is involved in maintaining outer membrane stability acting as a barrier to the entry of macromolecules into the bacteria, thus providing protection against deleterious actions of bacteriocins and digestive enzymes. The periplasmic protein TolB was shown to interact with the outer membrane, peptidoglycan-associated proteins OmpA, Lpp, and Pal (4, 7). Thus, TolB and Pal could be part of a multicomponent system linking the outer membrane to peptidoglycan. The aim of this study was to determine the regions of TolB involved in the interaction of the protein with Pal.
To this end, we used suppressor genetic techniques which had previously allowed us to characterize the regions of interaction between TolQ, TolR, and TolA (10, 18). pal point mutations were identified, and some of them involved residues important for interaction with TolB (7). These mutations induce sensitivity to sodium cholate and release of periplasmic proteins in the medium. We used these pal mutants to search for suppressors in tolB.

(...)

**Isolation of extragenic suppressor mutations of pal A88V in tolB.** Twelve mutations affecting 11 different residues of tolB were isolated as suppressor mutations of pal A88V (Table 2). They enabled the pal A88V mutant to grow on plates containing sodium cholate and lowered its excretion of periplasmic enzymes, some mutants being more efficient than others in suppressing the pal A88V phenotype. In most cases, the tolB mutations could not suppress the phenotypes of tolerance to colicins A and E2 of mutant pal A88V. Three tolB point mutations (H246Y, A249V, and T292I) affected the activity of TolB, whereas the others had phenotypes similar to the wild type. All the extragenic suppressor mutations of pal A88V are located in the C-terminal region of TolB. This suggests that this region of TolB is important for its interaction with Pal.

(...)

**Isolation of intragenic suppressor mutations of pal A88V.** Mutations pal S99F and pal E102K were both isolated as intragenic suppressor mutations of pal A88V. The pal E102K mutation was previously described as a pal-defective mutant (7). Both pal S99F and pal E102K mutations enabled the pal A88V mutant to grow in the presence of sodium cholate and lowered its excretion of periplasmic enzymes, mutant pal E102K being more efficient than mutant pal S99F as a suppressor mutation (Table 1). Thus, the conformation of the region from residues 88 to 102 appeared to be important for Pal function.
(...)”

PyDock general syntax
^^^^^^^^^^^^^^^^^^^^^

All pyDock jobs are launched as follows:

::

    pydock3 docking_name moduleNAME

The *docking_name* is chosen arbitrarily by the user. In this example we will use T26 as *docking_name* (as it was Target 26 in the CAPRI competition).

The modules that can be used in pyDock are listed in the following table:

+-------------------------+-------------------------+-------------------------+---------------------------------+
|                         | **module_name**         | **Input files**         | **Output files**                |
+=========================+=========================+=========================+=================================+
| **Docking**             | setup                   | docking_name.ini        | docking_name_rec.pdb            |
|                         |                         |                         | docking_name_lig.pdb            |
|                         |                         |                         | docking_name_rec.pdb.H [#]_     |
|                         |                         |                         | docking_name_lig.pdb.H          |
|                         |                         |                         | docking_name_rec.pdb.amber [#]_ |
|                         |                         |                         | docking_name_lig.pdb.amber      |
|                         +-------------------------+-------------------------+---------------------------------+
|                         | ftdock (or zdock)       | docking_name_rec.pdb    | docking_name.ftdock (or         |
|                         |                         | docking_name_lig.pdb    | docking_name.zdock)             |
|                         +-------------------------+-------------------------+---------------------------------+
|                         | rotftdock (or rotzdock) | docking_name.ftdock (or | docking_name.rot                |
|                         |                         | docking_name.zdock)     |                                 |
|                         +-------------------------+-------------------------+---------------------------------+
|                         | dockser                 | docking_name_rec.pdb    | docking_name.ene                |
|                         |                         | docking_name_lig.pdb    |                                 |
|                         |                         | docking_name.rot        |                                 |
+-------------------------+-------------------------+-------------------------+---------------------------------+
| **Complementary tools** | dockrst                 | docking_name.ini        | docking_name.eneRST             |
|                         |                         | docking_name_rec.pdb    | docking_name.rst                |
|                         |                         | docking_name_lig.pdb    |                                 |
|                         |                         | docking_name.rot        |                                 |
|                         |                         | docking_name.ene        |                                 |
|                         +-------------------------+-------------------------+---------------------------------+
|                         | patch                   | docking_name_rec.pdb    | docking_name.recNIP             |
|                         |                         | docking_name_lig.pdb    | docking_name_rec.pdb.nip        |
|                         |                         | docking_name.rot        | docking_name.ligNIP             |
|                         |                         | docking_name.ene        | docking_name_lig.pdb.nip        |
|                         +-------------------------+-------------------------+---------------------------------+
|                         | oda                     | subunitName.pdb         | subunitName.pdb.oda             |
|                         |                         |                         | subunitName.pdb.oda.H           |
|                         |                         |                         | subunitName.pdb.oda.ODAtab      |
|                         |                         |                         | subunitName.oda.amber           |
+-------------------------+-------------------------+-------------------------+---------------------------------+

.. [#] .H files contain hydrogen atoms.
.. [#] .amber files include the AMBER force-field information for each atom.

Practical guide
---------------

Setup process
^^^^^^^^^^^^^

The setup is the first step, as it generates the PDB files pyDock will use for the docking job. At this point, you must define a receptor and a ligand in your complex. In general, the larger of the two partners in a protein complex is defined as the receptor and kept static, whereas the ligand is rotated and translated around the receptor.

First of all, you must have in your starting directory:

- 1C5K.pdb (TolB, the receptor protein)
- 1OAP.pdb (Pal, the ligand protein)
- T26.ini, the input file you must edit to run the setup process

You can get the PDB files from the PDB site: http://www.pdb.org

The “.ini” file specifies which chains to dock from each PDB file, in order to create a new pair of parsed PDB files suitable for pyDock. The “mol” field is the original chain name in the given PDB file, whereas “newmol” is the new chain name in the pyDock output files “T26_rec.pdb” and “T26_lig.pdb”. The “newmol” chain names must be different for the receptor and the ligand.

You can download an incomplete T26.ini text file from the following :download:`link <./_static/tutorial_data/T26/T26.ini>`.
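
The rules for the “.ini” file (no empty fields, and distinct “newmol” chain names) can be checked before running the setup. The following is a minimal sketch, assuming the file can be read with Python's standard ``configparser`` module; pyDock's own parser is not described in this tutorial, and the function name ``find_ini_problems`` and the embedded template text are illustrative only.

```python
import configparser

# Illustrative text mirroring an incomplete T26.ini template
# (assumption: the file uses plain INI syntax).
INCOMPLETE_TEMPLATE = """\
[receptor]
pdb = 1C5K.pdb
mol = A
newmol =

[ligand]
pdb = 1OAP.pdb
mol =
newmol =
"""


def find_ini_problems(text):
    """Return a list of problems found in a pyDock-style .ini text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    problems = []
    for section in ("receptor", "ligand"):
        if section not in parser:
            problems.append(f"missing [{section}] section")
            continue
        # Every field must be filled in before running the setup.
        for key in ("pdb", "mol", "newmol"):
            if not parser[section].get(key, "").strip():
                problems.append(f"[{section}] {key} is empty")
    # The new chain names must differ between receptor and ligand.
    rec_new = parser.get("receptor", "newmol", fallback="")
    lig_new = parser.get("ligand", "newmol", fallback="")
    if rec_new and rec_new == lig_new:
        problems.append("newmol must differ between receptor and ligand")
    return problems


for problem in find_ini_problems(INCOMPLETE_TEMPLATE):
    print(problem)
```

Run on the incomplete template, the sketch reports the fields that are still empty; a completed file with distinct “newmol” values yields an empty list.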
T26.ini will contain the following information:

::

    [receptor]
    pdb = 1C5K.pdb
    mol = A
    newmol =

    [ligand]
    pdb = 1OAP.pdb
    mol =
    newmol =

It is now time to fill in the mol and newmol fields according to the chain names in your original PDB files. Do not forget that the new chain names (“newmol”) must be different for the receptor and the ligand!

**Remarks:** If a PDB file does not contain any chain name, use “-” in the “mol” field of your .ini file. If it contains several copies of the same protein, select only one copy by its chain name.

Once you have a complete T26.ini file, run the pyDock setup by typing the following line in your console:

::

    pydock3 T26 setup

Now you can check that the different output files have been correctly created.

Sampling using Fast Fourier Transform (FFT) methods
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

pyDock can be applied to score rigid-body docking orientations generated by a variety of methods. We use ZDock or FTDock (two FFT methods) to generate docking poses from the T26_rec.pdb and T26_lig.pdb files. Because of time limitations, we have precomputed the sampling output files so that you can skip this step. We include the commands here for completeness:

::

    pydock3 T26 ftdock (to use FTDock)
    pydock3 T26 zdock (to use ZDock)

**Remarks:** pyDock is able to use the FTDock and ZDock external binaries via the **pydock.conf** configuration file found in the etc/ folder. Remember to change both paths in the configuration file according to your installation.

Then, the output files from ZDock or FTDock are used to build a rotation and translation matrix (the *.rot* file), which pyDock uses internally to generate the different complex conformations. The number of rotations and translations considered depends on the number of poses generated by ZDock or FTDock but, because of time limitations, we will use only the first 100 in this tutorial.
You will find the precomputed FTDock output file at this :download:`link <./_static/tutorial_data/T26/T26.ftdock>`. This file corresponds to the top 100 FTDock solutions (out of the 10000 FTDock output poses, we kept only the top 100 because of time limitations). Then, type the following command to generate the matrix:

::

    pydock3 T26 rotftdock

This calculation is quite fast and will create a “T26.rot” file containing the transformation matrix mentioned above for the 100 different poses.

Scoring using the pydock energy function
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The next stage is to use the pyDock energy function to score and rank all poses by running the dockser module with the following command:

::

    pydock3 T26 dockser

This step typically lasts several hours using the default single-core module; here we use a rot file with a selection of docking poses for the sake of speed. As the computation will still take several minutes, you can meanwhile try one of our in-house tools, ODA (Optimal Docking Area), for finding potential binding sites: go directly to the :ref:`ODA-section` section.

When dockser finishes, take a look at the output file “T26.ene”, which will look like the following example (with different values):

::

    Conf(1)         Ele(2)      Desolv(3)         VDW(4)       Total(5)     RANK(6)
    -------------------------------------------------------------------------------
       8726        -28.979         -9.712        130.111        -38.691           1
       4538        -28.001         -8.980         38.482        -36.981           2
       6446        -29.716         -4.215         96.438        -33.931           3
       1590        -32.394          0.109         28.699        -32.285           4

- Conf(1): conformation number of the docking pose, as in the rot file (last column).
- Ele(2): electrostatic energy component.
- Desolv(3): desolvation energy component.
- VDW(4): Van der Waals energy component (term weighted by 0.1 by default).
- Total(5): total binding energy (the sum of the three previous energy terms; a global energy excluding VDW can also be computed). In the example above, Total equals Ele + Desolv, that is, the VDW term is not included in the values shown.
- RANK(6): rank of the conformation according to its computed binding energy.

Addition of experimental data restraints to guide docking
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To help you choose the best models from the starting pool of solutions, you can add experimental data restraints taken from the literature to the pyDock energy ranking. Go back to the :ref:`T26-info-section` section and select, from the mutational analysis experiments of Ray et al., the residues you consider relevant for applying restraints, as we did in the real CAPRI competition.

**Remarks:** A restraint defined from a given putative interface residue is considered satisfied when the center of coordinates of its side chain lies within a distance cutoff of 6 Å from any non-hydrogen atom of the partner molecule. For each docking solution, the percentage of satisfied restraints is converted to a pseudo-energy (simply by multiplying it by -1.0) and added to the final scoring function in the “.eneRST” file.

Experimental data restraints must be included on a new line of the “.ini” file as follows:

::

    restr = A.Arg.45

The “restr” keyword tells pyDock which distance restraint(s) to use. Each restraint is defined as a combination of three fields separated by dots: the first field is the new chain name (“newmol”), the second field is the three-letter amino acid code (be careful, only the first letter is in uppercase) and the last field is the residue number, as it appears in the original PDB file. When more than one residue is selected, the restraints must be separated by commas without spaces in between, as in the following example:

::

    [receptor]
    pdb = 1C5K.pdb
    mol = A
    newmol = A
    restr = A.His.246,A.Ala.249

    [ligand]
    pdb = 1OAP.pdb
    mol = A
    newmol = B
    restr = B.Ala.88

Be careful: this example only illustrates how restraints must be written before running the pyDockRST module. You can enter the experimental restraint(s) of your choice in the same way.
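
The restraint syntax described above (chain, three-letter residue code, residue number, comma-separated without spaces) can be checked with a few lines of Python before launching the module. This is a minimal sketch and not part of pyDock: the function name ``parse_restraints`` is hypothetical, and the pattern assumes a one-character chain name and a plain integer residue number (no insertion codes).

```python
import re

# chain.Residue.number, e.g. "A.His.246".
# Assumptions: one-character chain name, three-letter residue code with
# only the first letter in uppercase, integer residue number.
RESTRAINT_PATTERN = re.compile(r"([^\s.,])\.([A-Z][a-z]{2})\.(-?\d+)")


def parse_restraints(value, newmol):
    """Parse a 'restr' value into (chain, residue, number) tuples.

    Raises ValueError if an item is malformed or if its chain name
    differs from the 'newmol' chain name of the same section.
    """
    restraints = []
    for item in value.split(","):
        match = RESTRAINT_PATTERN.fullmatch(item)
        if match is None:
            raise ValueError(f"malformed restraint: {item!r}")
        chain, residue, number = match.groups()
        if chain != newmol:
            raise ValueError(f"restraint {item!r} does not use chain {newmol!r}")
        restraints.append((chain, residue, int(number)))
    return restraints


# Values taken from the example .ini file above.
receptor = parse_restraints("A.His.246,A.Ala.249", newmol="A")
ligand = parse_restraints("B.Ala.88", newmol="B")
print(receptor)
print(ligand)
```

A value containing a space after the comma, an all-uppercase residue code, or a chain name other than the section's “newmol” raises ``ValueError`` in this sketch.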
To run the corresponding module, type:

::

    pydock3 T26 dockrst

pyDockRST should take several minutes. Once you have your “.eneRST” and “.rst” files, take a look at them and see how the experimental restraints influenced the ranking of the top solutions. The “.eneRST” file is the “.ene” energy file combined with the percentage of satisfied restraints multiplied by -1, re-ranked according to this new score.

Interface prediction from docking results
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The interface prediction from docking results [Fernandez_2004]_, [Grosdidier]_ can be done with the *patch* module. This tool gives an idea of the location of the binding interface, considering the top 100 solutions in the pyDock ranking. Of course, this module is most relevant when you are working with hundreds or thousands of different poses, as it reflects the convergence of the top 100 solutions. Here, we will use it on our pool of 100 solutions to determine whether our poses have converged around what we will consider the correct interface, as determined by the mutational data of Ray et al.

Use the following line to run the “patch” module, which will take only a few minutes:

::

    pydock3 T26 patch

Take a look at the output files: T26.recNIP, T26_rec.pdb.nip, T26.ligNIP and T26_lig.pdb.nip. The “.ligNIP” and “.recNIP” files contain the list of ligand and receptor residues with their corresponding NIP (Normalized Interface Propensity) values. The “_rec.pdb.nip” and “_lig.pdb.nip” files are PDB files in which the B-factor column is filled with NIP values, allowing easy visualization of the results with molecular graphics programs.

**Remarks:**

- The NIP (Normalized Interface Propensity) value represents how frequently a given residue is located at the interface among the 100 lowest-energy docking solutions.
- If NIP = 0, the corresponding residue appears at the interface in the top 100 solutions as often as expected at random.
- If NIP < 0, the corresponding residue appears at the interface in the top 100 solutions less often than expected at random.
- If NIP > 0.4, the corresponding residue is predicted to belong to the interface, as it appears significantly more often than expected at random.

.. _ODA-section:

Optimal Docking Area (ODA) analysis or Interface prediction from protein surface desolvation energy
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This tool analyzes the optimal desolvation patches on a protein surface to predict potential binding interface sites. Given that in our case the Pal and TolB proteins are part of a much bigger multi-protein complex, the existence of more than one binding region on each of these proteins is very likely. Consequently, ODA results must be interpreted with care, as the predicted residue(s) may belong to any of the binding interfaces of either protein.

Run the ODA module for the TolB protein as follows:

::

    pydock3 1C5K.pdb oda

and then, once you obtain your 1C5K output files, you can run ODA for the Pal protein:

::

    pydock3 1OAP.pdb oda

Take a look at the output files: 1C5K.pdb.oda and 1C5K.pdb.oda.ODAtab. The ".pdb.oda.ODAtab" file contains the computed desolvation, ODA radius and ODA values for each residue of the subunit. The ".pdb.oda" file is a PDB file in which the B-factor column is filled with ODA values, allowing easy visualization of the results with molecular graphics programs.

**Remark:** Regions predicted to belong to an interface have an ODA value below -10.

Bibliography
------------

.. [Cheng_2007] Cheng TM, Blundell TL, Fernández-Recio J. pyDock: electrostatics and desolvation for effective scoring of rigid-body protein-protein docking. Proteins. 2007 Aug 1;68(2):503-15.

.. [Janin_2003] Janin J, Henrick K, Moult J, Eyck LT, Sternberg MJE, Vajda S, Vakser I, Wodak SJ. CAPRI: a Critical Assessment of PRedicted Interactions. Proteins. 2003 Jul 1;52(1):2-9.

.. [Grosdidier_2007] Grosdidier S, Pons C, Solernou A, Fernández-Recio J. Prediction and scoring of docking poses with pyDock. Proteins. 2007 Dec 1;69(4):852-8.

.. [Fernandez_2005] Fernández-Recio J, Totrov M, Skorodumov C, Abagyan R. Optimal docking area: a new method for predicting protein-protein interaction sites. Proteins. 2005 Jan 1;58(1):134-43.

.. [Ray_2000] Ray MC, Germon P, Vianney A, Portalier R, Lazzaroni JC. Identification by genetic suppression of Escherichia coli TolB residues important for TolB-Pal interaction. J Bacteriol. 2000;182:821-824.

.. [Fernandez_2004] Fernández-Recio J, Totrov M, Abagyan R. Identification of protein-protein interaction sites from docking energy landscapes. J Mol Biol. 2004 Jan 16;335(3):843-65.

.. [Grosdidier] Grosdidier S, Fernández-Recio J. Identification of hot-spot residues in protein-protein interactions by computational docking. BMC Bioinformatics. In press.