  • Short communication

PPI_SVM: Prediction of protein-protein interactions using machine learning, domain-domain affinities and frequency tables

Abstract

Protein-protein interactions (PPI) control most biological processes in a living cell. Fully understanding protein function therefore requires knowledge of protein-protein interactions. Predicting PPI is challenging, especially when the three-dimensional structure of the interacting partners is not known. Recently, a prediction approach was proposed that exploits the physical interactions of constituent domains. We propose here a novel knowledge-based prediction method, PPI_SVM, which predicts interactions between two protein sequences from their domain information. We trained a two-class support vector machine on a benchmarking set of interacting protein pairs extracted from the Database of Interacting Proteins (DIP). Unlike most existing approaches, the method considers all possible combinations of constituent domains between the two protein sequences. Moreover, it handles both single-domain and multi-domain proteins, so it can be applied to the whole proteome in high-throughput studies. Our machine learning classifier, following a brainstorming approach, achieves an accuracy of 86%, with a specificity of 95% and a sensitivity of 75%. These results are better than those of most previous methods, which sacrifice recall in order to boost overall precision. On the benchmarking dataset, our method has on average better sensitivity combined with good selectivity. The PPI_SVM source code, train/test datasets and supplementary files are freely available in the public domain at: http://code.google.com/p/cmater-bioinfo/.
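The idea of considering all possible combinations of constituent domains can be illustrated with a minimal sketch. It is not the PPI_SVM implementation: the function name `domain_pairs`, the domain identifiers, and the assumption that a domain pair is unordered are all hypothetical choices made for illustration.

```python
from itertools import product


def domain_pairs(domains_a, domains_b):
    """Enumerate every domain-domain combination between two proteins.

    Assumption (not stated in the abstract): a pair is unordered, so
    (X, Y) and (Y, X) are treated as the same combination.
    """
    unordered = {tuple(sorted(pair)) for pair in product(domains_a, domains_b)}
    return sorted(unordered)


# Hypothetical domain annotations: one multi-domain and one single-domain protein.
protein_a = ["PF00069", "PF00017"]
protein_b = ["PF00018"]

pairs = domain_pairs(protein_a, protein_b)
print(pairs)
```

Because the enumeration is a plain Cartesian product, a single-domain protein is simply the case where one list has length one, so both kinds of protein go through the same code path.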

Abbreviations

AP: appearance probability
BiFC: bimolecular fluorescence complementation
BIND: Biomolecular Interaction Network Database
DIP: Database of Interacting Proteins
DPI: dual polarization interferometry
FN: false negatives
FP: false positives
FPR: false positive rate
FRET: fluorescence resonance energy transfer
HMMs: hidden Markov models
IgG: Immunoglobulin G
IntAct: open source molecular interaction database
MINT: Molecular Interactions Database
MIPS: Mammalian Protein-Protein Interaction Database
PID: potentially interacting domain pair
PPI: protein-protein interactions
RBF: radial basis function
ROC: receiver operating characteristic
SVM: support vector machine
TAP: tandem affinity purification
TN: true negatives
TP: true positives
TPR: true positive rate
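
The count-based abbreviations (TP, TN, FP, FN) combine into the rates reported for a classifier using the standard definitions. The sketch below shows this; the function name `classification_rates` and the counts are illustrative assumptions and are not taken from the paper's data.

```python
def classification_rates(tp, tn, fp, fn):
    """Compute standard rates from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,     # fraction of all pairs classified correctly
        "sensitivity": tp / (tp + fn),     # TPR, also called recall
        "specificity": tn / (tn + fp),     # true negative rate
        "fpr": fp / (fp + tn),             # FPR = 1 - specificity
    }


# Hypothetical counts for a balanced test set of 100 positive and 100 negative pairs.
rates = classification_rates(tp=80, tn=90, fp=10, fn=20)
for name, value in rates.items():
    print(f"{name}: {value:.2f}")
```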

Author information

Corresponding author

Correspondence to Dariusz Plewczynski.

About this article

Cite this article

Chatterjee, P., Basu, S., Kundu, M. et al. PPI_SVM: Prediction of protein-protein interactions using machine learning, domain-domain affinities and frequency tables. Cell Mol Biol Lett 16, 264–278 (2011). https://doi.org/10.2478/s11658-011-0008-x

  • DOI: https://doi.org/10.2478/s11658-011-0008-x