Protein Structure Prediction

Example of predicted transmembrane segment topologies for three proteins.

Active from 1997 to 1999

Research rationale

Membrane proteins are involved in a wide range of essential functions, including communication between cells and the transport of nutrients, ions, and waste products across biological membranes. These proteins, which are estimated to make up 25% of the proteins encoded in a genome, play key roles in an equally wide range of diseases, such as diabetes, hypertension, depression, arthritis, and cancer. They are also common drug targets (for over 75% of pharmaceuticals in use today). Determining membrane protein structures is essential for understanding how drugs interfere with cellular communication and regulation. However, current knowledge of the detailed 3D structures of membrane proteins is limited, because such structures are difficult to study by traditional experimental methods.

The idea is to use computational techniques to improve our knowledge of membrane proteins. However, developing algorithms capable of predicting the three-dimensional structure of proteins at atomic detail is a very difficult task. Instead of tertiary structure determination, we focused our research on two complementary aspects: the structural classification of proteins, which makes it possible to identify potential membrane proteins, and the prediction of transmembrane alpha-helices in membrane proteins.


Structural classification of proteins

Periodic patterns and tandem repeats of residues are often found in DNA and protein sequences. In proteins, their presence helps in understanding the molecular structure of a fibrous or structural protein through the principle of conformational equivalence, and it may suggest ways of ultramolecular assembly for the formation of higher-order structure. Characteristic examples are the periodicities found in a number of sequences of fibrous proteins (e.g. tropomyosin, myosin, keratins and collagen). We used Fourier analysis to highlight hidden periodicities in protein sequences and developed FT, a tool accessible to biologists through the Internet [1, 2].
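
As an illustration of the general idea, a periodicity search can be sketched with a plain discrete Fourier transform over a numeric encoding of the sequence. This is a minimal sketch and not the FT tool's actual implementation: the choice of the Kyte-Doolittle hydropathy scale as the encoding and the function names `power_spectrum` and `dominant_period` are assumptions made for the example.

```python
import cmath
import math

# Kyte-Doolittle hydropathy values: one possible numeric encoding (an assumption,
# any per-residue property scale could be used instead).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def power_spectrum(values):
    """Return (period_in_residues, power) pairs from a direct DFT."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]  # remove the constant component
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        spectrum.append((n / k, abs(coeff) ** 2 / n))
    return spectrum


def dominant_period(sequence, scale=KYTE_DOOLITTLE):
    """Return the period (in residues) carrying the most spectral power."""
    values = [scale[aa] for aa in sequence]
    period, _power = max(power_spectrum(values), key=lambda pair: pair[1])
    return period
```

For the synthetic repeat `"LLKK" * 10`, `dominant_period` reports a period of 4.0 residues. Real sequences are noisier, so in practice one would inspect the whole spectrum rather than a single peak.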

Continuing this work, we explored the use of hierarchical artificial neural networks for the generalized classification of proteins into four distinct classes (transmembrane, fibrous, globular, and mixed) from information encoded solely in their amino acid sequences [3, 4, 5]. Applying our implementations (PRED-TMR2 and PRED-CLASS) to various test sets and to the complete proteomes of several organisms demonstrates that such methods could serve as a valuable tool for annotating genomic open reading frames with no functional assignment, or as a preliminary step in fold recognition and ab initio structure prediction methods [6, 7, 8, 9, 10, 11].
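
The control flow of such a hierarchical classifier can be sketched as a cascade of binary decisions. The sketch below is only a structural illustration: the order of the stages is an assumption, and the three `toy_*` functions are hypothetical placeholders based on crude composition rules, standing in for trained neural networks that are not described here.

```python
from typing import Callable, List, Tuple

# A stage is any binary decision on a sequence (hypothetical stand-in
# for one trained network of the cascade).
Stage = Callable[[str], bool]

HYDROPHOBIC = set("AILMFVW")


def longest_hydrophobic_run(sequence: str) -> int:
    """Length of the longest uninterrupted stretch of hydrophobic residues."""
    best = run = 0
    for aa in sequence:
        run = run + 1 if aa in HYDROPHOBIC else 0
        best = max(best, run)
    return best


# Placeholder stages: illustrative thresholds only, not trained models.
def toy_is_membrane(sequence: str) -> bool:
    return longest_hydrophobic_run(sequence) >= 18


def toy_is_fibrous(sequence: str) -> bool:
    return sum(aa in "GP" for aa in sequence) / len(sequence) > 0.5


def toy_is_globular(sequence: str) -> bool:
    return len(set(sequence)) >= 10


def classify(sequence: str, cascade: List[Tuple[str, Stage]]) -> str:
    """Return the label of the first stage that accepts the sequence."""
    for label, stage in cascade:
        if stage(sequence):
            return label
    return "mixed"  # nothing matched: fall through to the last class


CASCADE = [
    ("transmembrane", toy_is_membrane),
    ("fibrous", toy_is_fibrous),
    ("globular", toy_is_globular),
]
```

Passing the stages in as a list keeps the cascade logic independent of how each decision is made, so a placeholder can be swapped for a real model without touching `classify`.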

Prediction of transmembrane alpha-helices in membrane proteins

Successfully locating transmembrane segments, their secondary structure, and the packing modes of secondary structure elements is important because these features define the architecture of a transmembrane protein. Equally important, however, is the determination of topology, which defines the polarity of integral membrane proteins.
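
One widely used heuristic for topology is the positive-inside rule: loops on the cytoplasmic side of the membrane tend to carry more lysine and arginine than loops on the other side. The sketch below applies that rule to already located segments. It is a minimal illustration under stated assumptions, not the method of any specific tool: the function name `predict_orientation`, the 0-based half-open segment coordinates, and the "N-in"/"N-out" labels are all chosen for the example.

```python
POSITIVE = set("KR")


def count_positive(fragment):
    """Number of lysine and arginine residues in a sequence fragment."""
    return sum(aa in POSITIVE for aa in fragment)


def predict_orientation(sequence, segments):
    """Guess whether the N-terminus is inside ("N-in") or outside ("N-out").

    `segments` is a sorted list of (start, end) transmembrane positions,
    0-based and half-open. Loops alternate sides of the membrane, so the
    even-numbered loops share the side of the N-terminus.
    """
    loops = []
    previous_end = 0
    for start, end in segments:
        loops.append(sequence[previous_end:start])
        previous_end = end
    loops.append(sequence[previous_end:])

    n_side = sum(count_positive(loop) for loop in loops[0::2])
    other_side = sum(count_positive(loop) for loop in loops[1::2])
    if n_side > other_side:
        return "N-in"
    if n_side < other_side:
        return "N-out"
    return "undetermined"
```

For a two-segment protein whose terminal loops are rich in lysine and arginine while the connecting loop has none, the function returns "N-in".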

Researchers have identified several characteristics common to a large proportion of transmembrane segments. They observed, for example, that transmembrane segments are mainly composed of hydrophobic residues, that positively charged residues are more frequent in the non-transmembrane segments on the inner side of the cell, and that a high propensity of tyrosine and tryptophan indicates the outer side of the cell. To extend this knowledge, we performed several statistical analyses of known transmembrane segments to find other characteristics of transmembrane regions. We determined, among other things, the distribution of transmembrane segment lengths, the propensity of each amino acid to be in a transmembrane region, and the precise profiles of the potential termini (“edges”, starts and ends) of transmembrane regions [@Pasquier1999a]. We combined this information with several scoring functions to predict the precise position of transmembrane segments and their topology [12, 13]. The accuracy of our method compares well with that of other popular existing methods. This work led to the implementation of several tools freely available on the Internet: PRED-TMR, OrienTM, CoPreTHi and Dam-Bio.
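
The kind of statistics described above can be sketched in a few lines. The example below computes per-residue transmembrane propensities and the segment length distribution from annotated sequences, then uses the propensities in a simple sliding-window score. It is a hedged illustration rather than the scoring scheme of the tools listed here: the input layout (pairs of sequence and 0-based half-open segments), the ratio-of-frequencies definition of propensity, and the names `tm_propensities`, `segment_lengths` and `best_window` are assumptions.

```python
from collections import Counter


def tm_propensities(annotated):
    """Ratio of each residue's frequency inside segments to its overall frequency.

    `annotated` is an iterable of (sequence, segments) pairs, where segments
    is a list of (start, end) positions, 0-based and half-open.
    """
    inside = Counter()
    overall = Counter()
    for sequence, segments in annotated:
        overall.update(sequence)
        for start, end in segments:
            inside.update(sequence[start:end])
    n_inside = sum(inside.values())
    n_overall = sum(overall.values())
    if n_inside == 0:
        raise ValueError("no transmembrane residues in the annotated set")
    return {
        aa: (inside[aa] / n_inside) / (overall[aa] / n_overall)
        for aa in overall
    }


def segment_lengths(annotated):
    """Histogram of transmembrane segment lengths."""
    return Counter(
        end - start for _sequence, segments in annotated for start, end in segments
    )


def best_window(sequence, propensities, width=21):
    """Return (start, score) of the window with the highest mean propensity.

    Residues absent from the table get a neutral value of 1.0. If the
    sequence is shorter than `width`, no window is scored and the result
    is (0, -inf).
    """
    best_start, best_score = 0, float("-inf")
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(propensities.get(aa, 1.0) for aa in window) / width
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score
```

A propensity above 1 means the residue is over-represented in transmembrane regions relative to the whole data set. A real predictor would combine such a score with edge profiles and length constraints instead of relying on a single fixed-width window.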


Program: Training and Mobility of Researchers (TMR)
Funder: European Economic Community (EEC)
Grant name: EEC-TMR “GENEQUIZ”, Integrated Software System for Molecular Biologists
Grant id: ERBFMRXCT960019
Project coordinator: Chris Sanders


  • COPRETHI: Ensemble learning to predict transmembrane segments in proteins
  • DAM-BIO: Integrated environment designed to support protein sequence and structure analysis on the Web
  • DB-NTMR: Database of non-transmembrane regions automatically extracted from the SwissProt database
  • DB-TMR: Database of transmembrane regions automatically extracted from the SwissProt database
  • FT: Analysis of periodic patterns in amino acid or DNA sequences by Fourier transform
  • ORIENTM: Topology prediction of transmembrane proteins and segments
  • PRED-CLASS: System of cascading neural networks that classifies any protein into one of four possible classes: membrane, globular, fibrous, mixed
  • PRED-TMR: Prediction of transmembrane domains in proteins
  • PRED-TMR2: Identification of transmembrane proteins and prediction of their transmembrane domains

  1. ↩︎

  2. ↩︎

  3. ↩︎

  4. ↩︎

  5. ↩︎

  6. ↩︎

  7. ↩︎

  8. ↩︎

  9. ↩︎

  10. Frequent Pattern Mining in Attributed Trees. 17th Pacific Asia Conference on Knowledge Discovery and Data Mining (PAKDD'13), J. Pei et al. (Eds.): PAKDD 2013, Part I, LNAI 7818, pp. 26–37. Springer, Heidelberg (2013).



  11. ↩︎

  12. ↩︎

  13. ↩︎

Claude Pasquier
Researcher in Computer Science / Computational Biology

Université côte d’Azur, CNRS, I3S Laboratory, Sophia Antipolis