PRED-CLASS: Bioinformatics Software for Generalized Protein Classification and Genome-Wide Applications


A cascading system of hierarchical artificial neural networks (named PRED-CLASS) is presented for the generalized classification of proteins into four distinct classes: Transmembrane, Fibrous, Globular and ‘Mixed’, using only the information encoded in their amino acid sequences. Prediction of protein tertiary structure remains a challenging problem in structural molecular biology, despite great progress in recent years. Over the last three decades, several computational methods have been developed for predicting 1D structural features of proteins from sequence alone. Machine learning techniques (e.g. artificial neural networks) have frequently been used to mine the information ‘hidden’ in the vast number of protein sequences produced by completed and ongoing Genome Projects, in combination with available experimental functional or structural knowledge. The architecture of the individual component networks is kept very simple, reducing the number of free parameters (network synaptic weights) in order to speed up training, improve generalization and avoid overfitting the data. Capturing information from as few as 50 protein sequences (6 Transmembrane, 10 Fibrous, 13 Globular and 17 Mixed) spread across the four target classes, PRED-CLASS correctly predicted 371 out of a set of 387 proteins (a success rate of 96%). Application of PRED-CLASS to several test sets and 30 complete proteomes demonstrates that such a method could serve as a valuable tool in the annotation of genomic ORFs with no functional assignment, or as a preliminary step in fold recognition and ‘ab initio’ structure prediction methods. Detailed results for various data sets and 30 completed genomes, along with a web server running the PRED-CLASS algorithm, can be accessed over the World Wide Web at the URL:
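
As an illustration of the cascading idea only, the following minimal Python sketch passes a sequence through a chain of binary decisions, with ‘Mixed’ as the fallback class. It is not the PRED-CLASS implementation: the order of the stages, the use of amino acid composition as input, the function names and the simple threshold rules standing in for trained networks are all assumptions made here for the sake of the example.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq: str) -> List[float]:
    """Return the fraction of each of the 20 standard amino acids in seq."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS) or 1  # avoid division by zero
    return [counts[a] / total for a in AMINO_ACIDS]

def fraction_of(features: Sequence[float], residues: str) -> float:
    """Sum the composition fractions of the given residues."""
    return sum(features[AMINO_ACIDS.index(r)] for r in residues)

# Hypothetical stand-ins for trained component networks. Each one answers a
# single yes/no question; the thresholds are invented for illustration.
def looks_transmembrane(features: Sequence[float]) -> bool:
    return fraction_of(features, "AILMFVW") > 0.6

def looks_fibrous(features: Sequence[float]) -> bool:
    return fraction_of(features, "GP") > 0.4

def looks_globular(features: Sequence[float]) -> bool:
    return max(features) < 0.15

Stage = Tuple[str, Callable[[Sequence[float]], bool]]

# Assumed stage order; the abstract does not specify it.
STAGES: List[Stage] = [
    ("Transmembrane", looks_transmembrane),
    ("Fibrous", looks_fibrous),
    ("Globular", looks_globular),
]

def cascade_classify(seq: str, stages: Sequence[Stage] = STAGES,
                     fallback: str = "Mixed") -> str:
    """Return the label of the first stage that accepts seq, else fallback."""
    features = composition(seq)
    for label, accepts in stages:
        if accepts(features):
            return label
    return fallback

if __name__ == "__main__":
    for s in ("LLLLIIIVVVAAFFWW", "GPPGPPGPAGPP", "ACDEFGHIKLMNPQRSTVWY"):
        print(s, cascade_classify(s))
```

In a cascade of this shape each stage only has to solve one small two-class problem, which is one way the individual components can stay simple.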

23rd Conference of the Hellenic Society for Biological Sciences