Mining Association Rule Bases from Integrated Genomic Data and Annotations

Ricardo Martinez, Nicolas Pasquier, Claude Pasquier

October 2008

Abstract

During the last decade, several clustering and association rule mining techniques have been applied to identify groups of co-regulated genes in gene expression data. Integrating biological knowledge and gene expression data into a single framework has since become a major challenge, both to improve the relevance of mined patterns and to simplify their interpretation by biologists. The GenMiner approach was developed for mining, from such integrated datasets, association rules that show gene groups that are both co-expressed (sharing similar expression profiles) and co-annotated (sharing the same annotations, such as function or regulatory mechanism). It combines a new normalized discretization method, called NorDi, with the Close algorithm to extract only minimal non-redundant association rules. Compared with classical Apriori-based approaches, GenMiner improves the applicability of rule extraction to these datasets and reduces the number of association rules by suppressing redundant, uninformative rules. We present a new Java implementation of GenMiner and experimental results obtained from microarray datasets with integrated biological knowledge (bio-ontologies, descriptions of regulation pathways, and literature). These results show that GenMiner requires less memory than Apriori-based approaches and that it improves the relevance of extracted rules. Moreover, the association rules obtained reveal significant co-annotated and co-expressed gene patterns showing important biological relationships supported by recent biological literature.

Type

Conference paper

Publication

5th International Conference on Computational Intelligence Methods for Bioinformatics and Biostatistics (CIBB'08)