2008: mining association rules from gene expression data and annotations.
Data associated with the article:
Martinez, R., Pasquier, N., and Pasquier, C. (2008b), “GenMiner: mining non-redundant association rules from integrated gene expression data and annotations.” Bioinformatics (Oxford, England), Oxford Academic, 24, 2643–4.
Software
The software can be downloaded here.
Data sources
Gene expression measures are those used by Eisen et al. (Cluster analysis and display of genome-wide expression patterns, PNAS December 8, 1998 vol. 95 no. 25). This dataset is discretized using the NorDi algorithm at a 95% confidence level.
Gene annotations were collected from the following sources:
- Gene Ontology: terms were retrieved from http://archive.geneontology.org and annotations from http://www.geneontology.org (local copies of the August 2007 versions).
- Literature: associations between yeast genes and PubMed IDs were retrieved from the literature curation data at http://www.yeastgenome.org/ (local copy of the August 2007 version).
- Pathways: information on the metabolic pathways in which each gene is involved was retrieved from KEGG (local copy of the August 2007 version).
- Phenotype: annotations were retrieved from http://www.yeastgenome.org/ (local copy of the August 2007 version).
- Transcriptional regulators: information on the transcriptional regulators that bind to promoter regions was extracted from http://younglab.wi.mit.edu/regulatory_network/ using a p-value threshold of 0.0005 (local copy of the August 2007 version).
Processed Data files
Eisen dataset
Expression ratios of 2465 yeast genes under 79 biological conditions.
Microarray Experiments
Description of the 79 experiments.
Cutoffs
Under-expressed and over-expressed cutoff thresholds computed by NorDi.
Discretized expression measures
Discretization of expression measures performed by NorDi.
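As a rough illustration of how cutoff thresholds turn expression ratios into discrete states, here is a minimal Python sketch. It does not reproduce NorDi itself (which computes the cutoffs); it only applies cutoffs that are already known. The function names, the -1/0/1 encoding, the use of one (under, over) pair per condition, the inclusive comparisons and the handling of missing values are all assumptions made for the example, and the numbers are hypothetical rather than taken from the files.

```python
def discretize(value, under_cutoff, over_cutoff):
    """Map one expression ratio to -1 (under-expressed), 1 (over-expressed) or 0."""
    if value is None:
        # Missing measurement: treated as "unchanged" here (an assumption).
        return 0
    if value <= under_cutoff:
        return -1
    if value >= over_cutoff:
        return 1
    return 0


def discretize_profile(values, cutoffs):
    """Discretize one gene's profile.

    `cutoffs` holds one (under, over) pair per biological condition.
    """
    return [discretize(v, under, over) for v, (under, over) in zip(values, cutoffs)]


# Hypothetical ratios and cutoffs for three conditions (not real data).
profile = [-1.2, 0.1, 0.9]
cutoffs = [(-0.8, 0.8), (-0.5, 0.6), (-0.7, 0.7)]
states = discretize_profile(profile, cutoffs)
print(states)
```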
Data mining context
Data matrix of 2465 rows (genes) and 737 columns (discretized expression levels and annotations). Each row contains the gene's expression profile over the 79 biological conditions (values discretized by NorDi) and at most 658 gene annotations (24 GOSlim terms, 14 pathways, 25 transcriptional regulators, 14 phenotypes and 581 PubMed IDs).
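The column count can be checked against the breakdown given above. The short Python sketch below only redoes that arithmetic; the dictionary keys are labels chosen for the example.

```python
# Annotation counts as stated in the description of the data mining context.
annotation_counts = {
    "GOSlim terms": 24,
    "pathways": 14,
    "transcriptional regulators": 25,
    "phenotypes": 14,
    "PubMed IDs": 581,
}

n_conditions = 79  # one discretized expression column per biological condition
n_annotations = sum(annotation_counts.values())
n_columns = n_conditions + n_annotations

print(n_annotations, n_columns)
```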
Equivalence classes
Frequent closed itemsets and their generators extracted by Close with a minsupport of 0.005. Each class is represented by a line of the form
[Generator] [Closed itemset] n
where ‘n’ is the number of items in the class.
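To make the notions of closed itemset and generator concrete, the following Python sketch computes them by brute force on a toy context. It is not the Close algorithm and is not suited to the real data; the item names and transactions are hypothetical, and empty generators are ignored for simplicity. It also shows that a relative minsupport of 0.005 over 2465 genes corresponds to an itemset covering at least 13 genes.

```python
import math
from itertools import combinations


def closure(itemset, transactions):
    """Closure of an itemset: items shared by every transaction containing it."""
    covering = [t for t in transactions if itemset <= t]
    if not covering:
        return frozenset(itemset)
    return frozenset.intersection(*covering)


def generators(closed, transactions):
    """Minimal non-empty itemsets whose closure is `closed` (brute force)."""
    found = []
    for size in range(1, len(closed) + 1):
        for candidate in combinations(sorted(closed), size):
            cand = frozenset(candidate)
            if any(g <= cand for g in found):
                continue  # contains a smaller generator, so it is not minimal
            if closure(cand, transactions) == closed:
                found.append(cand)
    return found


# Toy context: each transaction is the set of items of one hypothetical gene.
transactions = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b", "c"}),
    frozenset({"a", "d"}),
    frozenset({"b", "c", "d"}),
]

closed_abc = closure(frozenset({"a", "b"}), transactions)
gens_abc = generators(closed_abc, transactions)

# Smallest number of genes an itemset must cover to reach minsupport = 0.005.
min_count = math.ceil(0.005 * 2465)
print(sorted(closed_abc), [sorted(g) for g in gens_abc], min_count)
```

In this toy context the itemsets {a, b} and {a, c} have the same closure {a, b, c}, so they belong to the same equivalence class.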
Exact association rules
All exact association rules displayed in the form
[antecedent] => [consequent] supp=s conf=c
where ‘s’ and ‘c’ are the support and the confidence of the rule, respectively.
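For readers unfamiliar with these measures, the Python sketch below computes support and confidence under their usual definitions: the support of a rule is the fraction of genes containing both antecedent and consequent, and its confidence is that support divided by the support of the antecedent. An exact rule is one whose confidence equals 1. The toy transactions are hypothetical, and whether the files report support as a fraction or as a count is not stated here, so the relative form is an assumption.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of the whole rule divided by the support of its antecedent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


# Toy context: each transaction is the set of items of one hypothetical gene.
transactions = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b", "c"}),
    frozenset({"a", "d"}),
    frozenset({"b", "c", "d"}),
]

antecedent, consequent = frozenset({"b"}), frozenset({"c"})
rule_supp = support(antecedent | consequent, transactions)
rule_conf = confidence(antecedent, consequent, transactions)

# Every transaction containing "b" also contains "c", so the rule is exact.
print(f"[b] => [c] supp={rule_supp} conf={rule_conf}")
```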
Approximate association rules
All approximate association rules, with a confidence greater than or equal to 0.5, displayed in the form
[antecedent] -> [consequent] supp=s conf=c
where ‘s’ and ‘c’ are the support and the confidence of the rule, respectively.
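A rule file in either of the two forms above could be read with a small parser along the following lines. This is only a sketch: it assumes that the square brackets appear literally in the files, that items inside the brackets are separated by whitespace, and that support and confidence are written as plain decimal numbers. None of these details has been checked against the actual files, and the sample lines are hypothetical.

```python
import re

# "=>" marks an exact rule, "->" an approximate one.
RULE = re.compile(
    r"^\[(?P<antecedent>[^\]]*)\]\s*(?P<arrow>=>|->)\s*\[(?P<consequent>[^\]]*)\]"
    r"\s+supp=(?P<supp>\S+)\s+conf=(?P<conf>\S+)\s*$"
)


def parse_rule(line):
    """Parse one rule line into a dictionary; raise ValueError if it does not match."""
    m = RULE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized rule line: {line!r}")
    return {
        "antecedent": m["antecedent"].split(),
        "consequent": m["consequent"].split(),
        "exact": m["arrow"] == "=>",
        "supp": float(m["supp"]),
        "conf": float(m["conf"]),
    }


# Hypothetical lines written in the two documented forms.
lines = [
    "[item_a item_b] -> [item_c] supp=0.012 conf=0.67",
    "[item_d] => [item_e] supp=0.02 conf=1.0",
]
rules = [parse_rule(line) for line in lines]

# Keep only approximate rules that meet the 0.5 confidence threshold.
approximate = [r for r in rules if not r["exact"] and r["conf"] >= 0.5]
print(approximate)
```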