Frequent pattern mining is an important data mining task with a broad range of applications. Research, initially focused on the discovery of frequent itemsets, has been extended to mine structural forms such as sequences, trees, or graphs. In this paper, we …
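As a minimal illustration of frequent itemset mining, and not the method of any paper listed here, the sketch below enumerates frequent itemsets level by level in the style of Apriori. An itemset is kept when at least `min_support` transactions contain it, and candidates of size k are built only from frequent itemsets of size k-1 whose (k-1)-subsets are all frequent. The function name `frequent_itemsets` and the toy data are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for every itemset contained in at
    least min_support transactions (Apriori-style, level-wise search)."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    result = {}
    candidates = [frozenset([i]) for i in items]  # level 1: single items
    k = 1
    while candidates:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets,
        # pruning any candidate with an infrequent (k-1)-subset.
        next_candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in frequent for s in combinations(union, k - 1)
            ):
                next_candidates.add(union)
        candidates = list(next_candidates)
    return result
```

On five toy transactions {a, b, c}, {a, b}, {a, c}, {b, c}, {a, b, c} with `min_support=3`, every single item and every pair is frequent, while {a, b, c}, contained in only two transactions, is not.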
Attributed directed acyclic graphs can be used in many application domains. In this paper, we study a new pattern domain to enable their analysis: frequent weighted paths. We propose …
Frequent pattern mining is an important task in data mining. Initially focused on the discovery of frequent itemsets, early work was later extended to extract structural patterns such as sequences, …
This paper describes the design of a system for extracting keyphrases from a single document. The principle of the algorithm is to cluster the sentences of the document in order to highlight parts of the text that are semantically related. The clusters of …
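The principle described above (group sentences that share vocabulary, then read keyphrases off each group) can be sketched as follows. This is a hypothetical illustration, not the system's published algorithm: the bag-of-words representation, cosine similarity, single-pass clustering with a `threshold` parameter, selecting the most frequent term per cluster, and all helper names are assumptions.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a much larger one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are",
             "by", "from", "for", "on", "with", "that", "this"}


def tokenize(sentence):
    """Lower-case word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * v[w] for w, count in u.items() if w in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def cluster_sentences(sentences, threshold=0.2):
    """Single-pass clustering: each sentence joins the most similar existing
    cluster (similarity >= threshold) or starts a new one.
    Returns a list of (term-count centroid, member sentence indices)."""
    clusters = []
    for i, sentence in enumerate(sentences):
        vec = Counter(tokenize(sentence))
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, cluster[0])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append((Counter(vec), [i]))
        else:
            best[0].update(vec)  # summed counts; cosine ignores the scale
            best[1].append(i)
    return clusters


def keyphrases(sentences, per_cluster=1, threshold=0.2):
    """Pick the most frequent terms of each cluster as candidate keyphrases."""
    phrases = []
    for centroid, _members in cluster_sentences(sentences, threshold):
        phrases.extend(term for term, _ in centroid.most_common(per_cluster))
    return phrases
```

The single-pass scheme was chosen only for brevity; the excerpt does not say which clustering method or keyphrase-selection step the system actually uses.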
During the last decade, several clustering and association rule mining techniques have been applied to identify groups of co-regulated genes in gene expression data. Nowadays, integrating biological knowledge and gene expression data into a single …
GENMINER is a smart adaptation of closed-itemset-based association rule extraction to genomic data. It takes advantage of the novel NORDI discretization method and of the CLOSE [27] algorithm to efficiently generate minimal non-redundant …
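As a hedged illustration of the closed-itemset notion only, the sketch below uses the Galois closure: the closure of an itemset X is the set of items common to every transaction containing X, and X is closed when it equals its closure. The brute-force enumeration is a swapped-in toy method; it does not reproduce NORDI discretization or the CLOSE algorithm, and the function names are hypothetical.

```python
from functools import reduce
from itertools import combinations


def closure(itemset, transactions):
    """Galois closure h(X): the items shared by every transaction containing X."""
    covering = [t for t in transactions if itemset <= t]
    if not covering:
        # Convention: an itemset contained in no transaction closes to all items.
        return frozenset().union(*transactions)
    return reduce(lambda a, b: a & b, covering)


def closed_frequent_itemsets(transactions, min_support):
    """Brute force over all non-empty itemsets (suitable for toy data only).
    Returns {closed itemset: support count}."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted(frozenset().union(*transactions))
    closed = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            x = frozenset(combo)
            support = sum(1 for t in transactions if x <= t)
            if support >= min_support and closure(x, transactions) == x:
                closed[x] = support
    return closed
```

With transactions {a, b}, {a, b, c}, {a, b}, {c} and `min_support=2`, the itemsets {a} and {b} are frequent but not closed, because every transaction containing one also contains the other; only {a, b} (support 3) and {c} (support 2) remain. This compression is what closed-itemset approaches exploit to avoid generating redundant rules.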
Microarray technology produces vast amounts of data by simultaneously measuring the expression levels of thousands of genes under hundreds of biological conditions. Nowadays, one of the principal challenges in bioinformatics is the interpretation …
Microarray technology makes it possible to measure the expression levels of thousands of genes under different biological conditions, thus generating large amounts of data to analyze. Nowadays, the interpretation of these large datasets …
Using several analysis techniques for the hierarchical clustering of a SAGE expression dataset of 822 tags from 74 tissue samples (normal and cancer), we show that cleaning the dataset (tags and experiments) is critical and that attribution of a tag to …
Protein and DNA sequence comparison is one of the most important tools of molecular biologists, but sequence databases are growing at an exponential rate, and sequence comparison is becoming increasingly computationally intensive. We propose to …
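The excerpt does not say what the authors propose, so the following is only background: a minimal Smith-Waterman local-alignment score with a linear gap penalty, illustrating why sequence comparison is costly. Each pairwise comparison fills a len(a) x len(b) dynamic-programming table, so the work grows with the product of the sequence lengths, and a database search repeats it for every entry. The scoring parameters (`match`, `mismatch`, `gap`) are arbitrary assumptions; practical tools typically use substitution matrices and affine gap penalties.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b with a linear
    gap penalty. Runs in O(len(a) * len(b)) time, O(len(b)) memory."""
    prev = [0] * (len(b) + 1)  # previous row of the DP table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

For example, `"ACG"` aligned locally inside `"TTACGTT"` scores 6 (three matches at +2 each), while two sequences with no common residue score 0.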