Frequent Pattern Mining in Attributed Trees

Frequent pattern mining is an important data mining task with a broad range of applications. Initially focused on the discovery of frequent itemsets, studies were extended to mine structural forms like sequences, trees or graphs. In this paper, we …

Extraction de motifs condensés dans un unique graphe orienté acyclique attribué

Attributed directed acyclic graphs can be used in many application domains. In this paper, we study a new pattern domain to support their analysis: frequent weighted paths. We propose …

Extraction de motifs fréquents dans des arbres attribués

Frequent pattern mining is an important data mining task. Initially focused on the discovery of frequent itemsets, early work was extended to extract structural patterns such as sequences, …

Single Document Keyphrase Extraction Using Sentence Clustering and Latent Dirichlet Allocation

This paper describes the design of a system for extracting keyphrases from a single document. The principle of the algorithm is to cluster the sentences of the document in order to highlight parts of the text that are semantically related. The clusters of …

Mining Association Rule Bases from Integrated Genomic Data and Annotations

During the last decade, several clustering and association rule mining techniques have been applied to identify groups of co-regulated genes in gene expression data. Nowadays, integrating biological knowledge and gene expression data into a single …

GenMiner: Mining Informative Association Rules from Genomic Data

GENMINER is a smart adaptation of closed-itemset-based association rule extraction to genomic data. It takes advantage of the novel NORDI discretization method and of the CLOSE [27] algorithm to efficiently generate minimal non-redundant …

Interpreting Microarray Experiments via Co-Expressed Gene Groups Analysis

Microarray technology produces vast amounts of data by simultaneously measuring the expression levels of thousands of genes under hundreds of biological conditions. Nowadays, one of the principal challenges in bioinformatics is the interpretation …

Analyse des groupes de gènes co-exprimés (AGGC): un outil automatique pour l'interprétation des expériences de biopuces

Microarray technology makes it possible to measure the expression levels of thousands of genes under different biological conditions, thereby generating large amounts of data to analyze. Nowadays, the interpretation of these large datasets …

Exploratory Analysis of Cancer SAGE Data

Using several analysis techniques for the hierarchical clustering of a SAGE expression dataset of 822 tags from 74 tissue samples (normal and cancer), we show that cleaning the dataset (tags and experiments) is critical and that attribution of a tag to …

Distributed BLAST with ProActive

Protein and DNA sequence comparison is one of the most important tools for molecular biologists, but sequence databases are growing at an exponential rate, and sequence comparison is becoming increasingly computationally intensive. We propose to …