Attributed Graph Mining

Last updated on Dec 15, 2021

Illustration of the possible extensions of a pattern with automorphisms.

Active from 2012 to 2014

Research rationale

Graphs are well suited to modeling complex structures found in the real world. Because of this ubiquity, graphs have been studied extensively in graph theory and, more recently, in data mining. Until the 2000s, most research focused either on unlabeled graphs or on graphs whose nodes carry a single label. However, in many applications, the objects represented by nodes have multiple characteristics, which can be represented as node attributes. Graphs in which nodes are annotated with sets of attributes (or itemsets) are called attributed graphs, and so far only a few studies have been devoted to their analysis.

Mining attributed graphs is difficult because the search space is much larger than for labeled graphs: each node can carry any subset of the attributes rather than a single label. Nevertheless, there is a need for effective methods that can both identify hidden structural patterns and highlight the relationships between node attributes.

Results

A new mining algorithm combining itemset extension and structural extension

Our strategy for finding frequent patterns starts from a set of initial patterns, each composed of a node associated with a single item, and builds the search tree from the spanning tree originating from these nodes, using an order based on a code that we have defined. We proposed a complete method for navigating the search space that combines two types of extension: itemset extension and structural extension. We defined the notion of closure on attributed graphs by considering an attributed graph to be closed if it is not included in any other attributed graph with the same support. We also proposed two concise representations of the patterns, defined either by inclusion on itemsets (c-closed patterns) or by inclusion on structures (s-closed patterns). We have shown that enumerating only c-closed patterns drastically reduces both the number of returned patterns and the execution time. Tests have shown that this condensed representation offers a good compromise between execution speed and conciseness of results¹ ² ³.
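The closure idea described above can be illustrated with a minimal sketch. It assumes a strongly simplified setting that is not the published algorithm: each pattern is a path of nodes carrying itemsets, supports are already known, and a pattern is treated as not c-closed when another pattern of the same shape, with itemsets at least as large at every node, has the same support. The names `Pattern`, `itemset_included`, `c_closed` and `path` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# Toy pattern: a path of nodes, each node carrying an itemset (assumption).
Pattern = Tuple[FrozenSet[str], ...]


def itemset_included(p: Pattern, q: Pattern) -> bool:
    """True if p and q have the same length and each itemset of p is a subset of q's."""
    return len(p) == len(q) and all(a <= b for a, b in zip(p, q))


def c_closed(supports: Dict[Pattern, int]) -> List[Pattern]:
    """Keep the patterns that are not absorbed by a different pattern
    with the same support and larger (or equal) itemsets at every node."""
    kept = []
    for p, sup in supports.items():
        absorbed = any(
            q != p and sup == sup_q and itemset_included(p, q)
            for q, sup_q in supports.items()
        )
        if not absorbed:
            kept.append(p)
    return kept


def path(*itemsets: str) -> Pattern:
    """Helper: path("a", "bc") is a two-node path with itemsets {a} and {b, c}.
    Each character of a string is one item."""
    return tuple(frozenset(s) for s in itemsets)


# Hypothetical supports, as if returned by a miner.
supports = {
    path("a", "b"): 3,
    path("a", "bc"): 3,   # same support as the previous pattern, larger itemsets
    path("ab", "bc"): 2,
    path("a"): 5,
}
closed = c_closed(supports)
```

In this toy input, the pattern with itemsets {a} and {b} is dropped because the pattern with itemsets {a} and {b, c} has the same support; the other three patterns are kept. The quadratic comparison is only for readability; a real miner would check closure during enumeration.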

Handling of cycles and isomorphic patterns

Taking cycles into account requires special treatment of isomorphic patterns, which are inevitably generated by every exploration that starts from a different node of the same cycle. Patterns that have many subgraph isomorphisms with the analyzed pattern are difficult for all existing algorithms because the subgraph isomorphism problem is NP-complete. We proposed two optimizations: the first prunes the search tree generated from an automorphic pattern, and the second removes certain ways of obtaining automorphic patterns that cannot generate new canonical patterns ⁴ ⁵.
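To see why cycles produce redundant patterns, consider the simplest possible case: a directed cycle whose nodes each carry a single label. The sketch below is an illustration under that assumption only, not the canonical code used in the publications. It maps every exploration of the cycle to its lexicographically smallest rotation, so that explorations started from different nodes collapse to one representative, and it counts the rotations that leave the cycle unchanged. The function names are hypothetical.

```python
from typing import Sequence, Tuple


def canonical_rotation(labels: Sequence[str]) -> Tuple[str, ...]:
    """Return the lexicographically smallest rotation of a labeled directed cycle."""
    t = tuple(labels)
    if not t:
        return t
    return min(t[i:] + t[:i] for i in range(len(t)))


def rotation_automorphisms(labels: Sequence[str]) -> int:
    """Count the rotations that map the labeled cycle onto itself."""
    t = tuple(labels)
    return sum(1 for i in range(len(t)) if t[i:] + t[:i] == t)


# The first three explorations traverse the same cycle from different start nodes;
# the fourth is a genuinely different cycle.
explorations = [
    ("a", "b", "c"),
    ("b", "c", "a"),
    ("c", "a", "b"),
    ("a", "c", "b"),
]
distinct = {canonical_rotation(e) for e in explorations}
```

Here `distinct` contains two cycles instead of four explorations. A cycle such as a, b, a, b has two rotation automorphisms, which is the kind of symmetry that multiplies redundant branches in the search tree.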

A new condensed representation of weighted paths

We have addressed the problem of extracting frequent weighted paths in a single attributed directed acyclic graph (aDAG) where each weight expresses the frequency of a transition. Frequent paths are used to analyze the causal relationship between sequences of events and/or attributes. As the number of patterns can be very large, we have designed a condensed representation for such collections⁶ ⁷ ⁸.
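A minimal sketch of what a weighted path could look like, under assumptions that are not taken from the publications: the aDAG is stored as a dictionary of node itemsets plus an adjacency list, a path pattern is a sequence of itemsets, an occurrence is a path in the DAG whose successive nodes contain the successive itemsets, and the weight of a transition is the number of distinct DAG edges that realize it. All names and the toy graph are hypothetical.

```python
from typing import Dict, FrozenSet, Iterator, List, Sequence, Tuple

Node = str
Attrs = Dict[Node, FrozenSet[str]]
Edges = Dict[Node, List[Node]]


def occurrences(
    attrs: Attrs, edges: Edges, pattern: Sequence[FrozenSet[str]]
) -> List[Tuple[Node, ...]]:
    """List every path of the DAG whose i-th node contains the i-th itemset."""

    def extend(prefix: List[Node]) -> Iterator[Tuple[Node, ...]]:
        if len(prefix) == len(pattern):
            yield tuple(prefix)
            return
        wanted = pattern[len(prefix)]
        # The first node may be any node; later nodes must be successors.
        candidates = edges.get(prefix[-1], []) if prefix else list(attrs)
        for v in candidates:
            if wanted <= attrs[v]:
                yield from extend(prefix + [v])

    return list(extend([]))


def transition_weights(occs: List[Tuple[Node, ...]], length: int) -> List[int]:
    """For each step of the pattern, count the distinct DAG edges that realize it."""
    return [len({(o[j], o[j + 1]) for o in occs}) for j in range(length - 1)]


attrs: Attrs = {
    "n1": frozenset("ab"),
    "n2": frozenset("a"),
    "n3": frozenset("c"),
    "n4": frozenset("cd"),
    "n5": frozenset("e"),
}
edges: Edges = {"n1": ["n3", "n4"], "n2": ["n4"], "n3": ["n5"], "n4": ["n5"]}

pattern = (frozenset("a"), frozenset("c"), frozenset("e"))
occs = occurrences(attrs, edges, pattern)
weights = transition_weights(occs, len(pattern))
```

In this toy graph the pattern has three occurrences, and the two transitions are realized by three and two distinct edges respectively. The exhaustive enumeration is exponential in the worst case and is meant only to make the notion concrete.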

Integrating mathematical models defined by experts into the extraction process

In many data science contexts, experts have already captured part of their knowledge in mathematical models. We proposed using these models to derive new constraints that can be applied during the data mining phase to improve both pattern relevance and computational efficiency. We defined a method for mining patterns under the constraints of a model. We also studied some properties of predicates and constraints in order to use them to optimize pattern computation. We have shown that taking constraints from mathematical models into account makes it possible to better target the analysis, while improving performance thanks to the properties of the model. We thus obtained more relevant patterns that complement or contradict the expert knowledge of the studied phenomena ⁹ ¹⁰.
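One way a model property can speed up mining is through anti-monotonicity: if a constraint fails for a pattern, it also fails for every extension of that pattern, so the whole branch can be skipped. The sketch below is an illustration under stated assumptions, not the method of the cited papers. It uses a hypothetical additive model with non-negative weights and the constraint "score at most `max_score`", which is anti-monotone because adding an item can never lower the score. The item names, weights and function names are invented for the example.

```python
from typing import Dict, FrozenSet, List


def model_score(itemset: FrozenSet[str], weights: Dict[str, float]) -> float:
    """Hypothetical expert model: an additive score with non-negative weights."""
    return sum(weights[i] for i in itemset)


def enumerate_under_constraint(
    items: List[str], weights: Dict[str, float], max_score: float
) -> List[FrozenSet[str]]:
    """Depth-first enumeration of the itemsets whose score does not exceed max_score."""
    ordered = sorted(items)
    results: List[FrozenSet[str]] = []

    def grow(current: FrozenSet[str], start: int) -> None:
        for k in range(start, len(ordered)):
            candidate = current | {ordered[k]}
            if model_score(candidate, weights) > max_score:
                # Anti-monotone constraint: no superset of candidate can satisfy it,
                # so this branch is not explored further.
                continue
            results.append(candidate)
            grow(candidate, k + 1)

    grow(frozenset(), 0)
    return results


# Invented weights for three illustrative factors.
weights = {"slope": 2.0, "rain": 1.5, "bare_soil": 3.0}
kept = enumerate_under_constraint(list(weights), weights, max_score=4.0)
```

With these invented weights, four of the seven non-empty itemsets satisfy the constraint, and the two-item extensions of `bare_soil` are rejected without exploring their supersets. A constraint that is not anti-monotone could not be used for pruning in this way.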

Funding

Program	ANR Program FOSTER
Year	2011-2013
Funder	ANR
Grant name	Spatio-temporal data mining: application to the understanding and monitoring of soil erosion
Grant id	ANR-10-COSI-012
Project coordinator	Nazha Selmaoui-Folcher

Software

AADAGE: Extraction of frequent patterns in attributed graphs
IMIT: Mining frequent patterns in attributed trees

1. Claude Pasquier, Jérémy Sanhes, Frédéric Flouvat, Nazha Selmaoui-Folcher (2013). Frequent Pattern Mining in Attributed Trees. 17th Pacific Asia Conference on Knowledge Discovery and Data Mining (PAKDD'13), J. Pei et al. (Eds.): PAKDD 2013, Part I, LNAI 7818, pp. 26–37. Springer, Heidelberg (2013).
2. Claude Pasquier, Jérémy Sanhes, Frédéric Flouvat, Nazha Selmaoui-Folcher (2013). Extraction de motifs fréquents dans des arbres attribués. 13ème Conférence Francophone sur l’Extraction et la Gestion des Connaissances (EGC'13), Revue des Nouvelles Technologies de l’Information, volume E-24.
3. Claude Pasquier, Jérémy Sanhes, Frédéric Flouvat, Nazha Selmaoui-Folcher (2016). Frequent Pattern Mining in Attributed Trees: Algorithms and Applications. Knowledge and Information Systems.
4. Claude Pasquier, Frédéric Flouvat, Jérémy Sanhes, Nazha Selmaoui-Folcher (2014). Extraction de motifs dans des graphes orientés attribués en présence d’automorphisme. 14ème Conférence Francophone sur l’Extraction et la Gestion des Connaissances (EGC'14), Revue des Nouvelles Technologies de l’Information, volume E-26.
5. Claude Pasquier, Frédéric Flouvat, Jérémy Sanhes, Nazha Selmaoui-Folcher (2017). Attributed Graph Mining in the Presence of Automorphism. Knowledge and Information Systems.
6. Jérémy Sanhes, Frédéric Flouvat, Claude Pasquier, Nazha Selmaoui-Folcher, Jean-François Boulicaut (2013). Weighted Path as a Condensed Pattern in a Single Attributed DAG. 23rd International Joint Conference on Artificial Intelligence (IJCAI'13).
7. Jérémy Sanhes, Frédéric Flouvat, Claude Pasquier, Nazha Selmaoui-Folcher, Jean-François Boulicaut (2013). Extraction de motifs condensés dans un unique graphe orienté acyclique attribué. 13ème Conférence Francophone sur l’Extraction et la Gestion des Connaissances (EGC'13), Revue des Nouvelles Technologies de l’Information, volume E-24.
8. Frédéric Flouvat, Nazha Selmaoui-Folcher, Jérémy Sanhes, Chengcheng Mu, Claude Pasquier, Jean-François Boulicaut (2020). Mining Evolutions of Complex Spatial Objects Using a Single-Attributed Directed Acyclic Graph. Knowledge and Information Systems.
9. Frédéric Flouvat, Jérémy Sanhes, Claude Pasquier, Nazha Selmaoui-Folcher, Jean-François Boulicaut (2014). Improving Pattern Discovery Relevancy by Deriving Constraints from Expert Models. 21st European Conference on Artificial Intelligence (ECAI'14), proceedings published by IOS Press in the Frontiers in Artificial Intelligence and Applications series, volume 263.
10. Frédéric Flouvat, Jérémy Sanhes, Claude Pasquier, Nazha Selmaoui-Folcher, Jean-François Boulicaut (2014). Les modèles des experts au service de l’extraction de motifs pertinents. 19ème congrès sur la Reconnaissance de Formes et l’Intelligence Artificielle (RFIA'14).


Claude Pasquier

Researcher in Computer Science / Computational Biology
