Complex Network Mining

Last updated on Jun 24, 2023

A gene interaction network

Active since 2015

Research rationale

Network data modeling has emerged in various disciplines as a unified way of representing complex relational data. Formally, these complex networks (which we call multidimensional networks) are multigraphs whose nodes and edges can each carry one or more labels. The core of our research activity focuses on analyzing these complex networks for information extraction purposes¹.
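To make the definition concrete, here is a minimal Python sketch of such a structure. The class name, its methods, the choice of directed edges, and the node and label names are all illustrative assumptions; they do not describe the data model actually used in our software.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class MultidimensionalNetwork:
    """Directed multigraph whose nodes and edges can carry several labels.

    Hypothetical structure, written only to illustrate the definition.
    """

    # node -> set of labels attached to that node
    node_labels: dict = field(default_factory=lambda: defaultdict(set))
    # (source, target) -> list of label sets, one entry per parallel edge
    edges: dict = field(default_factory=lambda: defaultdict(list))

    def add_node(self, node, *labels):
        self.node_labels[node].update(labels)

    def add_edge(self, source, target, *labels):
        # Register both endpoints, even if they have no label yet.
        self.node_labels.setdefault(source, set())
        self.node_labels.setdefault(target, set())
        # Parallel edges are allowed: each call appends a new edge.
        self.edges[(source, target)].append(frozenset(labels))

    def neighbors(self, node, label):
        """Targets reachable from `node` through an edge carrying `label`."""
        return sorted(
            target
            for (source, target), label_sets in self.edges.items()
            if source == node and any(label in labels for labels in label_sets)
        )


# Toy example with made-up node names (not real biological data).
network = MultidimensionalNetwork()
network.add_node("miRNA_a", "miRNA")
network.add_node("gene_x", "gene")
network.add_edge("miRNA_a", "gene_x", "targets")
network.add_edge("miRNA_a", "gene_x", "co-mentioned")  # second, parallel edge
print(network.neighbors("miRNA_a", "targets"))
```

Storing one label set per parallel edge keeps the different kinds of relations between the same two nodes separate, which is the property that distinguishes a multigraph from a simple graph.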

Results

Prediction of microRNA-disease associations

A microRNA (miRNA) is a small RNA molecule that, by its ability to regulate gene expression, plays a critical role in many physiological processes. Since its discovery, a great deal of information has been gained about its involvement in disease development and drug resistance. However, there is still much to be done to gain a full understanding of the miRNA world. A challenge for miRNA research is establishing a clear relationship between miRNA dysregulation, target dysregulation and ultimate biological impact. Computational methods can make an important contribution to this goal.

To this end, we have been working on a new method to predict associations between miRNAs and diseases. Its principle is to represent miRNAs, together with their links to elements that highlight various facets of these molecules (targeted genes, neighboring miRNAs, terms associated with them in scientific articles), as a multidimensional network. This network is then projected into a vector space, and metrics within this space are used to predict miRNA-disease associations. The performance of our algorithm, MiRAI, was characteristic of an excellent classifier and on par with the state of the art in the field². Subsequently, we proposed an improvement that uses a parallel surrogate-assisted evolutionary algorithm to automatically find an optimal configuration of our predictive method³.
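The idea of using a metric within a vector space to rank candidate associations can be sketched as follows. This is a simplified illustration, not the MiRAI implementation: the sparse-dictionary representation, the choice of cosine similarity, the function names, and the toy feature weights are all assumptions, and the sketch performs no dimensionality reduction.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(feature, 0.0) for feature, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector is similar to nothing
    return dot / (norm_u * norm_v)


def rank_diseases(mirna_vector, disease_vectors):
    """Sort diseases by decreasing similarity to one miRNA vector."""
    scores = {
        disease: cosine_similarity(mirna_vector, vector)
        for disease, vector in disease_vectors.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy, made-up feature weights (not real biological data).
mirna = {"gene_x": 1.0, "gene_y": 1.0}
diseases = {
    "disease_1": {"gene_x": 1.0, "gene_y": 1.0},
    "disease_2": {"gene_z": 1.0},
}
ranking = rank_diseases(mirna, diseases)
for disease, score in ranking:
    print(disease, round(score, 3))
```

In this toy case, `disease_1` shares all of its features with the miRNA and is ranked first, while `disease_2` shares none and gets a similarity of zero.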

Study of triplex topology

It has been known since the 1960s that some short RNA sequences are likely to match particular areas of DNA to form triple-stranded structures called triplex DNA. We have undertaken an in silico study to locate, quantify and analyze triplex DNA on a genome in order to increase our knowledge of these structures. Our analyses, which identified many potential triplex sites within the genes, strongly suggest that some RNA fragments, coding or not, could have a significant influence on many chromosomal loci for large-scale genetic or epigenetic controls. This study paves the way for a new possible pathway for genetic regulation through RNA fragments⁴.

Network of genetic interactions via lncRNA:DNA triplex formation highlighting 5 sub-networks corresponding to distinct processes. More details can be found in our article on triplex analysis⁴.

Computational analysis of double-stranded RNA

RNA interference (RNAi) refers to a conserved post-transcriptional mechanism for the degradation of RNA by short double-stranded RNAs (dsRNAs). We performed a genome-wide computational search, in the Drosophila model, for mRNAs that are complementary to other RNAs. We report segments originating from pre-mRNA introns and exons, as well as from lncRNAs, as potential sources of siRNAs. The computationally predicted interactions have been modeled as a network in which the central genes (those potentially most regulated by RNA interference) are strongly involved in the processes of development, morphogenesis and neurogenesis. Genes whose transcripts are engaged in intermolecular segmental pairing are largely absent from the gene collections defined as showing no expression at each individual developmental stage, from early embryos to adulthood. This trend was also observed for the genes showing very low expression from the 8-12-hour embryonic stage to larval stage 2. These results suggest a genome-wide scale of mRNA homeostasis via RNAi metabolism and could extend the known roles of canonical miRNAs and hairpin RNAs⁵. The same approach has been used to analyze the potential interactions between the SARS-CoV-2 genome and human RNAs⁶.

Network of RNA-RNA interactions in which central genes are involved in development, morphogenesis and neurogenesis processes.
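The kind of complementarity search described in this section can be illustrated with a short Python sketch. It only reports perfect, antiparallel Watson-Crick matches of a fixed length; the function names, the fixed window length, and the toy sequences are assumptions, and the matching criteria used in the published analyses are not reproduced here.

```python
# Watson-Crick complement table for RNA (G-U wobble pairs are ignored).
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Sequence that pairs, in antiparallel orientation, with `rna`."""
    return rna.translate(COMPLEMENT)[::-1]


def complementary_segments(rna_a, rna_b, length):
    """Return (i, j) pairs such that rna_a[i:i+length] is perfectly
    complementary to rna_b[j:j+length]."""
    # Index every window of rna_b by its sequence for fast lookup.
    index = {}
    for j in range(len(rna_b) - length + 1):
        index.setdefault(rna_b[j:j + length], []).append(j)

    hits = []
    for i in range(len(rna_a) - length + 1):
        partner = reverse_complement(rna_a[i:i + length])
        for j in index.get(partner, []):
            hits.append((i, j))
    return hits


# Toy, made-up sequences (not taken from any genome).
hits = complementary_segments("AAGGCUCA", "CCUGAGCC", length=5)
print(hits)
```

Each reported pair is a candidate edge between the two transcripts, which is how a set of pairwise matches can be turned into an RNA-RNA interaction network.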

Active module identification

The identification of condition-specific gene sets from transcriptomic experiments has important biological applications, ranging from the discovery of altered pathways between different phenotypes to the selection of disease-related biomarkers. Statistical approaches that use only gene expression data rest on the overly simplistic assumption that the genes with the most altered expression are the most important in the process under study. However, a phenotype is rarely the direct consequence of the activity of a single gene; it rather reflects the interplay of several genes that together perform certain molecular processes. We are working on different approaches that analyze gene activity in the light of known molecular interactions. These include a population-based meta-heuristic built on new crossover and mutation operators⁷ as well as methods based on network embedding⁸ ⁹. The methods developed have been applied to examine the importance of phosphorylation in coordinating large networks of interactive proteins and to explore the interconnected landscape of phosphorylation within these networks¹⁰. Additionally, these methods revealed the dynamic gene networks that contribute to post-mating plasticity in the female Drosophila brain¹¹.
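To illustrate what combining expression data with interaction data means, the sketch below grows a connected module greedily from a seed gene. It is deliberately not one of the methods cited above (neither the meta-heuristic nor the embedding-based approach): the greedy strategy, the aggregate score (sum of gene scores divided by the square root of the module size), the function names, and the toy network are all assumptions chosen for brevity.

```python
import math


def aggregate_score(module, scores):
    """Sum of per-gene activity scores, normalized by sqrt(module size)."""
    return sum(scores[gene] for gene in module) / math.sqrt(len(module))


def greedy_module(adjacency, scores, seed):
    """Grow a connected gene set from `seed` while the score improves."""
    module = {seed}
    best_score = aggregate_score(module, scores)
    while True:
        # Genes adjacent to the module but not yet part of it.
        frontier = {n for gene in module for n in adjacency[gene]} - module
        candidates = [
            (aggregate_score(module | {gene}, scores), gene)
            for gene in sorted(frontier)
        ]
        if not candidates:
            break
        score, gene = max(candidates)
        if score <= best_score:
            break  # no neighbor improves the module
        module.add(gene)
        best_score = score
    return module, best_score


# Toy interaction network and made-up activity scores.
adjacency = {
    "a": {"b", "e"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"c"},
    "e": {"a"},
}
scores = {"a": 3.0, "b": 2.0, "c": 2.5, "d": -1.0, "e": 0.1}
module, score = greedy_module(adjacency, scores, seed="a")
print(sorted(module), round(score, 3))
```

Gene `b` joins the module even though its individual score is lower than that of the seed, because it connects the seed to `c`. Ranking genes by expression alone cannot capture this, which is the limitation the section describes.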

Sentiment analysis and multi-domain transfer

Sentiment analysis consists of automatically determining the polarity (positive, negative or neutral) of documents. In this field, we study in particular how different polarities can be learned for the same concept depending on the domain. Our approach combines a multidimensional graph representing the semantics of terms with a fuzzy-logic method for propagating polarities. It shows improved performance over the state of the art, good cross-domain generalization, and excellent coverage¹².

Funding

Program	ARC foundation grant
Year	2017-2018
Funder	ARC
Grant name	Role of electrical remodeling of pancreatic adenocarcinoma epithelial cells in response to the micro environment
Project coordinator	Olivier Soriani

Program	PhD grant
Year	2017-2020
Funder	Université Côte d’Azur
Grant name	Multi-objective evolutionary algorithms for the identification of master regulators in pancreatic cancer
Grant recipient	Leandro Corrêa
Project coordinator	Claude Pasquier

Software

MIRAI: Prediction of miRNA-disease associations
AMINE: Active Module Identification through Network Embedding

1. Claude Pasquier (2018). Contributions à la fouille de données complexes. Université Côte d’Azur.
2. Claude Pasquier, Julien Gardès (2016). Prediction of miRNA-disease Associations with a Vector Space Model. Scientific Reports.
3. Denis Pallez, Julien Gardès, Claude Pasquier (2017). Prediction of miRNA-disease Associations Using an Evolutionary Tuned Latent Semantic Analysis. Scientific Reports.
4. Claude Pasquier, Sandra Agnel, Alain Robichon (2017). The Mapping of Predicted Triplex DNA:RNA in the Drosophila Genome Reveals a Prominent Location in Development- and Morphogenesis-Related Genes. G3: Genes, Genomes, Genetics.
5. Claude Pasquier, Sandra Agnel, Alain Robichon (2020). Transcriptome-Wide-Scale-Predicted dsRNAs Potentially Involved in RNA Homoeostasis Are Remarkably Excluded from Genes with No/Very Low Expression in All Developmental Stages. RNA Biology.
6. Claude Pasquier, Alain Robichon (2021). Computational Search of Hybrid Human/SARS-CoV-2 dsRNA Reveals Unique Viral Sequences That Diverge from Those of Other Coronavirus Strains. Heliyon.
7. Leandro Correa, Denis Pallez, Laurent Tichit, Olivier Soriani, Claude Pasquier (2019). Population-Based Meta-Heuristic for Active Modules Identification. Proceedings of the Tenth International Conference on Computational Systems-Biology and Bioinformatics.
8. Claude Pasquier, Vincent Guerlais, Denis Pallez, Raphaël Rapetti-Mauss, Olivier Soriani (2021). Identification of Active Modules in Interaction Networks Using Node2vec Network Embedding. bioRxiv.
9. Claude Pasquier, Vincent Guerlais, Denis Pallez, Raphaël Rapetti-Mauss, Olivier Soriani (2023). A Network Embedding Approach to Identify Active Modules in Biological Interaction Networks. Life Science Alliance.
10. Claude Pasquier, Alain Robichon (2022). Evolutionary Divergence of Phosphorylation to Regulate Interactive Protein Networks in Lower and Higher Species. International Journal of Molecular Sciences.
11. Claude Pasquier, Alain Robichon (2021). Temporal and Sequential Order of Nonoverlapping Gene Networks Unraveled in Mated Female Drosophila. Life Science Alliance.
12. Claude Pasquier, Célia da Costa Pereira, Andrea G.B. Tettamanzi (2020). Extending a Fuzzy Polarity Propagation Method for Multi-Domain Sentiment Analysis with Word Embedding and POS Tagging. 24th European Conference on Artificial Intelligence (ECAI 2020).