Identification of Active Modules in Interaction Networks Using Node2vec Network Embedding

Abstract

The identification of condition-specific gene sets from transcriptomic experiments is important for revealing the regulatory and signaling mechanisms associated with a given cellular response. Statistical approaches that use only expression data identify the genes whose expression is most altered between conditions. However, a phenotype is rarely the direct consequence of the activity of a single gene; rather, it reflects the interplay of several genes carrying out particular molecular processes. Many methods have been proposed to analyze gene activity in light of our knowledge of molecular interactions. However, existing methods have shortcomings that reduce their usefulness to biologists: they detect modules that are too large or too small, or they require users to specify a priori the size of the modules they are looking for. We propose AMINE (Active Module Identification through Network Embedding), an efficient method for the identification of active modules. Experiments carried out on artificial datasets show that its results are more reliable than those of many available methods. Moreover, the size of the modules to be identified is not a fixed parameter of the method and does not need to be specified; rather, it adjusts to the size of the modules to be found. Applications to real datasets show that the method recovers important genes already highlighted by approaches based solely on variations in gene expression, and also identifies new groups of genes of high interest. In addition, AMINE can be used as a web service on users' own data (http://amine.i3s.unice.fr).
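The title names node2vec as the embedding technique. As background only, here is a minimal sketch of node2vec's second-order biased random walk, with return parameter p and in-out parameter q. It assumes an unweighted, undirected interaction network stored as adjacency sets. The toy gene names and the helpers `node2vec_walk` and `generate_walks` are hypothetical, and this is not AMINE's implementation.

```python
import random
from typing import Dict, List, Set

# Hypothetical representation: undirected, unweighted network as adjacency sets.
Graph = Dict[str, Set[str]]


def node2vec_walk(graph: Graph, start: str, length: int, p: float, q: float,
                  rng: random.Random) -> List[str]:
    """Generate one node2vec-style biased random walk starting at `start`.

    Having arrived at node v from node t, the unnormalized weight of moving to a
    neighbor x of v is 1/p if x == t (return), 1 if x is also a neighbor of t,
    and 1/q otherwise (moving outward).
    """
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(graph[cur])  # sorted for reproducibility with a seeded rng
        if not neighbors:
            break  # isolated node: the walk cannot continue
        if len(walk) == 1:
            # First step has no previous node, so it is chosen uniformly.
            walk.append(rng.choice(neighbors))
            continue
        prev = walk[-2]
        weights = []
        for x in neighbors:
            if x == prev:
                weights.append(1.0 / p)
            elif x in graph[prev]:
                weights.append(1.0)
            else:
                weights.append(1.0 / q)
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


def generate_walks(graph: Graph, num_walks: int, length: int, p: float,
                   q: float, seed: int = 0) -> List[List[str]]:
    """Start `num_walks` walks from every node, visiting nodes in shuffled order."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    walks: List[List[str]] = []
    for _ in range(num_walks):
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(node2vec_walk(graph, node, length, p, q, rng))
    return walks


if __name__ == "__main__":
    # Hypothetical toy interaction network with five genes.
    toy = {
        "G1": {"G2", "G3"},
        "G2": {"G1", "G3"},
        "G3": {"G1", "G2", "G4"},
        "G4": {"G3", "G5"},
        "G5": {"G4"},
    }
    walks = generate_walks(toy, num_walks=2, length=5, p=1.0, q=0.5)
    print(len(walks))  # 10: two walks from each of the five nodes
```

In node2vec, walks like these are treated as sentences and passed to a skip-gram model to learn one vector per node. A smaller q biases walks outward (depth-first-like), while a smaller p keeps them local. How AMINE turns such embeddings into active modules is not described in this abstract.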

Publication
bioRxiv