MEAST: A Novel Framework for Analyzing Gene Set Stability in Single Cell Transcriptomics

Abstract

Biological processes rely on the coordinated activity of multiple genes, which form biological pathways that regulate specific cellular functions. Gene set activity analysis tools are essential for understanding how groups of genes regulate these processes. However, accurately scoring the activity of the most effective genes within a gene set remains challenging. Here, we introduce MEAST (Modular Expression Analysis and Scoring Toolkit), a robust computational framework that combines truncated singular value decomposition for gene set scoring with a genetic algorithm to identify stable, high-contributing subsets of genes. We validated our approach using both simulated and real single-cell RNA-seq datasets. In real single-cell blood RNA-seq datasets, the activity scoring function accurately distinguished between cell subtypes. Moreover, when tested on bulk RNA data, MEAST successfully distinguished between lung adenocarcinoma and lung squamous cell carcinoma by accurately scoring marker gene activities. The main feature of MEAST is its stability assessment, which is based on the activity scores and uses a genetic algorithm to identify the core active genes within a gene set. Our results showed that MEAST effectively eliminates all low-activity genes in single-cell RNA-seq data. This study presents an advanced framework for analyzing scRNA-seq data in a wide range of systems biology applications, including biomarker identification, disease subtype classification, drug response prediction, and pathway analysis.
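To make the scoring idea concrete, the following is a minimal sketch of rank-1 truncated SVD scoring of a single gene set, written with NumPy. It is not the MEAST implementation: the function name `svd_activity_scores`, the gene-wise centering, the sign convention, and the simulated data are all assumptions chosen for illustration, and the genetic algorithm step is not shown.

```python
import numpy as np


def svd_activity_scores(expr):
    """Score one gene set per cell with a rank-1 truncated SVD (illustrative only).

    expr: array of shape (n_cells, n_genes), restricted to the genes of one set.
    Returns (scores, loadings): one score per cell, one absolute loading per gene.
    """
    expr = np.asarray(expr, dtype=float)
    # Assumption: center each gene so the leading component captures variation
    # across cells rather than the overall mean expression level.
    centered = expr - expr.mean(axis=0, keepdims=True)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    u1, s1, v1 = u[:, 0], s[0], vt[0]
    # Singular vectors are defined only up to sign; orient the component so that
    # higher expression of the set's genes gives a higher score.
    if v1.sum() < 0:
        u1, v1 = -u1, -v1
    scores = u1 * s1          # per-cell activity along the leading component
    loadings = np.abs(v1)     # per-gene contribution to that component
    return scores, loadings


# Hypothetical simulated data: 200 cells, 10 genes. The first 100 cells have
# elevated expression in genes 0-5; genes 6-9 are pure noise.
rng = np.random.default_rng(0)
active = np.arange(200) < 100
expr = rng.normal(0.0, 1.0, size=(200, 10))
expr[active, :6] += 3.0

scores, loadings = svd_activity_scores(expr)
print("mean score, active cells:  ", scores[active].mean())
print("mean score, inactive cells:", scores[~active].mean())
print("gene loadings:", np.round(loadings, 2))
```

In this toy setting the per-gene loadings give a rough indication of which genes drive the score, which is the kind of signal a subset search such as a genetic algorithm could use when looking for a stable core of contributing genes.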

Publication
Scientific Reports