Mutational Processes in Human Cancer

APOBEC mutagenesis in non-canonical DNA structures
Graphical abstract: APOBEC mutagenesis in non-canonical DNA structures. iScience 2022.

A primary research focus of the lab is to investigate the nature of human cancer by deciphering the mutational patterns left by different mutational processes. Recent advances in DNA sequencing have made it possible to obtain whole-genome sequences for many cancer types and to build a comprehensive map of mutational signatures. Such maps allow us to examine the whole spectrum of somatic mutations, understand the underlying mutational processes, and contribute to the search for cancer-associated genes.

A major focus is on the APOBEC family of enzymes, an important component of the human innate immune system that contributes strongly to the mutational load in cancer. Using motif-centered analysis of whole-genome mutation catalogs, we revealed previously unknown epigenomic features of APOBEC mutagenesis and have extended this approach to other kinds of mutagenesis in human cancers.
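To illustrate what a motif-centered analysis can look like, here is a minimal Python sketch. It is not the lab's actual pipeline. It assumes the commonly used TCW trinucleotide context (W = A or T) as the APOBEC motif, counts how many C:G substitutions fall in that context on either strand, and compares this with the background frequency of the motif in the same sequence. The function names (`in_tcw`, `tcw_enrichment`) and the toy sequence are hypothetical.

```python
# Complement table used to read the motif on the opposite strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def in_tcw(seq, pos):
    """True if the C (or the G pairing with a C) at 0-based pos lies in a TCW context."""
    if pos < 1 or pos + 1 >= len(seq):
        return False  # no full trinucleotide at the sequence edges
    tri = seq[pos - 1:pos + 2]
    if tri[1] == "G":
        tri = revcomp(tri)  # WGA on this strand is TCW on the other
    return tri[0] == "T" and tri[1] == "C" and tri[2] in "AT"


def tcw_enrichment(seq, mutations):
    """Ratio of the TCW fraction among C:G substitutions to the TCW fraction
    among all C:G positions. mutations: iterable of (0-based pos, ref, alt).
    Returns None when the ratio is undefined."""
    # Assumption: only C>T, C>G and their complements are treated as APOBEC-like.
    apobec_like = {("C", "T"), ("C", "G"), ("G", "A"), ("G", "C")}
    cg_muts = [p for p, ref, alt in mutations
               if (ref, alt) in apobec_like and seq[p] == ref]
    cg_sites = [i for i, base in enumerate(seq) if base in "CG"]
    background = sum(in_tcw(seq, i) for i in cg_sites)
    if not cg_muts or background == 0:
        return None
    observed = sum(in_tcw(seq, p) for p in cg_muts)
    return (observed / len(cg_muts)) / (background / len(cg_sites))


# Toy input invented for illustration only, not real mutation data.
toy_seq = "ATCAGGTCTAGATTCGCA"
toy_mutations = [(2, "C", "T"), (7, "C", "G"), (10, "G", "A"),
                 (14, "C", "T"), (3, "A", "G")]
print(tcw_enrichment(toy_seq, toy_mutations))
```

A ratio above 1 means substitutions occur in the motif more often than the sequence composition alone would suggest. A real analysis would also need a statistical test and a larger background window.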

Selected publications

Host–Virus Interactions

RNA secondary structure elements in viral genomes
RNA secondary structure elements in RNA viruses. Sci Rep. 2024.

A growing area of our research focuses on host–virus interactions, combining transcriptomic and genomic analyses with structural and sequence-based computational approaches. Our recent studies are centered on mpox virus (MPXV), where we investigate how the human innate immune system — in particular APOBEC3 enzymes — shapes the viral genome and transcriptome. We also develop computational tools for the systematic analysis of RNA secondary structure in viral genomes.

Selected publications

Proteases in Cancer: Bioinformatics Methods for the Identification of Protease Substrates

Structural susceptibility of proteins to proteolytic processing
Structural features and susceptibility scores for proteolytic processing. Int. J. Mol. Sci. 2023.

Another line of research, concerned with cancer and protein–peptide interactions, investigates how proteases recognize their substrates and what this implies for the development of bioinformatics methods that predict protease substrates. Different types of cancer show increased activity of various proteases and decreased activity of their natural inhibitors. Identifying and studying these altered interactions provides a handle for designing agents that block tumor invasion and metastasis. Bioinformatics methods for predicting protease substrates can substantially reduce the experimental effort required.

Selected publications

Genomics of Commensal and Pathogenic Bacteria

Integrative genomic reconstruction of carbohydrate utilization in bifidobacteria
Phylogeny and effector specificities of ROK-family transcriptional regulators in Thermotogae. Nucleic Acids Res. 2013.

The lab applies computational methods to study the genomics of bacteria that make up the human microbiome and of multidrug-resistant pathogenic bacteria. In several projects carried out in collaboration with Prof. Andrei Osterman and Prof. Dmitry Rodionov (Sanford-Burnham Medical Research Institute, USA), we studied bacterial metabolism by reconstructing regulatory networks. In collaboration with research groups at the Dmitry Rogachev National Medical Research Center of Pediatric Hematology, Oncology, and Immunology, we performed genomic analysis of antimicrobial resistance and virulence determinants in clinical isolates from immunocompromised pediatric patients.

Selected publications

Machine Learning Applied to Biological Sequence Analysis

A recurring theme across several projects is the application of machine learning methods to predict biologically relevant properties directly from protein sequences and structures. Working with large experimental datasets of protease cleavage events, we developed supervised learning models that capture structural and physico-chemical determinants of limited proteolysis and of structural susceptibility to proteolytic processing. These studies demonstrate how statistical learning can extract generalizable rules from high-throughput biochemical data and translate them into predictive tools applicable to genome-scale sequence analysis.
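As a rough illustration of learning sequence rules from cleavage data, the Python sketch below builds a position-specific scoring matrix from peptide windows around known cleavage sites and uses it to scan a protein. This is a deliberately simple stand-in and not the models developed in the lab, which also use structural and physico-chemical features. The uniform amino acid background, the pseudocount, the function names, and the toy windows are all assumptions made for this example.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def train_pssm(windows, pseudocount=1.0):
    """Build per-position log-odds scores from equal-length peptide windows.
    Assumption: background frequencies are uniform over the 20 amino acids."""
    background = 1.0 / len(AMINO_ACIDS)
    total = len(windows) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for i in range(len(windows[0])):
        counts = Counter(w[i] for w in windows)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score_window(pssm, window):
    """Sum the log-odds scores of one window, position by position."""
    return sum(column[aa] for column, aa in zip(pssm, window))


def scan_protein(pssm, sequence):
    """Score every window of the protein; return (start, score) pairs,
    highest score first."""
    k = len(pssm)
    hits = [(i, score_window(pssm, sequence[i:i + k]))
            for i in range(len(sequence) - k + 1)]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Synthetic windows invented for illustration, not experimental cleavage data.
toy_windows = ["DEVDGS", "DEVDSA", "DETDGA", "DEVDAG"]
toy_pssm = train_pssm(toy_windows)
best_start, best_score = scan_protein(toy_pssm, "MKLDEVDGSAAK")[0]
print(best_start, round(best_score, 2))
```

In practice such scores would be one input among several, and the model would be evaluated on held-out substrates before being applied at genome scale.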

Selected publications