Computational Techniques for Tumor Evolution and Heterogeneity

Cenk Sahinalp

Advances in single-cell sequencing (scseq) technologies have uncovered an unexpected complexity in tumors, underlining the relevance of intratumor heterogeneity to cancer progression and therapeutic resistance. Heterogeneity in the mutational composition of cancer cells results from distinct (sub)clonal expansions, each with its own metastatic potential and resistance to specific treatments. Unfortunately, because of their low read coverage per cell, scseq datasets are too sparse and noisy to be used for detecting mutations in single cells. Additionally, the number of cells and mutations in typical scseq datasets is too large for principled computational tools to infer, e.g., distinct subclones, lineages or trajectories in a tumor. Specifically, available computational approaches for tumor phylogeny reconstruction typically aim to build the most likely perfect phylogeny tree from the noisy genotype matrix, which represents the genotype calls of single cells. This problem is NP-hard; as a result, these approaches aim to solve relatively small instances of it through combinatorial optimization techniques or Bayesian inference. Even when the goal is to infer basic topological features of the tumor phylogeny, rather than to reconstruct the topology entirely, these approaches can be prohibitively slow.
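
To make the notion of a perfect phylogeny on a genotype matrix concrete, the following minimal Python sketch checks the idealized, noise-free case only. It assumes a binary matrix with rows as cells and columns as mutations, no missing entries, and an unmutated root. Under those assumptions a perfect phylogeny exists exactly when no pair of mutations shows all three patterns (1,1), (1,0) and (0,1) across cells (the classical three-gamete condition), and the tree is a chain exactly when the cell sets of the mutations are nested. The function names are illustrative, and this is not the method presented in the lectures, which targets noisy data where finding the most likely tree is the hard part.

```python
from itertools import combinations


def column_pair_patterns(col_a, col_b):
    """Return the set of (a, b) value pairs observed across cells for two mutations."""
    return set(zip(col_a, col_b))


def admits_perfect_phylogeny(matrix):
    """Three-gamete test on a noise-free binary genotype matrix.

    matrix: list of rows (cells), each a list of 0/1 mutation calls.
    Returns True if no pair of mutations shows (1,1), (1,0) and (0,1) together.
    """
    columns = list(zip(*matrix))
    conflict = {(1, 1), (1, 0), (0, 1)}
    return not any(
        conflict <= column_pair_patterns(a, b)
        for a, b in combinations(columns, 2)
    )


def is_linear(matrix):
    """True if the mutations' cell sets are nested, i.e. the tree is a chain.

    A pair showing both (1,0) and (0,1) means neither mutation's cell set
    contains the other, which implies either a branch or a conflict.
    """
    columns = list(zip(*matrix))
    for a, b in combinations(columns, 2):
        patterns = column_pair_patterns(a, b)
        if (1, 0) in patterns and (0, 1) in patterns:
            return False
    return True


# Toy matrices (hypothetical data, for illustration only).
chain = [[1, 0, 0], [1, 1, 0], [1, 1, 1]]      # nested mutations
branching = [[1, 1, 0], [1, 0, 1], [1, 0, 0]]  # two disjoint subclones
conflicting = [[1, 1], [1, 0], [0, 1]]         # violates the three-gamete condition

for name, m in [("chain", chain), ("branching", branching), ("conflicting", conflicting)]:
    print(name, admits_perfect_phylogeny(m), is_linear(m))
```

On real scseq data, false positives, false negatives and missing calls typically introduce conflicts, so a check of this kind would fail on almost every matrix; that is why the most likely conflict-free matrix has to be searched for instead.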

In these lectures we will first introduce fast deep learning solutions to two problems: inferring whether the most likely tree has a linear (chain) or branching topology, and deciding whether a perfect phylogeny is feasible for a given genotype matrix. We will also present a reinforcement learning approach for reconstructing the most likely tumor phylogeny, demonstrating that data-driven approaches can reconstruct key features of tumor evolution. We will then describe a novel algorithmic toolkit for scalable inference and assessment of mutational intratumor heterogeneity from various scseq datasets. The new toolkit allows reliable identification of the distinct clonal lineages of a tumor, making it possible to focus on the most important subclones and on the genomic alterations associated with tumor proliferation. We have comprehensively assessed our toolkit on a melanoma model by comparing the distinct lineages and subclones it identifies on single-cell RNAseq (scRNAseq) data to those inferred using matching bulk whole exome (bWES) and transcriptome (bWTS) sequencing data from clonal sublines derived from single cells. Our results demonstrate that distinct lineages and subclones of a tumor can be reliably inferred and evaluated from mutation calls on scRNAseq data using our toolkit. Additionally, they reveal a strong correlation between aggressiveness and mutational composition, both across the inferred subclones and among human melanomas. We have also applied our toolkit to infer and evaluate distinct subclonal expansion patterns of the same mouse melanoma model after treatment with immune checkpoint blockade (ICB). After integrating our cell-specific mutation calls with the expression profiles of the same cells, we observed that each subclone with a distinct set of novel somatic mutations is strongly associated with a specific developmental status. Moreover, each subclone had developed a unique ICB-resistance mechanism.