Molecular Subtyping and Cancer
Cancer Molecular Subtypes
Molecular subtyping refers to the use of -omics data to find clusters of tumors within a cancer type that have shared characteristics. Whereas earlier attempts to find subgroups of molecularly distinct tumors often looked at just a handful of genomic alterations, molecular subtyping takes into account a much wider range of input data, including patterns of gene expression as well as genotype and epigenetic regulation. For example, breast cancer has 4 molecular subtypes, each with its own biological signature:
- Luminal A, which is slow growing and has a higher cure rate than the other subtypes.
- Luminal B, which like Luminal A is driven by estrogen signalling, but is more aggressive.
- HER2-positive, which has a high potential for recurrence.
- Basal, which is fast growing and has a greater chance of metastasis.
BluePrint/MammaPrint and PAM50 are commercially available assays used to stratify breast cancer patients into these 4 subtypes. This stratification is intended to help doctors decide which treatments are most likely to work for their patients. For instance, HER2-positive tumors are more likely to respond to targeted therapies like trastuzumab or pertuzumab, and patients with Luminal A tumors may be successfully treated without chemotherapy and the severe side effects that tend to accompany it.
Other kinds of cancer (e.g. colorectal, lung, head and neck, bladder) have their own molecular subtypes. Since tumors falling into different subtypes often have different prognoses or responses to therapy (in aggregate if not always individually), knowledge of these subtypes may one day help guide treatment decisions for patients with these diseases as well.
How do researchers go about finding molecular subtypes?
Let’s look at an example from colorectal cancer. In one 2013 study, the researchers began with gene expression (transcriptomic) data from 566 colorectal tumors. The dataset was filtered so that only the 1,459 most variable genes were kept. Those highly variable genes were then used as input for an unsupervised statistical learning technique called hierarchical clustering, which groups similar samples based on a distance metric. Samples separated by relatively short distances “belong together” and are placed in the same cluster. For this study, hierarchical clustering was run multiple times on subsets of the filtered dataset, and a technique called consensus clustering was used to aggregate the results of the individual runs into a single, hopefully more robust result. The output of this process revealed 6 molecular subtypes of colorectal cancer, each having a characteristic pattern of gene expression. The researchers then validated the subtypes by building a 57-gene centroid classifier and applying it to a validation set of tumor samples. They further refined these molecular subtypes by cataloguing genomic alterations and evaluating patient outcomes as a function of the subtypes.
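To make the filtering and clustering steps more concrete, here is a minimal sketch in plain Python. It is not the study's actual code: the gene names, sample IDs, expression values, number of clusters, number of runs, and subsampling fraction are all made up for illustration, and average linkage with Euclidean distance is an assumption, since the description above does not say which linkage or distance metric was used.

```python
import random
from itertools import combinations
from statistics import pvariance


def top_variable_genes(expr, n_genes):
    """Return the n_genes gene names with the highest variance across samples."""
    return sorted(expr, key=lambda g: pvariance(expr[g]), reverse=True)[:n_genes]


def euclidean(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v)) ** 0.5


def hierarchical_clusters(items, dist, k):
    """Average-linkage agglomerative clustering, stopped once k clusters remain."""
    clusters = [[item] for item in items]
    while len(clusters) > k:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
            d = sum(dist(a, b) for a, b in pairs) / len(pairs)
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        merged = clusters.pop(j)  # j > i, so index i is unaffected by the pop
        clusters[i].extend(merged)
    return sorted(sorted(c) for c in clusters)


def consensus_clusters(profiles, k, n_runs=50, frac=0.8, seed=0):
    """Cluster random subsets of samples, then cluster on co-clustering frequency."""
    rng = random.Random(seed)
    samples = sorted(profiles)
    sampled = {pair: 0 for pair in combinations(samples, 2)}   # drawn together
    together = {pair: 0 for pair in combinations(samples, 2)}  # clustered together

    def expr_dist(a, b):
        return euclidean(profiles[a], profiles[b])

    for _ in range(n_runs):
        subset = sorted(rng.sample(samples, max(k, int(frac * len(samples)))))
        label = {}
        for idx, members in enumerate(hierarchical_clusters(subset, expr_dist, k)):
            for s in members:
                label[s] = idx
        for pair in combinations(subset, 2):
            sampled[pair] += 1
            together[pair] += label[pair[0]] == label[pair[1]]

    def consensus_dist(a, b):
        # Distance is 1 minus the fraction of shared runs in which a and b co-clustered.
        pair = (min(a, b), max(a, b))
        return 1.0 - together[pair] / sampled[pair] if sampled[pair] else 1.0

    return hierarchical_clusters(samples, consensus_dist, k)


# Toy data (hypothetical): 4 genes measured in 6 tumors.
expr = {
    "GENE_A": [1.0, 1.2, 0.9, 5.0, 5.1, 4.8],
    "GENE_B": [4.9, 5.2, 5.0, 1.1, 0.8, 1.0],
    "GENE_C": [3.0, 3.1, 2.9, 3.0, 3.1, 3.0],
    "GENE_D": [2.0, 2.0, 2.1, 2.0, 1.9, 2.0],
}
sample_ids = ["T1", "T2", "T3", "T4", "T5", "T6"]

genes = top_variable_genes(expr, 2)
profiles = {s: [expr[g][i] for g in genes] for i, s in enumerate(sample_ids)}
subtypes = consensus_clusters(profiles, k=2)
print(genes, subtypes)
```

On this toy dataset the two nearly constant genes are filtered out, and the remaining two separate tumors T1 to T3 from T4 to T6. Real analyses would typically rely on an established clustering library rather than hand-rolled loops like these, and would also need a principled way to choose the number of clusters, which this sketch simply takes as given.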
The 6 subtypes described above are not the only possible colorectal subtypes though. Several other research groups arrived at their own molecular subtypes for colorectal cancer using broadly similar methods but different datasets, and unfortunately the results were not consistent. One group found 3 subtypes, another found 4. In 2014, the Colorectal Cancer Subtyping Consortium (CRCSC) tried to resolve these discrepancies by combining and harmonizing the results of 6 different studies. They built a network model of all 27 subtypes from all 6 studies and found 4 of what they called “consensus” subtypes. The CRCSC’s molecular subtypes are thought to be more reflective of the underlying biology of colorectal cancer than the subtypes from any of the individual studies that went into them.
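The text above does not say how the CRCSC built its network, so the following is only one plausible illustration of the general idea, not the consortium's pipeline. It assumes each study's classifier has been applied to the same set of tumors, treats every (study, subtype) pair as a node, links nodes from different studies when their tumor sets overlap strongly (Jaccard index at or above a hypothetical threshold of 0.5), and reports the connected components as candidate consensus groups. The study names, labels, and tumor IDs are invented.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap between two sets of tumors: |intersection| / |union|."""
    return len(a & b) / len(a | b)


def consensus_groups(calls, threshold=0.5):
    """calls maps study -> {tumor: subtype}. Returns groups of linked (study, subtype) nodes."""
    members = {}
    for study, labels in calls.items():
        for tumor, subtype in labels.items():
            members.setdefault((study, subtype), set()).add(tumor)

    # Connect subtypes from different studies that share most of their tumors.
    neighbours = {node: set() for node in members}
    for u, v in combinations(members, 2):
        if u[0] != v[0] and jaccard(members[u], members[v]) >= threshold:
            neighbours[u].add(v)
            neighbours[v].add(u)

    # Connected components of the resulting network, found by depth-first search.
    groups, seen = [], set()
    for node in members:
        if node in seen:
            continue
        stack, group = [node], set()
        while stack:
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(neighbours[current] - group)
        seen |= group
        groups.append(sorted(group))
    return sorted(groups)


# Hypothetical subtype calls from two studies on the same 8 tumors.
calls = {
    "study_1": {"T1": "A", "T2": "A", "T3": "A", "T4": "B",
                "T5": "B", "T6": "C", "T7": "C", "T8": "C"},
    "study_2": {"T1": "x", "T2": "x", "T3": "x", "T4": "x",
                "T5": "y", "T6": "y", "T7": "y", "T8": "y"},
}
groups = consensus_groups(calls)
for group in groups:
    print(group)
```

In this toy case subtype A pairs with x and C pairs with y, while B has no strong counterpart and remains on its own. That last outcome is a small reminder that some tumors may not fit neatly into any harmonized scheme.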
Similar approaches have been adopted for other kinds of cancer: feed gene expression data from as many tumors as you can find into a clustering algorithm to define subtypes; characterize those subtypes for mutations, structural variations, and epigenetic features; then relate those subtypes to patient outcomes. As far as I can tell, the research teams studying these other cancer types have not yet taken the extra step of defining “consensus” subtypes like the CRCSC did though.
Can molecular subtyping do for other cancer types what it has done for breast cancer?
The jury is still out on this, though recently some doubts have surfaced. For head and neck cancer (HNSC), one study found no relationship between the 4 HNSC subtypes (why is it always 4?) and recurrence-free survival. The authors speculated that this may be due in part to the large degree of intertumor heterogeneity found in head and neck cancer. Similarly, stratifying by molecular subtype yielded equivocal or disappointing results for several colorectal cancer trials.
And most disturbing of all, a recent paper suggested that plain old tumor grading outperforms gene-expression-based molecular subtypes in muscle-invasive bladder cancer.
This is not to suggest that molecular subtyping is worthless. The FDA, for one, believes it has clinical value for breast cancer at least. But, like everything else in cancer research, approaches that work well for one indication may fail for another. Heterogeneity between and within tumors, as well as the influence of the tumor microenvironment, presents significant challenges. Many tumors simply do not fit neatly into a single subtype, and so molecular subtypes are not likely to be the final word on patient stratification or predictive biomarkers for cancer treatment.
Where do we go from here?
There is still a lot of work to do. At Cofactor, we believe in the power of RNA and transcriptomic data. Importantly, the next evolution of molecular subtyping will leverage advanced data analysis, using tools such as machine learning. With these tools, we have the potential to identify new models of tumor response that move beyond the challenges of traditional statistical methods and molecular subtypes, making a bigger impact for patients across oncology indications, and beyond.