What are the biological processes that affect RNA expression?
The point of RNA-seq in a clinical setting is to quantify the expression of a gene (or genes) of interest, and then use that information to make decisions about diagnosis, prognosis, or treatment options. If the goal is simple, the process is not. Even before using RNA-seq data for diagnostic purposes, we have to understand the factors affecting gene expression in the first place, as well as biological processes in which gene expression profiles are dramatically altered.
Remembering the Central Dogma
The central dogma of biology states that DNA is transcribed to RNA, which is then translated to protein, which carries out its programmed function. Discoveries in the past decade have shown this to be an oversimplified model, but the basic principle is still relevant for thinking about gene expression. Changes in gene expression can be induced by external factors like environment (diet, smoking), internal signals such as stress (hypoxia, nutrient deprivation), inflammation and tissue repair, and even genetic material such as non-coding RNAs.
Genetic factors affecting gene expression
Many different types of DNA mutations lead to alterations in gene expression, the production of defective proteins, and compromised cellular function. In general, the human genome includes two copies of each gene encoded in the DNA. This rule is broken by copy number variants (CNVs), where stretches of DNA, potentially encompassing an entire protein-coding sequence, may be duplicated or deleted. CNVs are often associated with disease states because the encoded proteins are either present in abnormal quantities or defective in function. CNVs present a relatively simple explanation for changes in gene expression: more or fewer DNA copies to transcribe ultimately lead to more or less of the downstream protein. This can be quantified by looking at the level of RNA transcripts or protein.
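The dosage argument above can be turned into quick arithmetic. The sketch below assumes, purely for illustration, that RNA output scales linearly with DNA copy number; real genes often deviate from this because of feedback and other regulation. The function name and the default of two normal copies are our own choices, not part of any standard tool.

```python
import math


def expected_log2_ratio(copy_number: int, normal_copies: int = 2) -> float:
    """Expected log2 expression ratio under a simple linear dosage assumption."""
    if copy_number <= 0:
        # Both copies deleted: there is no template left to transcribe.
        return float("-inf")
    return math.log2(copy_number / normal_copies)


# One copy lost, normal, one copy gained, two copies gained.
for copies in (1, 2, 3, 4):
    print(copies, round(expected_log2_ratio(copies), 2))
# 1 -1.0
# 2 0.0
# 3 0.58
# 4 1.0
```

Under this assumption, a single-copy deletion halves the expected transcript level, while a single-copy gain raises it by about 50%.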
Single nucleotide polymorphisms (SNPs) are changes in an individual nucleotide that may or may not cause a problem. Pathogenic SNPs can lead to single amino acid changes in the encoded protein, which can cause structural (and therefore functional) defects. In other cases, a SNP can alter the stop codon, so translation continues past the normal end of the coding sequence, again causing major defects in the resulting protein.
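To make the two cases concrete, here is a minimal sketch that translates a toy coding sequence before and after a single-nucleotide change. The sequences are invented for illustration, and the codon table holds only the handful of codons this example needs (it is a subset of the standard genetic code, read from the coding strand).

```python
# Subset of the standard genetic code; "*" marks a stop codon.
CODON_TABLE = {
    "ATG": "M", "GAG": "E", "GTG": "V", "CAA": "Q", "GCT": "A",
    "TAA": "*", "TGA": "*",
}


def translate(dna: str) -> str:
    """Translate codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


reference = "ATGGAGTAAGCTTGA"  # ATG GAG TAA | GCT TGA (downstream sequence)
missense = "ATGGTGTAAGCTTGA"   # GAG -> GTG swaps one amino acid (E -> V)
stop_loss = "ATGGAGCAAGCTTGA"  # TAA -> CAA removes the stop codon

print(translate(reference))  # ME
print(translate(missense))   # MV
print(translate(stop_loss))  # MEQA
```

In the stop-loss case, the ribosome reads into sequence that was never meant to be translated and only stops at the next in-frame stop codon.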
Upstream of genes themselves are promoter sequences, which are the natural regulators of gene expression. The physical proximity of a promoter to its target gene and mutations in promoter regions both influence the level of transcription from DNA to RNA. Physical structure, mutations, and epigenetic modifications (see below) can all alter promoter function and lead to changes in the amount of RNA transcribed.
Epigenetics refers to cellular processes, aside from changes in the actual DNA sequence, that change gene expression. Environment and lifestyle play into epigenetic regulation, although they are technically not themselves epigenetic factors. Instead, the four primary epigenetic processes are the physical structure of DNA, DNA methylation, changes in histones, and regulation by RNA molecules that don’t encode proteins (non-coding RNA).
Environment and lifestyle can push on any one of these four processes, leading to diseases such as cancer and obesity and, potentially, heritable changes in the genome. Studies have shown that everything from diet to severe stress can cause epigenetic effects, some of which may be passed down through multiple generations (https://www.biologicalpsychiatryjournal.com/article/S0006-3223(15)00652-6/abstract). In the case of environmental problems like radiation, DNA itself is mutated, eventually causing deficits in cell cycle regulation and programmed cell death.
Cellular and pathological processes demonstrating altered gene expression
Next, we look at several cellular events that lead to significant shifts in gene expression profiles.
Inflammation is induced by tissue damage or invasion of foreign material. Natoli et al describe inflammation as “a complex response […] set in motion that is aimed at eliminating the danger signals and eventually restoring tissue and organism homeostasis.” However, as both this review and countless others note, inflammation can also cause problems. Chronic inflammation is known to contribute to cancer, diabetes, cardiovascular disease, brain and nervous system disorders, and much more. Thus, it needs to be carefully regulated through vast networks of genes that help to manage the response at every stage, from the moment a problem is detected (and even before) to the long-term restoration of homeostasis. The inflammatory response is dependent on rapid activation of gene expression, regulated through transcription factors such as the NF-kB complex. When the NF-kB pathway is activated (by any one of hundreds of stimuli), molecules within the cytosol are modified and moved into the nucleus. From there, these transcription factors bind to DNA and regulate expression of genes involved in responding to the problem. Thus, the gene expression profile of a cell undergoing an inflammatory response is dramatically different from that of a cell at homeostasis. Gene expression profiles also vary between cells responding to different stimuli.
Epithelial-mesenchymal transition (EMT) is essentially the reverse of a process that occurs during normal tissue development. It takes place during tissue repair and wound healing, but is also a key step in oncogenesis. EMT is the result of signaling events that cause cells to release from their neighbors and lose polarity (i.e., they flatten out and eliminate distinct top/bottom regions). Additionally, the cytoskeleton, which is responsible for maintaining cellular structure and movement, rearranges to adjust the former and initiate the latter. During wound healing, EMT is how cells within a damaged tissue close gaps to repair the problem. In cancer, EMT releases cells from their tissue of origin or primary tumor and allows them to metastasize throughout the body (for a full molecular breakdown of EMT, see this review). In order to execute this complex transition, the entire gene expression pattern of a cell must change. As a result, RNA expression can provide a high-resolution genetic signature of cells undergoing the transition (https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-015-2036-9). Indeed, regulation of both RNA expression (transcription) and proteins (post-translational modification) is critically important for EMT (https://www.ncbi.nlm.nih.gov/pubmed/25201109).
Migration is closely related to EMT, but also occurs in cells not undergoing EMT. Once a cell has made the transition or is by default a migratory cell, it is capable of moving to a new location within the tissue. Embryo development is dependent on massive cell migration to ensure proper formation of the many tissues and organs of a mature organism. Wound healing is another example, where rows of cells on either side of a cut move towards each other to close the gap. Cells may also migrate far greater distances, as is seen in tumor metastasis. In these instances, a tumor cell is released from the primary tumor during EMT and enters the bloodstream (intravasation). Eventually, the cell or cells stop, adhere to the vessel wall, and crawl through (extravasation) to take up residence in another tissue. Immune cells undergo the same processes of intravasation and extravasation to reach the site of damage or infection. Thus, migration can be local, with cells crawling across a relatively short distance, or long-range. Gene expression profiles of migrating cells differ significantly from those of cells that are polarized and adhered to a substrate or neighboring cells. Cytoskeletal components must be reorganized to allow for movement, and genes regulating cell-cell attachment must be turned down.
Apoptosis is controlled, regulated cell death. In many ways it is one of the most important cellular processes in maintaining health and homeostasis, as it helps prevent damaged or defective cells from propagating (i.e., cancer). Apoptosis is induced by a wide variety of environmental and cellular stimuli, including everything from chemical trauma and radiation to infection and inflammation. There are also a number of slightly different apoptotic pathways, although they all converge on essentially the same set of key molecules and events. As a result, different apoptotic stimuli produce gene expression profiles that share a common core but differ in their details. For example, in 2000, Brachat et al induced apoptosis via a physiological signal and a chemical signal. The researchers then used microarrays to look at RNA levels in the two sets of cells and came away with 34 genes that were common to both pathways and made up a core set of apoptotic genes. Each set of cells, however, also displayed differential expression of hundreds of other genes that were linked to the specific type of stimulus. Dozens of different apoptotic gene expression profiles have been categorized, and are now offered as standardized kits that look at either RNA or protein levels.
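The idea of a shared core versus stimulus-specific genes maps naturally onto set operations. The sketch below uses placeholder gene names (GENE_A through GENE_E are hypothetical and do not come from the Brachat study) to show how a core set and two stimulus-specific sets could be derived from two lists of differentially expressed genes.

```python
# Hypothetical differentially expressed genes for each apoptotic stimulus.
physiological = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
chemical = {"GENE_B", "GENE_C", "GENE_E"}

# Genes that respond to both stimuli form the shared "core" set.
core = physiological & chemical

# Genes that respond to only one stimulus are stimulus-specific.
physiological_only = physiological - chemical
chemical_only = chemical - physiological

print(sorted(core))                # ['GENE_B', 'GENE_C']
print(sorted(physiological_only))  # ['GENE_A', 'GENE_D']
print(sorted(chemical_only))       # ['GENE_E']
```

In a real experiment, each input set would come from a statistical test for differential expression rather than a hand-written list.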
Cell cycle regulation is another critically important process where significant shifts in gene expression are observed. Depending on the physiological context, cells may undergo division almost continuously (as during development), or may remain quiescent for long periods of time and only divide at rare intervals (as is the case with many stem cells). Here, too, is an opportunity for problems to arise. Genes involved in shutting down cell division under adverse circumstances may no longer function, allowing defective cells to propagate and cause hyperplasia. Under normal conditions, problems with cell cycle regulation often result in apoptosis. If certain checkpoints are not met (such as proper replication of genetic material), the cell will pause until everything is back on track or undergo apoptosis. Because of the cyclical nature of this process, RNA levels of the various genes fluctuate at regular intervals and can be used as markers of the cell cycle. A remarkable use of this concept was demonstrated in 2015, when Kowalczyk et al used single-cell RNA-seq to look at the “average” status of whole populations of stem cells in parallel with the transcriptional status of individual stem cells. This study essentially represented a detailed examination of both the forest and the trees.
Cancer involves deregulation of all of the above processes. As such, tumorigenesis provides a fascinating look into gene expression gone wrong. Additionally, oncology is one field where evaluating RNA levels is both well-established and extraordinarily powerful. In cancer, genetic mutations pile up to a point where previously healthy cells (or their severely mutated offspring) function outside normal regulatory bounds. Broadly speaking, this is known as “transformation,” which can include EMT and migration, uncontrolled proliferation, and changes in rates of apoptosis.
RNA and oncology will be discussed more in the next post.
Why look at RNA?
We have briefly mentioned several cellular processes representing shifts in gene expression. But gene expression involves DNA, RNA, and protein, so why focus on RNA as a marker of gene expression in this series? There are a few reasons. First, as a company Cofactor is built on RNA-seq. Second, from a biological standpoint, RNA provides a unique level of insight into real-time events within a cell. Mutations in DNA can be extremely valuable for predicting potential problems (such as evaluating risk for breast/ovarian cancer from mutations in BRCA1/2). However, RNA can reveal the presence of non-genomic aberrations such as alternative transcripts, splice variants and gene fusions. Furthermore, RNA sequencing provides information about what is happening in a cell at the precise moment the genetic material was extracted. DNA, on the other hand, is relatively static and is more indicative of what could happen.
On the other end of things, problems with a protein are informative, as they reveal problems upstream in the process – whether that’s in the DNA sequence, transcription, translation, or post-translational modifications to the protein itself. Defective proteins have been found in many disease states, especially in neurological and brain disorders. However, RNA sequencing provides similar information with greater dynamic range and lower background than protein arrays, and is more accessible than mass spectrometry for protein detection.
Furthermore, as mentioned above, RNA-seq can be used on single cells to look at cellular function and gene expression (https://www.sciencedirect.com/science/article/pii/S0952791516300474?np=y). It takes very little starting material to gain useful insight from RNA-seq, so the resolution is potentially much greater than looking at protein. Differential gene expression between cells that are activated or not, pathogenic or benign, can provide insight into the genetic underpinnings of functional changes that lead to human disease.
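Differential expression is usually summarized as a log2 fold change between two conditions. The sketch below shows only that core arithmetic on made-up, already-normalized counts for a single gene. The function name, the pseudocount of 1, and the numbers are assumptions for illustration; a real analysis would use a dedicated statistical package that also models variance across replicates.

```python
import math


def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 ratio of mean expression; the pseudocount avoids division by zero."""
    mean_treated = sum(treated) / len(treated)
    mean_control = sum(control) / len(control)
    return math.log2((mean_treated + pseudocount) / (mean_control + pseudocount))


# Hypothetical normalized counts for one gene, three replicates per condition.
activated = [30, 33, 30]
resting = [6, 8, 7]

print(log2_fold_change(activated, resting))  # 2.0, i.e. roughly four-fold higher
```

A value of 0 means no change, positive values mean higher expression in the first group, and negative values mean lower expression.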
Looking to design an experiment but not sure where to start? Concerned about how many replicates will give you the most meaningful information? Contact our Project Scientist team to have a no-strings-attached scientific discussion today.