Overview of Transcriptome Analysis
Amit U Sinha, Ph.D
Last Updated: Nov 7, 2019
Transcriptomics refers to the study of all the RNA transcripts in a cell, including mRNA. Using next-generation sequencing (NGS) technology, we can accurately quantify these transcripts with RNA-Seq. The analysis of the transcriptome of any cell population provides a unique insight into the expression of genes within the system being studied, making transcriptome analysis an important technique with applications in stem cell research, cancer research, phylogenetic studies, biomarker discovery and even fertilization treatments.
Transcriptome analysis shows researchers not only which genes are expressed, but also the level at which they are expressed. Moreover, transcriptome data analysis provides information that may not be possible to identify at the genomic or proteomic level. While researchers are usually more interested in genomic changes or variations that are reflected in the transcriptome, important changes in gene expression do not always correlate with any genomic change. Similarly, there is usually only a moderate to weak correlation between changes in the transcriptome and changes in the proteome [1].
Origins of Transcriptome Analysis
A variety of transcriptome analysis methods have been developed and used to date. The first attempt at identifying the human transcriptome occurred in 1991 using cDNA sequencing techniques [2]. Later, Serial Analysis of Gene Expression (SAGE), which uses Sanger sequencing of concatenated random transcript fragments, was developed. Techniques like qPCR and RT-PCR were also used for transcriptome analysis. However, these methods were overtaken first by high-throughput methods like microarrays and then by RNA-Seq. A comparison of these techniques shows the strengths and limitations of each:
| | Microarray | qPCR | RNA-Seq |
|---|---|---|---|
| Throughput | High | Low | High |
| Input RNA Amount | High | Low | Low |
| Labor Intensity | Low | Low | High |
| Prior Knowledge | Needed for probes | Needed for probes | None |
| Quantitation Accuracy | >90% | Not available | ~90% |
| Sequence Resolution | Limited by probe design and cross-hybridisation | Limited by probe design | Limited by sequencing accuracy of ~99% |
| Sensitivity | 10⁻³ (limited by fluorescence detection) | High (count not available) | 10⁻⁶ (limited by sequence coverage) |
| Technical Reproducibility | >99% | >99% | >99% |
Table 1. Comparison of transcriptome analysis methods, adapted from Lowe et al., 2017 [3].
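To make the "limited by sequence coverage" entry in the table concrete, here is a rough worked example. It assumes the RNA-Seq sensitivity figure can be read as the relative abundance of the rarest detectable transcript (about one in a million), and it models read sampling as a Poisson process. Both are simplifying assumptions for illustration, and the sequencing depths are hypothetical.

```python
import math

def expected_reads(total_reads, transcript_fraction):
    """Expected number of reads from a transcript that makes up
    `transcript_fraction` of all sequenced molecules."""
    return total_reads * transcript_fraction

def detection_probability(total_reads, transcript_fraction):
    """Probability of seeing at least one read from the transcript,
    assuming read counts follow a Poisson distribution (an assumption)."""
    return 1.0 - math.exp(-expected_reads(total_reads, transcript_fraction))

# Hypothetical sequencing depths; 1e-6 is the sensitivity figure from Table 1.
for depth in (1_000_000, 10_000_000, 30_000_000):
    lam = expected_reads(depth, 1e-6)
    p = detection_probability(depth, 1e-6)
    print(f"{depth:>11,} reads: ~{lam:.0f} expected reads, P(detect) = {p:.3f}")
```

Under these assumptions, deeper sequencing yields more reads from a rare transcript, which is why coverage, rather than detection chemistry, sets the practical sensitivity limit.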
Of these, the RNA-Seq method has been the most popular in recent years, owing to its low input RNA requirement, accuracy, and sensitivity. In RNA-Seq, RNA is extracted from the organism or cell sample and fragmented, and the fragments are copied into stable double-stranded cDNA. This cDNA is then sequenced and mapped to a reference genome to identify which genes were being actively transcribed. The data can further be used to detect novel alternative splice variants, generate an expression signature of the sample, and support many other applications.
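The mapping step described above can be illustrated with a toy sketch: once reads are aligned to a reference, counting how many fall inside each annotated gene gives a first estimate of which genes were transcribed. The gene names, coordinates, and read positions below are made up, and real pipelines use dedicated aligners and counting tools rather than this simplified logic.

```python
from collections import Counter

# Hypothetical annotation: gene name -> (chromosome, start, end), 1-based inclusive.
GENES = {
    "geneA": ("chr1", 100, 500),
    "geneB": ("chr1", 800, 1200),
    "geneC": ("chr2", 50, 300),
}

def count_reads_per_gene(alignments, genes=GENES):
    """Count aligned reads whose start position falls inside each gene."""
    counts = Counter({name: 0 for name in genes})
    for chrom, pos in alignments:
        for name, (g_chrom, start, end) in genes.items():
            if chrom == g_chrom and start <= pos <= end:
                counts[name] += 1
                break  # assign each read to at most one gene
    return dict(counts)

# Made-up alignments as (chromosome, read start position).
reads = [("chr1", 120), ("chr1", 450), ("chr1", 900), ("chr2", 75), ("chr1", 650)]
gene_counts = count_reads_per_gene(reads)
print(gene_counts)
```

The read at chr1:650 lies outside every annotated gene and is left uncounted. Reads like this, in unannotated regions or spanning unexpected exon junctions, are the raw material for discovering novel transcripts and splice variants.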
Multiple RNA-Seq platforms are available that can be used for whole transcriptome analysis or targeted analysis of particular gene panels. The platforms available include Illumina, Ion Torrent and PacBio, listed in order of commercial release. Illumina is by far the most widely used of the three [3].
The raw sequence data obtained from an RNA-Seq experiment has to be processed step by step before genes and expression levels can be annotated. As in a typical NGS workflow, the data processing steps include quality control, alignment, quantification and differential gene expression analysis. Basepair offers a number of pipelines for transcriptome analysis, from differential expression using DESeq, to de novo assembly using Trinity, to gene fusion finding using deFuse, etc. An overview of our RNA-Seq pipelines can be found on our RNA-seq analysis page.
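As a minimal sketch of the last two steps (quantification and differential expression), the example below normalizes raw gene counts to counts per million (CPM) and computes a log2 fold change between two samples. This is a deliberately simplified stand-in, not the DESeq method: tools like DESeq add statistical modelling and significance testing across replicates. The gene names, counts, and the pseudocount of 1 are hypothetical.

```python
import math

def counts_per_million(counts):
    """Scale raw counts so that each sample sums to one million."""
    total = sum(counts.values())
    return {gene: c * 1_000_000 / total for gene, c in counts.items()}

def log2_fold_change(control, treated, pseudocount=1.0):
    """log2(treated / control) on CPM values; the pseudocount avoids
    division by zero for genes with no reads."""
    cpm_c = counts_per_million(control)
    cpm_t = counts_per_million(treated)
    return {
        gene: math.log2((cpm_t[gene] + pseudocount) / (cpm_c[gene] + pseudocount))
        for gene in control
    }

# Hypothetical raw counts; the treated sample was sequenced twice as deeply.
control = {"geneA": 500, "geneB": 300, "geneC": 200}
treated = {"geneA": 1000, "geneB": 300, "geneC": 700}

lfc = log2_fold_change(control, treated)
for gene, value in lfc.items():
    print(f"{gene}: log2FC = {value:+.2f}")
```

The raw count for geneA doubles between samples, but so does the sequencing depth, so its normalized expression is unchanged. This is why counts must be normalized before samples are compared.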
Transcriptomic data has a wide range of applications in life science research, especially in health care and disease management. Below is a small sampling of high-impact applications that highlight the need for transcriptome analysis.
Diagnostics and Disease Profiling – Transcriptome analysis methods like RNA-Seq have helped researchers identify disease-associated SNPs, allele-specific expression and gene fusions, which in turn has helped in the diagnosis and profiling of variant-caused diseases.
Host/Pathogen Transcriptome Studies – Dual RNA-Seq methods, which profile the host and pathogen transcriptomes simultaneously, have helped researchers study the interactions between host and pathogen as well as the presence of inter-species gene regulatory networks.
Environmental Studies – Transcriptomics has enabled researchers to study the effect of environmental factors on gene expression and measure the response of organisms, especially plants, to biotic/abiotic stress. For example, comparative analysis of a range of chickpea lines identified distinct transcriptional profiles associated with drought and salinity stresses [4].
Gene function annotation – A major contribution of transcriptomics has been to enable the research community to identify functions of genes that were previously unknown.
With emerging techniques within transcriptome analysis and the vast applications of the method in research, transcriptomics will play a crucial role in furthering the understanding of gene functions and will help us reveal the nuances of gene expression, enabling better healthcare and human development.
References
1. Stare, T., Stare, K., Weckwerth, W., Wienkoop, S., & Gruden, K. (2017). Comparison between Proteome and Transcriptome Response in Potato (Solanum tuberosum L.) Leaves Following Potato Virus Y (PVY) Infection. Proteomes.
2. Adams, M. D., Kelly, J. M., Gocayne, J. D., Dubnick, M., Polymeropoulos, M. J., Xiao, H., . . . Moreno, R. F. (1991). Complementary DNA sequencing: expressed sequence tags and human genome project. Science.
3. Lowe, R., Shirley, N., Bleackley, M., Dolan, S., & Shafee, T. (2017). Transcriptomics technologies. PLoS Computational Biology.
4. Garg, R., Shankar, R., Thakkar, B., Kudapa, H., Krishnamurthy, L., Mantri, N., . . . Jain, M. (2016). Transcriptome analyses reveal genotype- and developmental stage-specific molecular responses to drought and salinity stresses in chickpea. Scientific Reports.