Exome or Whole Genome Sequencing?

This is one of the most discussed topics, especially in Clinical Genomics!

Affordability, accuracy, feasibility and, of course, turnaround time: judged mostly on these factors, which sequencing technology is more suitable for the clinic? Whole Exome Sequencing or Whole Genome Sequencing (WES vs WGS)?
So here's my 2 cents on this discussion!

When it comes to DNA sequencing, there has always been a raging debate over whether to use Whole Genome Sequencing (WGS) or Whole Exome Sequencing (WES, also written WXS) for routine work. Whole genome sequencing, as the name suggests, is the process of sequencing the entire genome. In most cases, however, this is far from practical, and only 95-97% of the genome is covered because certain regions (high GC content, large repeat regions, centromeres, telomeres, etc.) are technically difficult to sequence with existing technology.
“It’s very fair to say the human genome was never fully sequenced,” - Craig Venter
“The human genome has not been completely sequenced and neither has any other mammalian genome as far as I’m aware” - George Church, Harvard Medical School
According to Dr. George Church of Harvard Medical School, about 4-9% of the human genome is yet to be sequenced! (Read more about the incomplete nature of the Human Genome Project in this STAT news article)
Exome sequencing, sometimes called ‘whole exome sequencing’ (WES or WXS), instead focuses on just the protein-coding sequences. Of the slightly more than 3 billion bases in the human genome, only a small fraction (roughly 1.5%) is coding (this is called the exome); the remainder is non-coding sequence, much of it repetitive, which for the sake of this discussion we treat as having no known function.
The exome is like the juicy, succulent flesh of an orange, and the rind and seeds are the junk DNA (well, not a proportionate analogy, but a functionally apt one!)
Here is a comparison between these two technologies!
Who wouldn't love to have the whole shebang of data - exons, introns, promoters, genomic and UTR regulatory regions, noncoding RNAs and structurally variant segments, right?! Think about the decisions that all that data could facilitate, especially in a clinical situation! But in the current climate, making clinical WGS feasible and efficient remains a huge practical challenge, especially with respect to data storage and compute needs. This is only made more complicated by the data privacy, sharing and storage security concerns that surround patient data or other clinically sensitive data in such large volumes.
Whole exome sequencing, on the other hand, makes the task less daunting. In as little as 1.5% of the genome's total size, WXS covers as many as 23,500 genes with 180,000 exons (1). Overall, less sequencing is required than for WGS, and hence costs are lower. This narrow stretch of sequence (the exome) contains approximately 80-90% of all disease-causing variants, making it a data-rich ground for analysis and saving a lot of time. Comparing clinical-grade sequencing of an exome at 100x depth with that of a whole genome at 30x, the file size is somewhere between 5GB and 8GB for the 100x exome, clearly far less than the ~90GB for the 30x whole genome.
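As a rough sanity check on that gap, here is a back-of-the-envelope sketch in Python. It assumes a genome of 3 billion bases and an exome of 1.5%, the figures used in this post; the function name is made up for illustration, and real file sizes also depend on off-target reads, quality scores and compression, so treat the result as an order of magnitude only.

```python
# Assumptions taken from the text: ~3 billion bases, ~1.5% of them coding.
GENOME_SIZE_BP = 3_000_000_000
EXOME_TARGET_BP = GENOME_SIZE_BP * 15 // 1000  # 1.5% of the genome, kept as an integer


def sequenced_bases(target_bp: int, mean_depth: int) -> int:
    """Bases that must be read to cover target_bp at mean_depth (ideal case, no off-target reads)."""
    return target_bp * mean_depth


wes_bases = sequenced_bases(EXOME_TARGET_BP, 100)  # 100x exome
wgs_bases = sequenced_bases(GENOME_SIZE_BP, 30)    # 30x whole genome

print(f"100x exome : {wes_bases / 1e9:.1f} Gb")
print(f"30x genome : {wgs_bases / 1e9:.1f} Gb")
print(f"WGS / WES  : {wgs_bases / wes_bases:.0f}x")
```

Under these idealised assumptions a 30x genome needs roughly 20 times as many sequenced bases as a 100x exome, which is in the same ballpark as the difference between the file sizes quoted above.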
Exome sequencing produces less raw data, which reduces the overall cost of a project and allows a larger number of samples to be studied simultaneously. This makes identifying novel disease-causing variants easier with exome sequencing. It also means that there are more annotated and interpreted data points for exonic regions than for non-coding regions. The flip side is that important variants outside the exonic regions will be missed.
Another important thing to keep in mind is the targeting efficiency of currently available exome capture kits. While the number of putative human genes is somewhere between 22,000 and 23,500 (http://www.ncbi.nlm.nih.gov/projects/CCDS/), most commercial exome capture kits cover only the CCDS set. This means only around 18,000 genes of the total set are covered. Adding to the pain, not all of the genes in the CCDS are covered fully. Some exonic regions are excluded when the kits are manufactured because of the hybridization chemistry and the possibility of cross-hybridization, complicated further by segmental duplications and some conserved pseudogenes. Massive deep sequencing of the transcriptome also adds to this growing complexity by uncovering novel transcripts and genes among the less well-characterized CCDS entries and genes.
When recommending the appropriate sequencing technology, quality parameters play an important role. It has been noted that Whole Genome Sequencing (WGS) offers a more uniform distribution of sequencing quality parameters than WXS. PCR enrichment in exome sequencing produces regions of non-uniform coverage - regions with too much or too little coverage. The heterogeneous coverage of whole exome data and the resulting false positive variants are directly related to PCR, and this is a criminal waste of sequencing power, especially when coupled with a bad capture kit. The average error rate by cycle is twice as high in PCR-based technologies (WXS) as in PCR-free technologies (WGS).
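One simple way to put a number on "uniform" is the coefficient of variation of per-base depth, together with the share of positions that fall well below the mean. The sketch below is only an illustration: the depth values are made up (they are not measurements from any kit or sequencer), the function name is hypothetical, and the cut-off of 0.2 times the mean is an arbitrary assumption.

```python
from statistics import mean, pstdev


def coverage_uniformity(depths, low_fraction=0.2):
    """Return (coefficient of variation, fraction of positions below low_fraction * mean depth)."""
    avg = mean(depths)
    cv = pstdev(depths) / avg
    under = sum(d < low_fraction * avg for d in depths) / len(depths)
    return cv, under


# Hypothetical per-base depths, invented for this example.
even_depths = [28, 31, 30, 29, 32, 30, 27, 33]      # flat profile around 30x
uneven_depths = [250, 5, 180, 12, 90, 3, 140, 120]  # peaks and troughs

for label, depths in [("even", even_depths), ("uneven", uneven_depths)]:
    cv, under = coverage_uniformity(depths)
    print(f"{label:7s} CV={cv:.2f}  under-covered={under:.0%}")
```

The flat profile gives a small coefficient of variation and no under-covered positions, while the spiky profile wastes reads on the peaks and still leaves some positions with too little depth to call variants confidently.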
There are also occasions where several potentially damaging coding SNVs have been missed by WXS despite being located in regions targeted by the exome kit. With respect to Copy Number Variations (CNVs), WXS is not a reliable technology, because the captured exons are noncontiguous and most CNVs extend beyond the regions covered by the exome kit (2). With WGS, by contrast, better identification of copy number variations, rearrangements and other structural variations is very much possible, taking advantage of longer reads. Since WGS is aimed at the universal set, there is no reference bias, whereas WES capture probes tend to preferentially enrich reference alleles at heterozygous sites, producing false negative SNV calls.
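Reference bias at a heterozygous site shows up in the allele balance, that is, the fraction of reads supporting the alternate allele, which should sit near 0.5 for a true heterozygote. Here is a minimal sketch of the idea; the read counts are invented, the function names are hypothetical, and the 0.25 cut-off is an assumed stand-in for whatever threshold a real variant caller applies.

```python
def allele_balance(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads at a site that support the alternate allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return alt_reads / total


def looks_heterozygous(ref_reads: int, alt_reads: int, min_balance: float = 0.25) -> bool:
    """Naive filter: keep the site as a het call only if enough reads carry the alt allele."""
    return allele_balance(ref_reads, alt_reads) >= min_balance


# Invented (ref, alt) read counts for the same true heterozygous site.
unbiased = (15, 15)  # both alleles sampled evenly
biased = (32, 8)     # capture pulled down mostly reference fragments

for name, (ref, alt) in [("unbiased", unbiased), ("biased", biased)]:
    print(name, round(allele_balance(ref, alt), 2), looks_heterozygous(ref, alt))
```

In the biased case a genuine heterozygous variant falls under the filter and is dropped, which is exactly the kind of false negative SNV call described above.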
Whole Exome Sequencing makes life easy by saving time and cost! But Whole Genome Sequencing gives more complete and reliable insights into genome-wide variation. WXS is the diagnostic tool of choice where time and money are of the essence, but approximately 75-80% of cases still remain undiagnosed. In all those situations, WGS would be the way forward, taking into account, of course, the cost of data analysis and storage.
Whole Genome Sequencing covers almost all regions of the genome! Who wouldn't love to have the whole shebang of all data!
In conclusion, there is no clear consensus as to which of the two is better, with strong proponents on both sides. However, with WGS prices falling globally, WXS is losing its edge in cost-effectiveness, and with modern advances in computing, the time needed to analyze WGS datasets is also coming down. All of this points towards a favourable future for WGS, unless users worldwide swing back in favour of WXS, presumably after some major modification of the technology.
1. Ipe, J., Swart, M., Burgess, K. and Skaar, T. (2017), High-Throughput Assays to Assess the Functional Impact of Genetic Variants: A Road Towards Genomic-Driven Medicine. Clinical And Translational Science. doi:10.1111/cts.12440
2. Belkadi, A., Bolze, A., Itan, Y., Cobat, A., Vincent, Q. B., Antipenko, A., Abel, L. (2014). Whole-genome sequencing is more powerful than whole-exome sequencing for detecting exome variants. doi:10.1101/010363
