Another Bioinformatics website

Tag: sequencing

Amplicon sequencing and high-throughput genotyping – HLA typing

In the previous post I explained the fundamentals of the Amplicon Sequencing (AS) technique. Today I will show some current and future applications in HLA-typing.

Another field of use of AS is the genotyping of complex gene families such as the major histocompatibility complex (MHC). This gene family is known to be highly polymorphic (high allele variation) and to contain multiple copies derived from an ancestral gene (paralogues). MHC class I and II genes encode the cell-surface receptors that present antigens to immune cells. In humans, the MHC is also called human leukocyte antigen (HLA). HLA-typing plays a key role in assessing compatibility for any tissue transplantation, and HLA has been associated with more than 100 different diseases (primarily autoimmune diseases) and, more recently, with various positive and negative drug responses. HLA loci are so polymorphic that practically no two individuals in an outbred population carry the same set of alleles (identical twins excepted).

Number of HLA alleles known to date. Source: IMGT-HLA database

As in personalized medicine and metagenomics/metabarcoding, there are 2 approaches for NGS HLA-typing: the first is to use whole-genome, exome or transcriptome data, and the second is to amplify specific regions of the HLA loci by amplicon sequencing. The second approach is suitable for typing hundreds or thousands of individuals, but requires tested primers for multiplex PCR of the HLA regions.

Basically, after sequencing the PCR products, the HLA-typing analysis workflow consists of:

  1. Map/align the reads against HLA allele reference sequences from the IMGT-HLA public database.
  2. Retrieve the genotypes from the references with the longest alignments and the best mapping scores.
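The two steps above can be sketched in a few lines of Python. This is only a toy illustration under strong assumptions: the reference sequences and allele names are invented placeholders (real ones would come from the IMGT-HLA database), and reads are "mapped" by a naive ungapped comparison instead of a real aligner. The function names `mismatches` and `type_sample` are hypothetical.

```python
from collections import Counter

# Hypothetical, made-up reference alleles (NOT real IMGT-HLA sequences).
REFERENCES = {
    "LOCUS1*01": "ACGTACGTAGCTAGCTAGGA",
    "LOCUS1*02": "ACGTACGTAGCTTGCTAGGA",
    "LOCUS1*03": "ACGAACGTAGCTAGCTAGCA",
}

def mismatches(read, ref):
    """Smallest number of mismatches of `read` at any ungapped offset in `ref`."""
    best = len(read)
    for i in range(len(ref) - len(read) + 1):
        window = ref[i:i + len(read)]
        best = min(best, sum(a != b for a, b in zip(read, window)))
    return best

def type_sample(reads, references, max_alleles=2):
    """Step 1: assign each read to its best-matching reference.
    Step 2: report the references supported by the most reads."""
    support = Counter()
    for read in reads:
        scores = {name: mismatches(read, seq) for name, seq in references.items()}
        best = min(scores.values())
        winners = [name for name, score in scores.items() if score == best]
        if len(winners) == 1:  # skip reads that cannot discriminate alleles
            support[winners[0]] += 1
    return [name for name, _ in support.most_common(max_alleles)]

# Toy heterozygous sample: 3 reads from allele *01, 2 from *02,
# plus one short read that matches both equally well and is ignored.
reads = (
    [REFERENCES["LOCUS1*01"]] * 3
    + [REFERENCES["LOCUS1*02"]] * 2
    + ["ACGTACGTAG"]
)
genotype = type_sample(reads, REFERENCES)
print(genotype)  # ['LOCUS1*01', 'LOCUS1*02']
```

Real typing software additionally has to deal with sequencing errors, read-count thresholds and homozygous samples, which this sketch does not attempt.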

Inoue et al. wrote a comprehensive review of the topic in ‘The impact of next-generation sequencing technologies on HLA research’.

HLA-typing workflow. Modified from Inoue et al.

Nowadays there are commercial kits that allow reliable, fast and economical HLA-typing: Illumina TruSight HLA v2, Omixon Holotype HLA, GenDx NGSgo or One Lambda NXType NGS.

Amplicon sequencing and high-throughput genotyping – Metagenomics

In the previous post I explained the fundamentals of the Amplicon Sequencing (AS) technique. Today I will show some current and future applications in Metagenomics and Metabarcoding.

Metagenomics (also referred to as ‘environmental’ or ‘community’ genomics) is the study of genetic material recovered directly from environmental samples. This discipline applies a suite of genomic technologies and bioinformatics tools to directly access the genetic content of entire communities of organisms. Usually we use the term metabarcoding when we apply the amplicon sequencing approach in metagenomics studies, while the term metagenomics is preferred when we study full genomes rather than only a few gene regions.

Metabarcoding workflow. Source: http://www.naturemetrics.co.uk

For metabarcoding, the 16S rRNA gene is the most common universal DNA barcode (marker) used to identify species with great accuracy, but other genes are also used to cover the whole Tree of Life: cytochrome c oxidase subunit 1 (CO1), rRNA genes (16S/18S/28S), plant-specific ones (rbcL, matK and trnH-psbA), and gene regions such as the internal transcribed spacers (ITSs) (Kress et al. 2014; Joly et al. 2014). These genes mutate fast enough to discriminate closely related species, and at the same time they are stable enough to identify individuals of the same species.

Prokaryotic and eukaryotic rRNA operons

A perfect metagenomics barcode/marker should…

  • be present in all the organisms, in all the cells
  • have variable sequence among different species
  • be conserved among individuals of the same species
  • be easy to amplify and not too long for sequencing
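The second and third criteria in the list above can be checked numerically on a set of aligned candidate barcode sequences: distances between species should be larger than distances within a species. The sketch below is a minimal illustration with invented toy sequences. The function names `p_distance` and `barcode_gap` are hypothetical, and it assumes the sequences are already aligned to the same length.

```python
from itertools import combinations

def p_distance(a, b):
    """Proportion of differing positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def barcode_gap(sequences_by_species):
    """Return (max within-species distance, min between-species distance)."""
    within, between = [], []
    labelled = [
        (species, seq)
        for species, seqs in sequences_by_species.items()
        for seq in seqs
    ]
    for (sp1, s1), (sp2, s2) in combinations(labelled, 2):
        (within if sp1 == sp2 else between).append(p_distance(s1, s2))
    return max(within, default=0.0), min(between, default=0.0)

# Invented toy data: two individuals per species.
toy = {
    "species_a": ["ACGTACGTAC", "ACGTACGTAT"],
    "species_b": ["ACGAACCTAC", "ACGAACCTAC"],
}
max_within, min_between = barcode_gap(toy)
print(max_within, min_between)
# A candidate marker looks useful when min_between is clearly larger than max_within.
```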

Recommended DNA barcodes for metagenomics

The pioneering metabarcoding study of Sogin et al. 2006, which set out to decipher the microbial diversity in the deep sea, used the V6 hypervariable region of the 16S rRNA gene as barcode. Sogin et al. sequenced around 118,000 PCR amplicons from environmental DNA preparations and unveiled thousands of previously unknown species.

Note that in metabarcoding DNA tags cannot be used to pick out single individuals, only to identify different samples (of water, soil, air…).

Amplicon sequencing and high-throughput genotyping – Basics

The Amplicon Sequencing (AS) technique consists of sequencing the products of multiple PCRs (amplicons), where a single amplicon is the set of DNA sequences obtained in each individual PCR.

Before the arrival of high-throughput sequencing technologies, PCR products were Sanger sequenced individually. But Sanger sequencing can only resolve one DNA sequence (allele) per sample, or at most a mix of 2 sequences differing in only one nucleotide (usually heterozygous alleles):

Sanger sequencing limitations

The solution to this limitation was bacterial cloning followed by isolation, amplification and sequencing of individual clones. Nevertheless, bacterial cloning is a time-consuming approach that is only feasible with a few dozen sequences.

Bacterial cloning to Sanger sequence clones individually

Fortunately, high-throughput sequencing (HTS) techniques, also called next-generation sequencing (NGS), can read millions of sequences with individual resolution. Combining amplicon sequencing with NGS allows us to genotype hundreds or thousands of samples in a single experiment.

The only requirement for this kind of experiment is to include different DNA tags to identify the individuals/samples. A DNA tag is a short, unique sequence of nucleotides (e.g. ACGGTA) that is attached to the 5′ end of a PCR primer or ligated after each individual PCR. Each sample/individual must receive a different tag so that the sequences (reads) from each individual PCR (amplicon) can be classified (Binladen et al. 2007; Meyer et al. 2007).
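Classifying reads by their DNA tag (demultiplexing) can be illustrated with a short Python sketch. It assumes that all tags have the same length, that the tag sits at the very start of each read, and that only exact tag matches count. The tag-to-sample table and the `demultiplex` function are hypothetical examples, not part of any specific tool.

```python
def demultiplex(reads, tag_to_sample):
    """Group reads by the DNA tag found at their 5' end and trim the tag."""
    tag_len = len(next(iter(tag_to_sample)))
    if any(len(tag) != tag_len for tag in tag_to_sample):
        raise ValueError("this sketch assumes all tags have the same length")
    by_sample = {sample: [] for sample in tag_to_sample.values()}
    unassigned = []
    for read in reads:
        sample = tag_to_sample.get(read[:tag_len])
        if sample is None:
            unassigned.append(read)  # tag missing or containing errors
        else:
            by_sample[sample].append(read[tag_len:])
    return by_sample, unassigned

# Hypothetical tags and reads for two samples.
tags = {"ACGGTA": "sample_1", "TGCATC": "sample_2"}
reads = [
    "ACGGTAGGGTTTAAA",
    "TGCATCGGGTTTAAC",
    "ACGGTAGGGTTTAAA",
    "NNNNNNGGGTTTAAA",
]
by_sample, unassigned = demultiplex(reads, tags)
print(by_sample)
# {'sample_1': ['GGGTTTAAA', 'GGGTTTAAA'], 'sample_2': ['GGGTTTAAC']}
print(len(unassigned))  # 1
```

In practice tags are usually designed to differ from each other in several positions, so that a single sequencing error cannot turn one tag into another.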

High-throughput amplicon sequencing schema

However, NGS has its own intrinsic problems: sequenced fragments are shorter than in Sanger sequencing and sequencing errors are more frequent. Random sequencing errors can be corrected by increasing the depth/coverage, i.e. reading the same sequence more times. Longer sequences can be split into shorter fragments and assembled together later by computer.
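Why depth helps against random errors can be shown with a majority vote per position. The sketch below assumes that all reads cover exactly the same region, have the same length and come from a single true sequence. The `consensus` function and the reads are invented for illustration.

```python
from collections import Counter

def consensus(reads):
    """Majority-vote consensus of equal-length reads covering the same region."""
    if len({len(read) for read in reads}) != 1:
        raise ValueError("this sketch assumes reads of equal length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*reads)
    )

# Five reads of the same molecule; three of them carry one random error each.
reads = [
    "ACGTACGT",
    "ACGAACGT",  # error at position 4
    "ACGTACGT",
    "TCGTACGT",  # error at position 1
    "ACGTACGG",  # error at position 8
]
corrected = consensus(reads)
print(corrected)  # ACGTACGT
```

Because each error appears in only one read, it is outvoted at its position. With real amplicons that contain several true alleles, reads would first have to be grouped by variant instead of being collapsed into a single consensus.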

The following video explains the amplicon sequencing process using NGS:
http://www.jove.com/video/51709/next-generation-sequencing-of-16s-ribosomal-rna-gene-amplicons

There are 4 basic steps in an NGS Amplicon Sequencing analysis:

  1. Experimental design of the primer sequences to amplify the desired gene regions (markers) and of the DNA tags to use to identify samples or individuals.
  2. PCR amplification of the markers, usually an individual PCR per sample.
  3. NGS sequencing of the amplification products. The most commonly used technologies are Illumina, Ion Torrent and 454; PacBio is also gaining importance as its error rate decreases.
  4. Bioinformatic analysis of the sequencing data. The analysis should consist of: classification of reads into amplicons, sequencing error correction, filtering of spurious and contaminant reads, and final display of the results in a human-readable format, e.g. an Excel sheet.

Amplicon Sequencing 4 steps workflow
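The last part of step 4 (filtering and reporting) could look like the following minimal Python sketch. It assumes that reads are already classified into amplicons and that low-frequency variants are spurious. The 10% frequency threshold and the names `summarize_amplicon` and `write_report` are arbitrary choices for illustration, and the report is written as CSV, which Excel can open.

```python
import csv
import io
from collections import Counter

def summarize_amplicon(reads, min_frequency=0.1):
    """Count identical reads and keep variants at or above a minimum frequency."""
    counts = Counter(reads)
    total = sum(counts.values())
    return [
        (seq, n, n / total)
        for seq, n in counts.most_common()
        if n / total >= min_frequency
    ]

def write_report(amplicons, handle, min_frequency=0.1):
    """Write one CSV row per retained variant of each sample."""
    writer = csv.writer(handle)
    writer.writerow(["sample", "sequence", "reads", "frequency"])
    for sample, reads in amplicons.items():
        for seq, n, freq in summarize_amplicon(reads, min_frequency):
            writer.writerow([sample, seq, n, f"{freq:.2f}"])

# Invented toy amplicon: two true variants plus two rare, probably spurious reads.
amplicons = {"sample_1": ["AAAA"] * 9 + ["CCCC"] * 9 + ["AAAT", "ACCC"]}
variants = summarize_amplicon(amplicons["sample_1"])

buffer = io.StringIO()  # replace with open("report.csv", "w", newline="") to save a file
write_report(amplicons, buffer)
print(buffer.getvalue())
```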

© 2024 Sixth researcher
