Pitfalls are usually hidden, which is exactly why they are easy to overlook. In second-generation sequencing, two pitfalls are commonly encountered:
The temptation of low prices
The lure of impressive-looking parameters
The enrichment technology or library construction scheme used before sequencing, the product chosen, the size of the capture region, and the sequencing depth all directly affect the price. Seeing through impressive-looking parameters, in turn, requires studying the target region and your own purpose more closely, so that you understand them well enough not to be misled by appearances.
If you outsource library construction together with sequencing;
If you are confused by similar capture panels from different manufacturers;
If you are hesitating because of price differences;
If you are fixated on capture efficiency and coverage;
Then you should read on.
Below we explore the hidden pitfalls of second-generation sequencing from two angles: targeted enrichment technology for the target region, and gene panels based on capture technology.
Target enrichment techniques fall into two broad categories: enrichment by PCR amplification and enrichment by hybridization probe capture.
PCR-based enrichment amplifies the target sequence from the total genomic sample using multiple primer pairs directed at the target. Its advantages are a low technical threshold, convenient primer customization, a simple workflow, and low cost. Its drawbacks stem from the limits of primer design and optimization: the method can enrich only a limited target region; if an unknown mutation lies within a primer region, that mutation will be missed entirely; and false positives introduced by the PCR process cannot be excluded.
Target enrichment based on liquid-phase hybridization probe capture uses thousands to millions of oligonucleotide capture probes, designed to be complementary to the target region, to capture that region before sequencing. Its advantages are flexible probe design, high throughput, a capture region that can be large or small, and the ability to detect fusion genes. Its disadvantages are the relatively high cost of probe synthesis and a longer library construction process. At present, hybrid probe capture is the more mainstream enrichment method on the market, especially for large panels such as exomes, where only hybrid probe capture can enrich regions of that size. So let's look at hybrid probe capture in detail.
Hybrid capture techniques fall into two broad categories based on probe types:
RNA probes: Agilent pioneered RNA hybrid capture probes, which have a uniform length of 120 mer.
DNA probes, represented by NimbleGen: 50-105 mer for NimbleGen, 95 mer for Illumina TruSeq, and 120 mer for IDT.
Features of DNA probes:
The annealing efficiency and specificity of DNA-DNA hybridization are strongly affected by temperature, so probe length sometimes has to be optimized to keep hybridization efficiency consistent.
DNA probes are relatively rigid, so a region carrying a large insertion or deletion is difficult for the corresponding probe to capture.
Chemical synthesis of short DNA probes is inexpensive, but the cost rises significantly as probe length increases.
Features of RNA probes:
120 mer RNA probes do not self-anneal even when present in excess, which enables simultaneous large-scale sequence capture; longer probes also tolerate more stringent wash conditions, minimizing contamination by non-target sequences.
Kinetically, RNA-driven solution hybridization works equally well on short, dispersed fragments and on long, continuous regions.
RNA has a higher affinity for DNA, and excess RNA probe drives the reaction toward hybridization, reducing the amount of library fragments required; GC/AT bias is lower and uniformity is better.
The distinctive flexibility of RNA, combined with the 120 mer length, provides better "fault tolerance" while preserving specificity. Both single-base mutations and insertion/deletion mutations (such as 25 bp deletions, as shown in Figure 1) are captured, so less of the target sequence is lost and sensitivity is higher [1, 2].
Figure 1. RNA probes are more flexible and can still capture the target when the allele carries a 2-25 bp deletion.
Having looked at hybrid capture probes, let's turn to gene panels based on hybrid capture. Sequencing results are often used to judge the strengths and weaknesses of a capture product. The parameters people usually focus on are coverage, sequencing depth at a given sequencing amount, and the on-target ratio (or capture efficiency). We first explain these terms, and then show how to read these parameters critically, using the exome products of several companies on the market as an example.
Coverage (% coverage): generally the proportion of the target region that is covered by at least one read (≥1x).
Sequencing depth: the ratio of the total number of bases sequenced to the number of bases in the target sequence. It can also be understood as the average number of times a single base of the sequenced genome is read.
Coverage at >20x sequencing depth: the portion of the target region with a sequencing depth >20x, as a percentage of the total target region.
On-target ratio or capture efficiency (% on-target reads): the ratio of the number of reads that fall in the target region to the total number of reads sequenced.
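As a rough illustration of how these four parameters relate to each other, here is a minimal Python sketch. It assumes you already have a per-base depth value for every position in the target region plus read counts; the function name capture_metrics and all numbers are hypothetical and not taken from any vendor tool. Depth is computed here in the "average per base" sense, and a base is counted as covered when it is read at least once.

```python
from typing import Sequence


def capture_metrics(
    per_base_depth: Sequence[int],
    on_target_reads: int,
    total_reads: int,
    min_depth: int = 20,
) -> dict:
    """Summarize capture performance from per-base depths over the target region."""
    target_size = len(per_base_depth)
    if target_size == 0 or total_reads <= 0:
        raise ValueError("need a non-empty target region and at least one read")

    # Bases read at least once count toward coverage.
    covered = sum(1 for d in per_base_depth if d >= 1)
    # Bases read more than min_depth times count toward the ">20x" style metric.
    deep = sum(1 for d in per_base_depth if d > min_depth)

    return {
        "coverage": covered / target_size,
        "mean_depth": sum(per_base_depth) / target_size,
        "fraction_above_min_depth": deep / target_size,
        "on_target_ratio": on_target_reads / total_reads,
    }


# Hypothetical 10-base target region; the depths are illustrative only.
example_depths = [0, 5, 12, 21, 30, 30, 25, 21, 8, 0]
metrics = capture_metrics(example_depths, on_target_reads=700, total_reads=1000)
print(metrics)
```

Note that every one of these values is computed relative to the target region as the product defines it, which is why the design of that region matters so much when comparing products.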
As we all know, some regions of the exome have low GC content and high AT content, so probes bind weakly and capture efficiency is low. Other regions are hard to design probes for because of repeat sequences and secondary structures, and even with probes their coverage and capture efficiency are not high. Parameters such as coverage and capture efficiency therefore depend directly on how the target is designed. In other words, making these parameters look good is simple: skip the regions where capture is inefficient and probe design is difficult, and include in the target only the regions where probes are easy to design and capture efficiency is high and uniform. Anyone who looks only at these parameters and selects a product on that basis may get very attractive results, but runs the risk of losing truly meaningful targets. No increase in sequencing depth can make up for that loss, because those targets were never in the design of that exome in the first place!
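The effect described above can be shown with a toy calculation. The sketch below is only an illustration: the region names, sizes, and >20x fractions are invented, and the two "designs" are hypothetical. It compares the fraction a design would report over its own target with the fraction it achieves over all regions of interest.

```python
# Hypothetical regions of interest: (name, size in bp, fraction of bases at >20x).
regions = [
    ("easy_1", 4000, 0.99),
    ("easy_2", 3000, 0.98),
    ("low_gc", 2000, 0.70),
    ("repeat_rich", 1000, 0.50),
]


def reported_fraction(design_names, regions):
    """>20x fraction computed only over the regions the design includes."""
    included = [r for r in regions if r[0] in design_names]
    total = sum(size for _, size, _ in included)
    covered = sum(size * frac for _, size, frac in included)
    return covered / total


def true_fraction(design_names, regions):
    """>20x fraction over all regions of interest; excluded regions count as uncovered."""
    total = sum(size for _, size, _ in regions)
    covered = sum(size * frac for name, size, frac in regions if name in design_names)
    return covered / total


full_design = {"easy_1", "easy_2", "low_gc", "repeat_rich"}
trimmed_design = {"easy_1", "easy_2"}  # skips the hard-to-capture regions

for label, design in [("full", full_design), ("trimmed", trimmed_design)]:
    print(label, round(reported_fraction(design, regions), 3),
          round(true_fraction(design, regions), 3))
```

With these made-up numbers the trimmed design reports the higher figure on its own target while covering less of what actually matters, which is the trade-off a parameter sheet alone does not reveal.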
Taking the whole exome as an example, an evaluation of a product's strengths and weaknesses should start from its design: objectively compare its coverage of the major databases and the number of unique, meaningful targets it contains relative to similar products. A word of caution here: some manufacturers apply a little "cosmetic processing" when they state the parameters of a capture product. For example, some report the size of the capture region as the region they expect to capture rather than the region actually covered by probes, which puts more honest manufacturers at a disadvantage when database coverage is compared. But when we look at the capture data from actual sequencing results, the products whose figures were "cosmetically processed" do not deliver results as good as the honest ones, so everyone should keep their eyes open. Of course, the reason for such cosmetic processing is that the major manufacturers all know that design is the foundation of a product and the key to it. If the foundation is not solid and complete, no amount of later effort can make up for that innate shortcoming.
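One way to check a stated capture region against what the probes actually cover is to compute database coverage twice, once from each set of intervals. The sketch below is a simplified illustration for a single chromosome using half-open (start, end) intervals; the function names and coordinates are hypothetical, and in practice the intervals would come from the design files and annotation sources being compared.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open intervals (start, end)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def covered_bases(annotation, design):
    """Count annotation bases that overlap the design intervals."""
    ann = merge_intervals(annotation)
    des = merge_intervals(design)
    total = 0
    i = j = 0
    while i < len(ann) and j < len(des):
        lo = max(ann[i][0], des[j][0])
        hi = min(ann[i][1], des[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if ann[i][1] < des[j][1]:
            i += 1
        else:
            j += 1
    return total


def database_coverage(annotation, design):
    """Fraction of annotated bases that fall inside the design."""
    ann_size = sum(e - s for s, e in merge_intervals(annotation))
    return covered_bases(annotation, design) / ann_size


# Hypothetical coordinates, for illustration only.
annotation = [(100, 200), (300, 400), (500, 600)]
claimed_targets = [(100, 200), (300, 400), (500, 600)]  # region "expected" to be captured
probe_footprint = [(100, 200), (300, 350)]              # region actually covered by probes

print(database_coverage(annotation, claimed_targets))
print(database_coverage(annotation, probe_footprint))
```

Comparing the two results for the same product shows how much of its apparent database coverage depends on which definition of the capture region was used.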
The Agilent Clinical Research Exome V2 is a whole-exome product designed specifically for clinical research. It builds on the design of the Agilent All Exome V6, adding new targets selected by Emory University and the Children's Hospital of Philadelphia and enhancing coverage of disease-related genes, which makes it the most comprehensive medical research exome on the market. Next, we compare its design with other similar products, starting with its coverage of the major databases.
Table 1. Comparison of coverage rates for selected annotation sources
Now let's look at the difference in the number of unique, meaningful targets. Agilent's Clinical Research Exome V2 contains not only more unique disease-related variants but also more unique ClinVar pathogenic/likely pathogenic variants, and these additional ClinVar variants mean that a wider range of diseases is covered by Agilent's product.
Table 2. Comparison of unique ClinVar variants
With the above comparison in hand, the core competitiveness of a capture product and its value to the user can be seen at a glance. Of course, no product is perfect, and the more complete the set of target sites, the harder the design becomes. Even so, backed by a solid foundation in targeted capture, Agilent's most comprehensive medical research exome, the Clinical Research Exome V2, still performs well in coverage and capture efficiency (Figure 2).
Figure 2. Target coverage for 6.5Gb sequencing data
Beyond the product itself, don't forget to check its customization capabilities. Databases are updated very quickly, and manufacturers cannot launch stable, reliable commercial products at the same pace, which is why customization matters: new genes can be added on top of a manufacturer's existing catalog version. Agilent's SurePrint printing technology offers strong customization capabilities, and its free online design tool, SureDesign, adds new genes to existing catalog panels according to user requirements. As a result, new experimental data remain in good agreement with the original experimental data while also including the new content.
We hope this brief introduction helps you see through the hidden pitfalls of target enrichment, the main step before second-generation sequencing, so that you can see the whole picture clearly and spot even the finest details.
Contact: ChunMing Guo
Address: Unit 3, Building 27, Hengda Oasis, Hanyang Avenue, Wuhan