

UMI is an acronym for Unique Molecular Identifier. UMIs are complex indices added to sequencing libraries before any PCR amplification steps, enabling the accurate bioinformatic identification of PCR duplicates.

UMIs are also known as “Molecular Barcodes” or “Random Barcodes”. The idea seems to have been first implemented in an iCLIP protocol (König et al.). UMIs are valuable tools for both quantitative sequencing applications (e.g. RNA-Seq, ChIP-Seq) and for genomic variant detection, especially the detection of rare mutations. UMI sequence information in conjunction with alignment coordinates enables grouping of sequencing data into read families representing individual sample DNA or RNA fragments.

– Quantitative analysis: Many sequencing library preparation protocols enable high-throughput sequencing (HTS) from low amounts of starting material. Their preparation requires PCR amplification of the libraries. While PCR polymerases and reagents have improved greatly in recent years, enabling a mostly unbiased amplification of sequencing libraries, some biases remain against sequences with extreme GC content and against long fragments. When starting from ultra-low input samples, stochastic effects in the first rounds of PCR add to the problem. These issues can cause erroneous quantitation data. Removing PCR duplicates using alignment coordinate information alone is especially ineffective in low-input situations, but also for deep sequencing data. In the latter case, alignment coordinate-based de-duplication will remove large numbers of biological duplicate reads from the data, especially for the most abundant transcripts. UMIs alleviate the PCR duplicate problem by adding unique molecular tags to the sequencing library molecules before amplification. Please also see our FAQ: “Should I remove PCR duplicates from my RNA-seq data?” for more information.

– Rare variant analysis: Illumina sequencing provides data with low error rates (~0.1 to 0.5%) for most applications. Even these low error rates nevertheless interfere with the confident identification of low-abundance variants, because data without UMIs cannot distinguish between true rare variants and sequencing errors. UMIs, in combination with deep sequencing that yields multiple reads for each sample DNA fragment, solve this problem.
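The read-family idea in this FAQ (grouping reads by UMI together with alignment coordinates) can be illustrated with a minimal Python sketch. The `Read` record, the function names, and the toy reads are hypothetical and only serve the illustration; real pipelines read UMIs and coordinates from alignment files, and they usually also tolerate sequencing errors within the UMI itself, which this sketch does not attempt (it uses exact UMI matching and assumes equal-length reads within a family).

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical minimal read record (not a real library type)."""
    chrom: str
    pos: int
    strand: str
    umi: str
    seq: str


def group_read_families(reads):
    """Group reads sharing alignment coordinates AND UMI into read families."""
    families = defaultdict(list)
    for read in reads:
        families[(read.chrom, read.pos, read.strand, read.umi)].append(read)
    return dict(families)


def consensus(family):
    """Majority base at each position; assumes all reads have equal length."""
    columns = zip(*(read.seq for read in family))
    return "".join(Counter(col).most_common(1)[0][0] for col in columns)


# Toy data (invented for illustration only).
reads = [
    Read("chr1", 100, "+", "AACGT", "ACGT"),  # PCR copies of one molecule
    Read("chr1", 100, "+", "AACGT", "ACGT"),
    Read("chr1", 100, "+", "AACGT", "ACTT"),  # one copy carries a sequencing error
    Read("chr1", 100, "+", "GGTCA", "ACGT"),  # same coordinates, different UMI
    Read("chr1", 250, "-", "AACGT", "TTGA"),
]

families = group_read_families(reads)

# Coordinate-only de-duplication keeps one read per position and strand,
# so it collapses the second molecule at chr1:100 as well.
coordinate_only = len({(r.chrom, r.pos, r.strand) for r in reads})

print("molecules (UMI + coordinates):", len(families))
print("molecules (coordinates only): ", coordinate_only)
print("consensus of first family:    ", consensus(families[("chr1", 100, "+", "AACGT")]))
```

In this toy example the UMI-aware grouping counts three original molecules, while coordinate-only de-duplication counts two, discarding a biological duplicate. The per-family consensus also outvotes the single erroneous base, which is the principle that lets UMI data separate sequencing errors from rare variants.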
