Input Files

Alignment Reference

Use a standard DRAGEN RNA reference genome or hashtable for the DRAGEN scRNA Pipeline (for example, one built using --ht-build-rna-hashtable=true). The pipeline also requires a gene annotation file in GTF format, provided with the --annotation (-a) option.

Read Input

The DRAGEN scRNA Pipeline requires both the transcript sequence and the barcode+UMI sequence for each fragment (read) as input. The transcript sequence is aligned to the genome to determine the expressed gene. The barcode+UMI sequence is split into a single-cell barcode, which identifies the unique cell, and a single-cell UMI, which is used for unique molecule quantification.
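As a rough illustration of the split described above, the following Python sketch cuts a combined barcode+UMI sequence into its two parts. It is not DRAGEN's implementation. The function name, the barcode-first layout, and the default lengths (16 bases of barcode, 10 bases of UMI) are assumptions chosen for the example; the actual layout depends on the library preparation.

```python
def split_barcode_umi(sequence, barcode_len=16, umi_len=10):
    """Split a combined barcode+UMI sequence into (barcode, umi).

    ASSUMPTION: the barcode comes first and is followed directly by the UMI.
    ASSUMPTION: the default lengths are placeholders, not values from DRAGEN.
    """
    if len(sequence) < barcode_len + umi_len:
        raise ValueError("sequence is shorter than barcode_len + umi_len")
    barcode = sequence[:barcode_len]
    umi = sequence[barcode_len:barcode_len + umi_len]
    return barcode, umi


barcode, umi = split_barcode_umi("GAAACTCGTTCAGCGCTCATCTCGGC")
print(barcode, umi)
```

Reads that share a barcode are attributed to the same cell, and reads that also share a UMI are treated as copies of the same original molecule.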

When starting from FASTQ, you can either include the UMI in the read name or provide separate UMI FASTQ files.

UMI in Read Name

Provide the transcriptome reads as a single-end FASTQ file with the barcode+UMI sequence in the eighth field of the read-name line, where fields are separated by colons. The following example uses read2 (sample.R2.fastq.gz) as the transcript read.

@D00626:253:CAT5WANXX:4:1302:8433:85343:GAAACTCGTTCAGCGCTCATCTCGGC
ACAGGTCAGCTGGGAGCTTCTGCCCCCACTGCCTAGGGACCAACAGGGGCAGGAGGCAGTCACTGACCCCGAGAAGTTTGCATCCTGCACAGCTAGAGATC
+
CCCCBFGGGGGGGGGGGGGGGGGGBDCGGGE>1BGDFGG0F@FBGGGBCDGGGGGGGGGGGGGGGGGGGGGGGDGGGGGGGGGGGGGFGGGGFGEGGGGGE

In the example, the sequence beginning with GAA (the eighth field of the read name) is the barcode+UMI read, and the sequence beginning with ACAG is the transcriptome read.
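To make the read-name layout concrete, here is a minimal Python sketch that pulls the eighth colon-separated field out of a FASTQ header line. The function name is hypothetical, and the handling of a trailing comment after whitespace is an assumption; the example header above has no such comment.

```python
def barcode_umi_from_read_name(header):
    """Return the eighth colon-separated field of a FASTQ read-name line."""
    # ASSUMPTION: text after the first whitespace is a comment, not part of the name.
    parts = header.strip().split()
    if not parts:
        raise ValueError("empty read-name line")
    name = parts[0]
    if name.startswith("@"):
        name = name[1:]
    fields = name.split(":")
    if len(fields) < 8:
        raise ValueError("read name has fewer than eight colon-separated fields")
    return fields[7]


header = "@D00626:253:CAT5WANXX:4:1302:8433:85343:GAAACTCGTTCAGCGCTCATCTCGGC"
print(barcode_umi_from_read_name(header))
```

A header with fewer than eight fields raises an error, which can flag FASTQ files that were converted without the UMI settings.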

These FASTQ files can be generated by bclConvert or bcl2fastq, using the UMI settings to define the single-cell barcode+UMI read. If using bclConvert, enter the barcode/UMI information using the OverrideCycles setting. For more information, see the BCL Convert Software Guide (document # 1000000094725).

bclConvert refers to the entire single-cell barcode+UMI sequence as UMI.

Enter the following command line option to use the generated FASTQ files from bclConvert:

dragen -1 <file name> --umi-source=qname

The --umi-source=qname option is also compatible with the --fastq-list input option and with read input from BAM files.

Separate UMI FASTQ Files

An alternative option is to provide the transcript and barcode+UMI sequences as two separate FASTQ files. One file contains only the transcriptome reads, and the other contains the corresponding barcode+UMI reads in the same order. This arrangement is similar to how read pairs are normally handled. If using separate UMI files, the sequencing system run setup and bclConvert are not aware of the UMI at all and treat it as a normal read sequence by default.

Enter the following command line option to use the separate UMI FASTQ files:

dragen -1 <file name> --umi-fastq=<file name> --umi-source=fastq
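Because the two files must list reads in the same order, a quick check before running the pipeline can be useful. The following Python sketch compares read names record by record. It is an illustration, not part of DRAGEN. The function names are hypothetical, and it assumes uncompressed four-line FASTQ records whose read names are identical in both files.

```python
import io
from itertools import zip_longest


def read_names(handle):
    """Yield the read name of each record in a four-line-per-record FASTQ stream."""
    # Blank lines are skipped so a trailing newline does not shift the record count.
    lines = (line for line in handle if line.strip())
    for index, line in enumerate(lines):
        if index % 4 == 0:  # header line of each record
            # Drop the leading '@' and any comment after the first whitespace.
            parts = line[1:].split()
            yield parts[0] if parts else ""


def same_read_order(transcript_fastq, barcode_umi_fastq):
    """Return True if both streams contain the same read names in the same order."""
    pairs = zip_longest(read_names(transcript_fastq), read_names(barcode_umi_fastq))
    # zip_longest pads the shorter stream with None, so a length mismatch fails.
    return all(a == b for a, b in pairs)


transcripts = io.StringIO("@r1\nACGT\n+\nFFFF\n@r2\nTTGA\n+\nFFFF\n")
barcodes = io.StringIO("@r1\nGAAACTCG\n+\nFFFFFFFF\n@r2\nCCATGGTA\n+\nFFFFFFFF\n")
print(same_read_order(transcripts, barcodes))
```

For gzip-compressed files, the same functions work on handles opened with gzip.open(path, "rt").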

To use this method with multiple FASTQ files, enter the barcode+UMI FASTQ files as read1 in the fastq-list file and the transcriptome read FASTQ files as read2, matching the layout of the default fastq_list.csv generated by bclConvert.

Enter the following command:

dragen --fastq-list fastq_list.csv --umi-source=read1
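The fastq-list layout described above can be sketched in Python as follows. The column names follow the fastq_list.csv header that bclConvert writes (RGID, RGSM, RGLB, Lane, Read1File, Read2File). The read group values and file names are hypothetical placeholders, and the helper function is only an illustration.

```python
import csv
import io

# Column names of a bclConvert-style fastq_list.csv file.
FIELDS = ["RGID", "RGSM", "RGLB", "Lane", "Read1File", "Read2File"]


def build_fastq_list(rows):
    """Return fastq_list.csv content for the given row dictionaries."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


rows = [
    {
        "RGID": "rg1", "RGSM": "sample1", "RGLB": "lib1", "Lane": 1,
        "Read1File": "sample1_L001_R1.fastq.gz",  # barcode+UMI reads (hypothetical name)
        "Read2File": "sample1_L001_R2.fastq.gz",  # transcript reads (hypothetical name)
    },
]
fastq_list_text = build_fastq_list(rows)
print(fastq_list_text)
```

Writing the returned text to fastq_list.csv gives a file that can be passed to the --fastq-list option shown above.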

Using Multiple Libraries

The scRNA pipeline can process a single biological sample per DRAGEN run. To process multiple single-cell libraries together, split the single sample into multiple single-cell libraries, each with a unique set of cells. DRAGEN keeps the cells (barcodes and UMIs) from each library separate and provides merged outputs across all libraries. Use read groups to specify the library for each FASTQ file through the RGLB attribute.
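As an illustration of how RGLB ties FASTQ files to libraries, the following Python sketch groups the rows of a fastq-list file by their RGLB value. The function name, library names, and file names are hypothetical. The example also assumes that all rows share one RGSM value because they belong to a single biological sample, and that Read2File holds the transcript reads.

```python
import csv
import io
from collections import defaultdict


def fastqs_by_library(fastq_list_text):
    """Map each RGLB value to the Read2File entries listed for it."""
    libraries = defaultdict(list)
    for row in csv.DictReader(io.StringIO(fastq_list_text)):
        libraries[row["RGLB"]].append(row["Read2File"])
    return dict(libraries)


# Hypothetical fastq-list content: one sample, two libraries, three read groups.
example = """RGID,RGSM,RGLB,Lane,Read1File,Read2File
rg1,sample1,libA,1,libA_L001_R1.fastq.gz,libA_L001_R2.fastq.gz
rg2,sample1,libA,2,libA_L002_R1.fastq.gz,libA_L002_R2.fastq.gz
rg3,sample1,libB,1,libB_L001_R1.fastq.gz,libB_L001_R2.fastq.gz
"""
print(fastqs_by_library(example))
```

Listing the libraries this way before a run makes it easy to confirm that every FASTQ file is assigned to the intended library.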