Target Counts
The target counts stage is the first processing stage of the DRAGEN CNV pipeline. This stage bins the alignments into intervals. The primary analysis format for CNV processing is the target counts file, which contains the feature signals extracted from the alignments for use in downstream processing. The binning strategy, the interval sizes, and the interval boundaries are controlled by the target counts generation options and by the normalization technique used.
When working with whole genome sequence data, the intervals are autogenerated from the reference hashtable. Only the primary contigs from the reference hashtable are considered for binning. You can specify additional contigs to bypass with the --cnv-skip-contig-list option.
With whole exome sequence data, the target BED file supplied with the --cnv-target-bed option is used to determine the intervals for analysis.
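As a conceptual illustration only, binning can be pictured as counting alignment positions that fall inside each interval. The sketch below is not DRAGEN's implementation. It assumes sorted, non-overlapping, half-open (start, stop) intervals on a single contig and assigns each alignment by one representative position. The function name count_per_interval and the Interval type are hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple

# Assumption: (start, stop) is half-open and all intervals are on one contig.
Interval = Tuple[int, int]


def count_per_interval(intervals: List[Interval], positions: List[int]) -> List[int]:
    """Count how many positions fall inside each interval.

    Intervals must be sorted by start and must not overlap.
    Positions outside every interval are ignored.
    """
    starts = [start for start, _ in intervals]
    counts = [0] * len(intervals)
    for pos in positions:
        # Index of the last interval whose start is <= pos, or -1 if none.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < intervals[i][1]:
            counts[i] += 1
    return counts
```

For whole exome data, the intervals in such a sketch would come from the target BED file. For whole genome data, they would be the autogenerated intervals.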
The target counts stage generates a .target.counts.gz file, which can later be used in place of a BAM or CRAM file by specifying it with the --cnv-input option for the normalization stage. The .target.counts.gz file is an intermediate file for the DRAGEN CNV pipeline and should not be modified.
The .target.counts.gz file is a tab-delimited compressed text file with the following columns:
• Contig identifier
• Start position
• End position
• Target interval name
• Count of alignments in this interval
• Count of improperly paired alignments in this interval
An example of a .target.counts.gz file is shown below. In this example, the alignment count column is headed SampleName.
contig start stop name SampleName improper_pairs
1 565480 565959 target-wgs-1-565480 7 6
1 566837 567182 target-wgs-1-566837 9 0
1 713984 714455 target-wgs-1-713984 34 4
1 721116 721593 target-wgs-1-721116 47 1
1 724219 724547 target-wgs-1-724219 24 21
1 725166 725544 target-wgs-1-725166 43 12
1 726381 726817 target-wgs-1-726381 47 14
1 753243 753655 target-wgs-1-753243 31 2
1 754322 754594 target-wgs-1-754322 27 0
1 754594 755052 target-wgs-1-754594 41 0
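For read-only inspection, a file with this layout can be parsed with the Python standard library. The following is a minimal sketch, assuming six tab-separated columns in the order listed above, integer counts, and that any non-data lines are either blank, start with #, or form a header row starting with contig. The names TargetCount and read_target_counts are hypothetical and are not part of DRAGEN.

```python
import csv
import gzip
from typing import Iterator, NamedTuple


class TargetCount(NamedTuple):
    contig: str
    start: int
    stop: int
    name: str
    count: int            # alignments in this interval
    improper_pairs: int   # improperly paired alignments in this interval


def read_target_counts(path: str) -> Iterator[TargetCount]:
    """Yield one TargetCount per interval row of a .target.counts.gz file."""
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            # Assumption: skip blank lines, '#' lines, and the column header row.
            if not row or row[0].startswith("#") or row[0] == "contig":
                continue
            contig, start, stop, name, count, improper = row[:6]
            yield TargetCount(
                contig, int(start), int(stop), name, int(count), int(improper)
            )
```

A typical use is summarizing the counts, for example sum(r.count for r in read_target_counts(path)). The sketch only reads the file, in keeping with the guidance that the intermediate file should not be modified.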