Normalization and Call Detection Stage

When you use a panel of normals, the next step in the CNV pipeline is to normalize the case sample and make the calls. This step requires selecting a panel of normals, which is a set of target counts files used for reference-based median normalization.

Ideally, the samples in the panel of normals follow library prep and sequencing workflows that are identical to those of the case sample under analysis. You can run the analysis with other workflow combinations, keeping in mind that CNV events are called relative to the reference samples used. For calling on sex chromosomes, it is recommended that you use sex-matched references in the panel. Because normalization is performed on a per-target basis against the panel of normals median, sex-matched references are critical to detecting copy number events on the sex chromosomes.

For optimal bias correction, a panel of at least 50 samples is recommended. DRAGEN can run with a single-sample panel, but a single-sample panel can produce artifactual calls in the case sample wherever the panel sample itself has copy number changes.

To generate a Panel of Normals (PON), create a plain text file in which each line contains the path to a target counts file (a *.target.counts.gz file or its GC-corrected counterpart, *.target.counts.gc-corrected.gz) generated by the target counts stage. Relative paths are supported, provided they are relative to the current working directory. Absolute paths are recommended in case the workflow is rerun later or shared with other users.

The following is an example PON file, which uses a subset of the GC-corrected files from the target counts stage.

/data/output_trio1/sample1.target.counts.gc-corrected.gz
/data/output_trio1/sample2.target.counts.gc-corrected.gz
/data/output_trio2/sample4.target.counts.gc-corrected.gz
/data/output_trio2/sample5.target.counts.gc-corrected.gz
/data/output_trio3/sample7.target.counts.gc-corrected.gz
/data/output_trio3/sample8.target.counts.gc-corrected.gz
....
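
A list like the one above could also be generated rather than typed by hand. The following is a minimal sketch, not part of DRAGEN: the function name make_pon_list is hypothetical, and it assumes that all counts files live under one directory tree and follow the *.target.counts.gc-corrected.gz naming shown in the example.

```shell
# Hypothetical helper (not a DRAGEN command): write a PON list containing the
# absolute paths of all GC-corrected target counts files under a directory.
# Usage: make_pon_list <search_dir> <output_list>
make_pon_list() {
    # Resolve the search directory to an absolute path so that every line
    # written to the list is absolute as well.
    pon_search_dir=$(cd "$1" && pwd) || return 1
    find "$pon_search_dir" -type f -name '*.target.counts.gc-corrected.gz' \
        | sort > "$2"
}
```

For example, make_pon_list /data normals.txt collects every matching file under /data. Because the helper picks up all matching files, remove the case sample and any samples that are not suitable normals from the resulting list before using it.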

Alternatively, you can specify the files for the panel of normals directly with the --cnv-normals-file option. This option takes a single file name and can be specified multiple times, once for each counts file.
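
When the panel contains more than a few files, repeating the option by hand becomes tedious. The sketch below shows one way to expand a set of paths into repeated --cnv-normals-file options. The helper name build_normals_args is hypothetical and not part of DRAGEN, and the sketch assumes that the paths contain no whitespace.

```shell
# Hypothetical helper (not a DRAGEN command): print one
# "--cnv-normals-file <path>" pair for each counts file given as an argument.
# Assumes the paths contain no spaces, because the result is word-split.
build_normals_args() {
    normals_args=""
    for counts_file in "$@"; do
        normals_args="$normals_args --cnv-normals-file $counts_file"
    done
    # Drop the leading space left by the first concatenation.
    printf '%s\n' "${normals_args# }"
}

# Illustrative use, with placeholders left unfilled:
#   dragen -r <HASHTABLE> --output-directory <OUTPUT> \
#     --output-file-prefix <SAMPLE> --enable-map-align false \
#     --enable-cnv true --cnv-input <CASE_COUNTS> \
#     $(build_normals_args /data/output_trio1/*.target.counts.gc-corrected.gz)
```

For large panels, a list file is usually easier to review and to share than a long command line.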

After you have created a PON file, run the caller by specifying your case sample with the --cnv-input option and the PON file with the --cnv-normals-list option. If you use the GC-bias-corrected counts from the previous stage, as recommended, there is no need to run GC bias correction again. To disable GC bias correction, set --cnv-enable-gcbias-correction to false. For example:

dragen \
-r <HASHTABLE> \
--output-directory <OUTPUT> \
--output-file-prefix <SAMPLE> \
--enable-map-align false \
--enable-cnv true \
--cnv-input <CASE_COUNTS> \
--cnv-normals-list <NORMALS> \
--cnv-enable-gcbias-correction false

This command normalizes the case sample against the panel of normals and then makes the CNV calls.
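
Before launching a run, it can help to confirm that every path in the PON file resolves and that the panel size is in line with the 50-sample recommendation. The following is a minimal sketch of such a check. The function name check_pon_list is hypothetical, and the check is not a DRAGEN feature. Relative paths are resolved against the current working directory, matching the rule for PON files described earlier.

```shell
# Hypothetical pre-flight check for a PON list file (not a DRAGEN feature).
# Reports each missing path on stderr, warns when the panel has fewer than
# 50 samples, prints a one-line summary, and returns non-zero if any path
# in the list does not exist.
check_pon_list() {
    pon_list=$1
    pon_total=0
    pon_missing=0
    # The "|| [ -n ... ]" clause also handles a last line without a newline.
    while IFS= read -r counts_path || [ -n "$counts_path" ]; do
        if [ -z "$counts_path" ]; then
            continue    # skip blank lines
        fi
        pon_total=$((pon_total + 1))
        if [ ! -f "$counts_path" ]; then
            echo "missing: $counts_path" >&2
            pon_missing=$((pon_missing + 1))
        fi
    done < "$pon_list"
    if [ "$pon_total" -lt 50 ]; then
        echo "warning: $pon_total panel samples; at least 50 are recommended" >&2
    fi
    echo "$pon_total samples, $pon_missing missing"
    [ "$pon_missing" -eq 0 ]
}
```

A call such as check_pon_list normals.txt && dragen ... then starts the run only when all listed files are present. The sample-count warning is advisory and does not change the return status.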