De Novo Calling Stage
A de novo event is a genotype at a particular locus in a proband’s genome that did not result from standard Mendelian inheritance from the parents. The de novo calling stage identifies putative de novo events in the proband of each trio in a multisample analysis. Some putative de novo events are real, but others arise from sequencing or analysis artifacts. Consequently, each putative de novo event is assigned a de novo quality score, which is used to filter out low-quality events. Trios are specified by providing a .ped file with the --pedigree-file option. Multiple trios can be specified (e.g., quad analysis), and all valid trios are processed.
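As an illustration of how trios could be derived from a pedigree file, the following minimal sketch assumes the common six-column PED layout (family ID, sample ID, father ID, mother ID, sex, phenotype) and treats a trio as valid when both listed parents are themselves samples in the file. The function name `trios_from_ped`, the sample names, and that validity rule are assumptions for illustration, not the caller's documented behavior.

```python
from collections import namedtuple

Trio = namedtuple("Trio", ["proband", "father", "mother"])


def trios_from_ped(lines):
    """Return one Trio per sample whose father and mother are both samples in the file."""
    rows = [
        line.split()
        for line in lines
        if line.strip() and not line.startswith("#")
    ]
    # Column 2 of a PED row is the sample ID.
    sample_ids = {row[1] for row in rows}
    trios = []
    for _family_id, sample_id, father_id, mother_id, *_rest in rows:
        # A parent ID of "0" (missing) never matches a sample ID.
        if father_id in sample_ids and mother_id in sample_ids:
            trios.append(Trio(sample_id, father_id, mother_id))
    return trios


# Hypothetical quad: two parents and two children yield two trios.
ped_lines = [
    "FAM1 father 0 0 1 1",
    "FAM1 mother 0 0 2 1",
    "FAM1 child1 father mother 1 2",
    "FAM1 child2 father mother 2 1",
]
trios = trios_from_ped(ped_lines)
```

In this sketch each child of the quad becomes the proband of its own trio, which mirrors the statement that all valid trios are processed.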
For each joint segment in a trio, the de novo caller determines if there is a Mendelian inheritance conflict for the called copy number genotypes. The CNV caller does not identify the copy number for each allele of a given diploid segment, which means assumptions are made about the possible allelic composition of the parent genotypes.
The assumption is that a copy number 0 allele is not present in diploid regions of a parent's genome (which regions are diploid depends on the sample's sex) when the assigned copy number is 2 or greater. This simplifies the possible allele combinations as follows:
Parent Copy Number Genotype | Possible Copy Number Alleles | Assumed Possible Copy Number Alleles |
---|---|---|
2 | 0/2, 1/1 | 1/1 |
3 | 0/3, 1/2 | 1/2 |
4 | 0/4, 1/3, 2/2 | 1/3, 2/2 |
N | x/(N-x) for x <= N/2 | x/(N-x) for 1 <= x <= N/2 |
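The enumeration in the table can be sketched in a few lines. The function names below are hypothetical, and the handling of copy numbers 0 and 1 (left unrestricted) is an assumption that follows from the rule applying only when the assigned copy number is 2 or greater.

```python
def possible_allele_pairs(copy_number):
    """All unordered splits x/(N-x) with 0 <= x <= N/2."""
    return [(x, copy_number - x) for x in range(copy_number // 2 + 1)]


def assumed_allele_pairs(copy_number):
    """Splits kept after dropping the copy number 0 allele when N >= 2."""
    pairs = possible_allele_pairs(copy_number)
    if copy_number >= 2:
        pairs = [pair for pair in pairs if pair[0] >= 1]
    return pairs


# Reproduce the table rows for parent copy numbers 2, 3, and 4.
for n in (2, 3, 4):
    print(n, possible_allele_pairs(n), assumed_allele_pairs(n))
```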
The following are examples of consistent and inconsistent copy number genotypes for diploid regions using these assumptions:
Mother Copy Number | Father Copy Number | Proband Copy Number | Mendelian Consistent? |
---|---|---|---|
2 | 2 | 2 | Yes |
2 | 2 | 1 | No |
3 | 2 | 4 | No |
3 | 2 | 2 | Yes |
2 | 0 | 2 | No |
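One way to reason about these examples is to test whether the proband copy number can be written as one allele from the mother plus one allele from the father, drawing only from the assumed allele sets. The sketch below applies that idea to diploid regions only; the function names are hypothetical and this is not necessarily how the caller implements the check.

```python
def assumed_allele_pairs(copy_number):
    """Splits x/(N-x), excluding the copy number 0 allele when N >= 2."""
    first = 1 if copy_number >= 2 else 0
    return [(x, copy_number - x) for x in range(first, copy_number // 2 + 1)]


def transmittable_alleles(copy_number):
    """Every allele copy number a parent could pass on under the assumption."""
    return {allele for pair in assumed_allele_pairs(copy_number) for allele in pair}


def is_mendelian_consistent(mother_cn, father_cn, proband_cn):
    """True if some maternal allele plus some paternal allele gives the proband copy number."""
    return any(
        m + f == proband_cn
        for m in transmittable_alleles(mother_cn)
        for f in transmittable_alleles(father_cn)
    )


# The five example trios from the table: (mother, father, proband).
examples = [(2, 2, 2), (2, 2, 1), (3, 2, 4), (3, 2, 2), (2, 0, 2)]
results = [is_mendelian_consistent(*trio) for trio in examples]
```

For instance, a mother with copy number 3 can transmit an allele of 1 or 2 and a father with copy number 2 can transmit only 1, so the proband can have copy number 2 or 3 but not 4.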
If a joint segment has a Mendelian inheritance conflict, a Phred-scaled de novo quality score (DQ field in the VCF) is calculated using the likelihoods for each copy number state (see Quality Scoring section) of each sample in the trio, combined with a prior for the trio genotypes:
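$$
\mathrm{DQ} = -10 \log_{10}\left(1 - \frac{\sum_{\text{conflicting genotypes}} p(\mathrm{CN}_m \mid \text{data})\, p(\mathrm{CN}_f \mid \text{data})\, p(\mathrm{CN}_p \mid \text{data})\, p(\mathrm{CN}_m, \mathrm{CN}_f, \mathrm{CN}_p)}{\sum_{\text{all genotypes}} p(\mathrm{CN}_m \mid \text{data})\, p(\mathrm{CN}_f \mid \text{data})\, p(\mathrm{CN}_p \mid \text{data})\, p(\mathrm{CN}_m, \mathrm{CN}_f, \mathrm{CN}_p)}\right)
$$
Where
• $\mathrm{CN}_m$ = Mother copy number
• $\mathrm{CN}_f$ = Father copy number
• $\mathrm{CN}_p$ = Proband copy number
• $p(\mathrm{CN}_m, \mathrm{CN}_f, \mathrm{CN}_p)$ = the prior for the trio genotype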
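To make the DQ formula concrete, the sketch below evaluates it for per-sample copy number probabilities given as lists indexed by copy number. The uniform prior, the consistency rule (no copy number 0 allele when the parent copy number is 2 or greater), and all function names are assumptions for illustration; the actual prior used by the caller is not given here.

```python
import math
from itertools import product


def transmittable_alleles(copy_number):
    """Alleles a parent could pass on, excluding 0 when copy number >= 2."""
    first = 1 if copy_number >= 2 else 0
    return {
        allele
        for x in range(first, copy_number // 2 + 1)
        for allele in (x, copy_number - x)
    }


def is_conflict(cn_m, cn_f, cn_p):
    """True if no maternal allele plus paternal allele gives the proband copy number."""
    return not any(
        m + f == cn_p
        for m in transmittable_alleles(cn_m)
        for f in transmittable_alleles(cn_f)
    )


def de_novo_quality(p_mother, p_father, p_proband, prior=lambda m, f, p: 1.0):
    """Phred-scaled DQ from per-sample copy number probabilities and a trio prior."""
    conflicting = 0.0
    total = 0.0
    states = product(range(len(p_mother)), range(len(p_father)), range(len(p_proband)))
    for cn_m, cn_f, cn_p in states:
        weight = p_mother[cn_m] * p_father[cn_f] * p_proband[cn_p] * prior(cn_m, cn_f, cn_p)
        total += weight
        if is_conflict(cn_m, cn_f, cn_p):
            conflicting += weight
    error = 1.0 - conflicting / total
    # A zero error probability corresponds to an unbounded Phred score.
    return math.inf if error <= 0.0 else -10.0 * math.log10(error)


# Hypothetical trio: both parents certainly copy number 2, proband 90% copy number 1.
dq = de_novo_quality([0.0, 0.0, 1.0], [0.0, 0.0, 1.0], [0.0, 0.9, 0.1])
```

In this toy case 90% of the weight falls on the conflicting genotype (2, 2, 1), so the error probability is about 0.1 and DQ is approximately 10.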
The DN field in the VCF is used to indicate the de novo status for each segment. Possible values are:
• Inherited - the called trio genotype is consistent with Mendelian inheritance
• LowDQ - the called trio genotype is inconsistent with Mendelian inheritance and DQ is less than the de novo quality threshold (default 0.1)
• DeNovo - the called trio genotype is inconsistent with Mendelian inheritance and DQ is greater than or equal to the de novo quality threshold (default 0.1)
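The three DN values amount to a simple decision rule, sketched below. The function and parameter names are hypothetical; only the labels and the 0.1 default come from the list above.

```python
def de_novo_status(is_consistent, dq=None, dq_threshold=0.1):
    """Map Mendelian consistency and DQ to a DN label."""
    if is_consistent:
        # DQ is only calculated for segments with an inheritance conflict.
        return "Inherited"
    return "DeNovo" if dq >= dq_threshold else "LowDQ"


statuses = [
    de_novo_status(True),
    de_novo_status(False, dq=0.05),
    de_novo_status(False, dq=0.1),
    de_novo_status(False, dq=25.0),
]
```

Note that a DQ exactly equal to the threshold is labeled DeNovo, because the comparison is greater than or equal to.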