S3 and HTTP Streaming Input Examples
DRAGEN can process input files directly from an S3 bucket or through presigned HTTP URLs, a capability known as input streaming. The input files do not need to be downloaded to a local disk before they are processed. Instead, the files are streamed over the network directly into the DRAGEN processor.
Streaming is supported for compressed FASTQ (*.fastq.gz) files. A future version of DRAGEN will also support streaming from BAM (*.bam) files. Input streaming can be used in all the configurations that use single-end FASTQs, paired-end FASTQs, and FASTQ lists. The following examples show some of the ways to use input streaming.
Streaming FASTQ Input using S3
dragen -f \
-r /staging/human/reference/hg19/hg19.fa.k_21.f_16.m_149 \
-1 s3://s3-bucket-name/path/to/object_1.fastq.gz \
-2 s3://s3-bucket-name/path/to/object_2.fastq.gz \
--RGID object_ID \
--RGSM sample_name \
--output-directory /staging/examples/ \
--output-file-prefix streaming
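Because streaming also applies to FASTQ lists, the same S3 URIs could be placed in a FASTQ list file. The following is a minimal sketch, not a verified recipe: the helper name make_fastq_list, the lane file names, and the UnknownLibrary value are hypothetical, and the column layout (RGID, RGSM, RGLB, Lane, Read1File, Read2File) is assumed from the usual DRAGEN FASTQ list format, so check it against the guide for your version.

```shell
#!/usr/bin/env bash
# Sketch only: print a FASTQ list whose read files are S3 URIs.
# Bucket, object, and sample names are hypothetical placeholders.

# Arguments: RGID RGSM, then repeated triples of: lane read1_uri read2_uri
make_fastq_list() {
  local rgid="$1" rgsm="$2"
  shift 2
  # Header row; the column names are an assumption (see the text above).
  echo "RGID,RGSM,RGLB,Lane,Read1File,Read2File"
  while [ "$#" -ge 3 ]; do
    echo "${rgid},${rgsm},UnknownLibrary,$1,$2,$3"
    shift 3
  done
}

# Two lanes of one sample, printed to standard output.
make_fastq_list object_ID sample_name \
  1 s3://s3-bucket-name/path/to/lane1_1.fastq.gz s3://s3-bucket-name/path/to/lane1_2.fastq.gz \
  2 s3://s3-bucket-name/path/to/lane2_1.fastq.gz s3://s3-bucket-name/path/to/lane2_2.fastq.gz
```

Redirecting the output to a file such as fastq_list.csv gives a list that would presumably be passed with the FASTQ list option in place of -1, -2, --RGID, and --RGSM.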
Streaming FASTQ Input using HTTP
dragen -f \
-r /staging/human/reference/hg19/hg19.fa.k_21.f_16.m_149 \
-1 https://bucket-name.amazonaws.com/path/to/object_1.fastq.gz?querystring \
-2 https://bucket-name.amazonaws.com/path/to/object_2.fastq.gz?querystring \
--RGID object_ID \
--RGSM sample_name \
--output-directory /staging/examples/ \
--output-file-prefix streaming
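One way to obtain such URLs for S3 objects is the AWS CLI command aws s3 presign. The sketch below assumes the AWS CLI is installed and configured; the helper names presign_url and stream_pair, the bucket and object names, and the one-hour lifetime are hypothetical. Real presigned URLs contain the characters ? and &, so the URLs are passed in double quotes to keep the shell from interpreting them.

```shell
#!/usr/bin/env bash
# Sketch only: presign two S3 FASTQ objects and stream them over HTTPS.

# Print a presigned URL for an S3 URI; $2 is the lifetime in seconds.
presign_url() {
  aws s3 presign "$1" --expires-in "${2:-3600}"
}

# Run DRAGEN on a pair of URLs, using the same options as the example above.
stream_pair() {
  local url1="$1" url2="$2"
  dragen -f \
    -r /staging/human/reference/hg19/hg19.fa.k_21.f_16.m_149 \
    -1 "$url1" \
    -2 "$url2" \
    --RGID object_ID \
    --RGSM sample_name \
    --output-directory /staging/examples/ \
    --output-file-prefix streaming
}

# Example call (hypothetical bucket and keys):
# stream_pair "$(presign_url s3://s3-bucket-name/path/to/object_1.fastq.gz)" \
#             "$(presign_url s3://s3-bucket-name/path/to/object_2.fastq.gz)"
```

The URL lifetime should cover the whole run, because the file is read over the network for as long as DRAGEN is processing it.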
You must have permission to access the remote files. DRAGEN can stream any remote file that you are able to access. An S3 object requires AWS authentication and credentials, which should already be set up on the instance you are running on, for example, through IAM policies. An HTTP URL usually carries a query string that supplies the authentication credentials or tokens needed to grant access. The credentials can also appear in other parts of the URL, such as the path, for example:
https://stagingdl.dnanex.us/security/string/sample_1.fastq.gz
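Since a run can only stream files it is permitted to read, it may help to test access before starting DRAGEN. The following is a minimal sketch that assumes the AWS CLI and curl are available on the instance; the function name can_stream is hypothetical. For HTTP URLs it requests a single byte with GET rather than sending HEAD, on the assumption that a presigned URL is typically signed for one HTTP method only.

```shell
#!/usr/bin/env bash
# Sketch only: return 0 if an input appears readable, non-zero otherwise.
can_stream() {
  local path
  case "$1" in
    s3://*)
      # Split s3://bucket/key into bucket and key, then request object metadata.
      path="${1#s3://}"
      aws s3api head-object --bucket "${path%%/*}" --key "${path#*/}" >/dev/null
      ;;
    http://*|https://*)
      # Fetch only the first byte; --fail turns HTTP errors into a non-zero exit.
      curl --fail --silent --show-error --range 0-0 --output /dev/null "$1"
      ;;
    *)
      echo "unsupported input: $1" >&2
      return 2
      ;;
  esac
}
```

For example, can_stream s3://s3-bucket-name/path/to/object_1.fastq.gz exits with a non-zero status when the object is missing or the instance credentials do not allow reading it.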