.. _makePairs:
makePairs
=========
What it does
------------
The snakePipes makePairs workflow processes HiC/uC data from raw FASTQ files to HiC matrices (in
an allele-specific manner). Reads are mapped with bwa and the alignments are then analysed
with pairtools. The workflow follows the example workflow described in the pairtools documentation,
which explains each step in detail and is a useful starting point for new users.
Currently, the output matrices are produced in the .pairs format.
.. image:: ../images/makePairs_pipeline.png
Input requirements and outputs
------------------------------
This pipeline requires paired-end FASTQ files as input in order to build allele-specific contact matrices.
The input FASTQ files are trimmed (with fastp) and mapped against a diploid reference genome (with bwa).
Prior to building the matrix, the pipeline generates two reference genomes (from a reference genome and a VCF file) that contain the haplotype
information. The haplotypes are set using the `--strains` flag. The two reference genomes are then merged to yield one reference genome
(genome/diploid_genome.fa), which is indexed with `bwa` and serves as the basis for mapping the paired-end reads. (Note that this differs from the mono-allelic HiC workflow,
which maps reads individually in single-end mode and combines them into contact pairs afterwards.)
The output of the mapping step is used by `pairtools` to construct the contact matrices for each sample (in pairs format).
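
To make the pairs format more concrete, the following is a minimal sketch of how such a file could be inspected with plain Python. It assumes an uncompressed, tab-separated file in the 4DN ``.pairs`` layout (header lines starting with ``#``, then the columns readID, chr1, pos1, chr2, pos2, strand1, strand2) in which every data row is a mapped pair. The function name ``count_cis_trans`` and the example rows are hypothetical and are not part of snakePipes or pairtools.

.. code-block:: python

    def count_cis_trans(pairs_lines):
        """Count cis (same chromosome) and trans contacts in .pairs lines."""
        counts = {"cis": 0, "trans": 0}
        for line in pairs_lines:
            # Header lines start with '#'; blank lines carry no data.
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            # Columns 2 and 4 hold the chromosome of each side of the pair.
            chrom1, chrom2 = fields[1], fields[3]
            counts["cis" if chrom1 == chrom2 else "trans"] += 1
        return counts


    # Hand-written example rows, not real pipeline output.
    example = [
        "## pairs format v1.0\n",
        "#columns: readID chr1 pos1 chr2 pos2 strand1 strand2\n",
        "read1\tchr1\t100\tchr1\t5000\t+\t-\n",
        "read2\tchr1\t200\tchr2\t300\t+\t+\n",
    ]
    print(count_cis_trans(example))

For files written by the pipeline, the same function could be fed an open file handle, provided the file is decompressed first if it is gzipped.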
Workflow configuration file
---------------------------
Default parameters from the provided config file can be altered by the user. Below is
the config file description for the makePairs workflow:

.. parsed-literal::

    pipeline: makePairs
    outdir:
    configFile:
    clusterConfigFile:
    local: False
    maxJobs: 5
    ## directory with fastq files
    indir:
    ## preconfigured target genomes (mm9,mm10,dm3,...) , see /path/to/snakemake_workflows/shared/organisms/
    ## Value can be also path to your own genome config file!
    genome:
    ## FASTQ file extension (default: ".fastq.gz")
    ext: '.fastq.gz'
    ## paired-end read name extension (default: ["_R1", "_R2"])
    reads: ["_R1","_R2"]
    ## assume paired end reads
    pairedEnd: True
    ## Number of reads to downsample from each FASTQ file
    downsample:
    ## Options for trimming
    trim: True
    trimmer: fastp
    trimmerOptions:
    verbose: False
    fastqc: True
    UMIBarcode: False
    bcPattern: "NNNNCCCCCCCCC"
    UMIDedup: False
    UMIDedupSep: "_"
    UMIDedupOpts: "_"
    plotFormat: png
    bwBinSize: 1000
    aligner: 'bwa'
    alignerOptions: '-SPu -T0'
    alignerThreads: 30
    fromBAM: False
    sampleSheet:
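
The ``ext`` and ``reads`` settings describe how FASTQ file names are split into a sample name and a mate suffix. The sketch below illustrates that naming convention only; it is not the snakePipes implementation, and the function name ``pair_fastq_files`` and the example file names are hypothetical.

.. code-block:: python

    from collections import defaultdict


    def pair_fastq_files(filenames, ext=".fastq.gz", reads=("_R1", "_R2")):
        """Group FASTQ file names into {sample: [mate1, mate2]}."""
        samples = defaultdict(lambda: [None, None])
        for name in sorted(filenames):
            if not name.endswith(ext):
                continue  # skip files without the configured extension
            stem = name[: -len(ext)]
            for index, suffix in enumerate(reads):
                if stem.endswith(suffix):
                    # The sample name is the stem without the mate suffix.
                    samples[stem[: -len(suffix)]][index] = name
        # Keep only samples for which both mates were found.
        return {s: mates for s, mates in samples.items() if None not in mates}


    files = ["a_R1.fastq.gz", "a_R2.fastq.gz", "b_R1.fastq.gz", "notes.txt"]
    print(pair_fastq_files(files))

With the default settings, ``a_R1.fastq.gz`` and ``a_R2.fastq.gz`` are grouped under sample ``a``, while ``b`` is dropped because its second mate is missing.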
Structure of output directory
-----------------------------
In addition to the FASTQ module results (see :ref:`running_snakePipes`), the workflow produces the following outputs::

    .
    |-- bam
    |-- FASTQ
    |-- FastQC
    |-- FastQC_trimmed
    |-- FASTQ_fastp
    |-- genome
    |-- multiqc
    |-- originalFASTQ
    |-- pairs
    |-- phase_stats

* **bam** folder contains the mapping results in BAM format. The files are obtained by running bwa in paired-end mode.
* **originalFASTQ** contains symbolic links to the original FASTQ data.
* **FASTQ** links to **originalFASTQ** if no further filters are specified.
* **FASTQ_fastp** contains the trimmed FASTQ files output by fastp.
* **FastQC** contains the FastQC report for the FASTQ directory.
* **genome** folder contains the diploid_genome.fa.gz that was constructed from the two strain-specific genomes with the rule diploid_genome. Chromosome sizes and indices (bwa) can also be found in this directory.
* **multiqc** folder contains the final QC report generated with MultiQC (including the fastqc, fastp, and pairtools modules).

  .. note:: For the pairtools module to work, the MultiQC version from open2c is used, as specified for the makePairs environment.

* **pairs** folder contains the parsed, phased, sorted and deduplicated contact matrices generated by pairtools.
* **phase_stats** contains the four subsetted pairs files for each sample (unphased pairs, two different strains, trans pairs). QC statistics are also calculated and will be processed by MultiQC.
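
The four subsets in **phase_stats** can be thought of as a classification of each pair by the haplotype assigned to its two sides. The sketch below is an illustration under stated assumptions, not the rule used by the workflow: it assumes each side carries a phase tag of ``"0"`` (first strain), ``"1"`` (second strain) or ``"."`` (unphased), and that a pair with any unphased side counts as unphased. The function name ``classify_phase`` and the returned labels are hypothetical.

.. code-block:: python

    def classify_phase(phase1, phase2):
        """Assign a pair to one of the four subsets (assumed tag values)."""
        if phase1 == "." or phase2 == ".":
            return "unphased"  # assumption: one unphased side is enough
        if phase1 == phase2 == "0":
            return "strain1"
        if phase1 == phase2 == "1":
            return "strain2"
        if {phase1, phase2} == {"0", "1"}:
            return "trans"  # the two sides map to different strains
        return "other"  # any tag value not covered by the assumptions


    for tags in [("0", "0"), ("1", "1"), ("0", "1"), (".", "1")]:
        print(tags, classify_phase(*tags))

The actual selection criteria are defined by the workflow rules and the pairtools options they use, so the tag values above should be checked against the pairs files before relying on them.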
Command line options
--------------------

.. argparse::
    :func: parse_args
    :filename: ../snakePipes/workflows/makePairs/makePairs.py
    :prog: makePairs
    :nodefault: