Cookbook

Create Hybrid Genome with Spike-in Sequences (createIndices)

Generate a hybrid reference genome that combines the native organism (referred to as the host) with the spike-in control sequences, and build the corresponding indices.

createIndices -o customIndices/GRch38_dm6 --tools bowtie2 --genomeURL organisms/GRCh38_ensembl/genome_fasta/genome.fa --gtfURL organisms/GRCh38_ensembl/gencode/release_31/genes.gtf --blacklist organisms/GRCh38_ensembl/akundaje/blacklist.UseMe.bed --spikeinGenomeURL organisms/dm6_ensembl/genome_fasta/genome.fa --spikeinGtfURL organisms/dm6_ensembl/ensembl/release-96/genes.gtf customIndices/GRch38_dm6/GRCh38_g31_dm6
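Conceptually, the hybrid genome is the host FASTA and the spike-in FASTA combined into a single reference, with the spike-in sequence names kept distinct so that every read can be assigned to one organism. The shell sketch below illustrates only that idea; the helper name `make_hybrid_fasta` and the `_spikein` suffix are assumptions, and createIndices may name and process the spike-in contigs differently.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: concatenate a host FASTA and a spike-in FASTA into one
# hybrid FASTA. Every spike-in header is reduced to its first word and given an
# assumed suffix ("_spikein" by default) so host and spike-in names cannot clash.
# Assumes both input files end with a newline.
make_hybrid_fasta() {
  local host_fa=$1 spikein_fa=$2 out_fa=$3 suffix=${4:-_spikein}
  {
    # Host sequences are copied unchanged.
    cat "$host_fa"
    # Spike-in headers are rewritten; sequence lines pass through as-is.
    awk -v sfx="$suffix" '/^>/ { sub(/^>/, ""); print ">" $1 sfx; next } { print }' "$spikein_fa"
  } > "$out_fa"
}
```

For example, `make_hybrid_fasta host.fa dm6.fa hybrid.fa` (hypothetical file names) would turn a dm6 header `>2L` into `>2L_spikein`, while all host headers stay as they were.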

CUT&Tag Data Analysis (DNAmapping and ChIPseq)

Process and analyze CUT&Tag data to profile chromatin binding. FASTQ files are mapped to a host/spike-in hybrid genome, and the reads mapping to the spike-in sequences are used to normalize the BAM-derived coverage tracks.

DNAmapping --cutntag --trim --trimmerOptions ' -a nexteraF=CTGTCTCTTATA -A nexteraR=CTGTCTCTTATA ' --fastqc --dedup --mapq 3 -i $input_folder -o analysis_dedup customIndices/GRch38_dm6/GRCh38_g31_dm6.yaml
ChIPseq -d analysis_ChIPseq --fromBAM analysis_dedup/filtered_bam --bamExt .filtered.bam --cutntag customIndices/GRch38_dm6/GRCh38_g31_dm6.yaml chip_seq_sample_config.yaml
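Spike-in normalization rests on the assumption that every sample received the same amount of spike-in material, so differences in spike-in read counts reflect technical variation. A common convention, shown here as a sketch, scales each sample by the smallest spike-in count divided by its own spike-in count. The helper `spikein_scale_factors` and its two-column input are hypothetical, and the ChIPseq pipeline may compute its factors differently.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: read a tab-separated file of "<sample>\t<spike-in read count>"
# and print "<sample>\t<scale factor>", where factor = min(spike-in counts) / own count.
# The sample with the fewest spike-in reads keeps a factor of 1.0.
# Counts are assumed to be positive integers.
spikein_scale_factors() {
  awk -F'\t' '
    { c = $2 + 0; name[NR] = $1; count[NR] = c
      if (NR == 1 || c < min) min = c }
    END {
      for (i = 1; i <= NR; i++) printf "%s\t%.4f\n", name[i], min / count[i]
    }' "$1"
}
```

With spike-in counts of 200000, 100000 and 400000 reads, the factors are 0.5, 1.0 and 0.25: the sample with the most spike-in reads is scaled down the most.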

Differential Binding on Target Regions (ChIPseq)

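Identify regions showing significant changes in binding between conditions. Here, binding is tested on the regions supplied with --externalBed (rmsk.bed in this example) rather than on called peaks, and the conditions are read from the sample sheet.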

ChIPseq -d analysis_chipseq --fromBAM analysis_dna --bamExt .bam --externalBed rmsk.bed --sampleSheet sampleSheet.tsv mm10_gencodeM19 chip_seq_sample_config.yaml
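If rmsk.bed is derived from the UCSC RepeatMasker table (rmsk.txt), a BED6 file can be produced as sketched below. The column positions assume the standard UCSC rmsk schema (genoName, genoStart and genoEnd in columns 6 to 8, strand in column 10, repName in column 11); the helper name `rmsk_to_bed` is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: convert a UCSC RepeatMasker table dump (rmsk.txt,
# tab-separated) into a sorted BED6 file. UCSC coordinates are already
# 0-based half-open, as BED expects, so no shift is needed.
# Assumed columns: 6=genoName 7=genoStart 8=genoEnd 10=strand 11=repName.
rmsk_to_bed() {
  awk -F'\t' -v OFS='\t' '
    # Skip a "#bin ..." header line if the table was exported with one.
    /^#/ { next }
    # Score is written as 0 because BED scores are limited to 0-1000.
    { print $6, $7, $8, $11, 0, $10 }' "$1" | sort -k1,1 -k2,2n
}
```

Usage: `rmsk_to_bed rmsk.txt > rmsk.bed`, then pass the result to ChIPseq with --externalBed.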

Time-course Analysis of mRNAseq Data (mRNAseq)

Investigate temporal changes in transcript levels across multiple time points using a likelihood ratio test (LRT). For this analysis, the condition column of the sample sheet should contain one group per time point.

mRNAseq -i RNAseq -o analysis --sampleSheet sampleSheet.csv --LRT mm10_gencodeM19
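A sample sheet for such a design could look like the one written below. The sketch assumes a tab-separated layout with a `name`/`condition` header, where each `name` matches a sample name derived from the FASTQ files; the sample names, the time points and both helper names are invented for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical example sheet for a three-point time course with two replicates each.
# Assumed format: tab-separated, header "name<TAB>condition"; "condition" holds
# the time point of each sample.
write_timecourse_sheet() {
  printf 'name\tcondition\n' > "$1"
  # printf reuses the format for each pair of arguments: one row per sample.
  printf '%s\t%s\n' \
    t0_rep1 0h t0_rep2 0h \
    t6_rep1 6h t6_rep2 6h \
    t24_rep1 24h t24_rep2 24h >> "$1"
}

# Hypothetical sanity check: count samples per time point (header skipped).
count_timepoints() {
  awk -F'\t' 'NR > 1 { n[$2]++ } END { for (c in n) print c "\t" n[c] }' "$1" | sort
}
```

Running `write_timecourse_sheet sampleSheet.csv` and then `count_timepoints sampleSheet.csv` should report three time points with two samples each.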

Differential Expression of Transcribed Repetitive Elements (ncRNAseq)

Assess changes in the expression of transcribed repetitive elements. The organism YAML file must contain the key rmsk_file, pointing to the RepeatMasker .txt file.

ncRNAseq -i RNAseq -o analysis --sampleSheet sampleSheet.csv mm10_gencodeM19
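The organism YAML is a plain key: value file, so adding the required key amounts to one extra line. The helper below, `ensure_rmsk_file`, is hypothetical; it appends rmsk_file only when the key is absent. The YAML path and the location of the RepeatMasker file are placeholders to adapt.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: make sure an organism YAML file defines rmsk_file.
# If the key is missing, append it pointing at the given RepeatMasker .txt file;
# if it is already present, leave the file unchanged.
ensure_rmsk_file() {
  local yaml=$1 rmsk_txt=$2
  if ! grep -q '^rmsk_file:' "$yaml"; then
    # Add a newline first if the file does not already end with one.
    if [ -s "$yaml" ] && [ -n "$(tail -c1 "$yaml")" ]; then printf '\n' >> "$yaml"; fi
    printf 'rmsk_file: "%s"\n' "$rmsk_txt" >> "$yaml"
  fi
}
```

Because the check is done first, running the helper twice on the same file adds the key only once.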

Hi-C Data Analysis with TAD Calling (HiC)

Analyze chromatin conformation and identify topologically associating domains (TADs).

HiC -i merged_fq -o analysis --fastqc --trim --enzyme DpnII --binSize 5000 mm10_gencodeM19
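--binSize sets the resolution of the contact matrix in base pairs: each chromosome is split into consecutive bins of that width, and TADs are called on the binned matrix. The arithmetic is sketched below, assuming a two-column chrom.sizes file; the helper name `bins_per_chrom` is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: report how many bins of a given size each chromosome
# yields, reading a tab-separated chrom.sizes file ("<chrom>\t<length in bp>").
# The last bin of a chromosome is usually shorter, hence the ceiling division.
bins_per_chrom() {
  local sizes=$1 bin_size=$2
  awk -F'\t' -v b="$bin_size" -v OFS='\t' '{ print $1, int(($2 + b - 1) / b) }' "$sizes"
}
```

At --binSize 5000, a 61,431,566 bp chromosome (the length of mm10 chr19) yields 12,287 bins, the last one shorter than 5 kb. Halving the bin size roughly doubles the bins per chromosome and quadruples the number of matrix cells.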

Allele-specific Hi-C Data Analysis (makePairs)

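Distinguish allelic differences in chromatin interactions. The variants that separate the two strains are supplied as a VCF file (--VCFfile), and the strains to compare are given with --strains.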

makePairs -i input-dir -o output-dir --VCFfile vcf --strains s1,s2 dm6
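The names given to --strains presumably have to match sample columns in the VCF file (an assumption; check the makePairs help for the exact requirement). In the VCF format, the #CHROM header line lists nine fixed columns followed by one column per sample, so the available names can be listed as sketched below. Both helpers are hypothetical and expect an uncompressed VCF.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the sample (strain) names stored in an uncompressed VCF.
# The "#CHROM" header line has nine fixed columns
# (CHROM POS ID REF ALT QUAL FILTER INFO FORMAT) followed by one column per sample.
vcf_samples() {
  grep -m1 '^#CHROM' "$1" | cut -f10- | tr '\t' '\n'
}

# Hypothetical check: succeed only if every comma-separated strain (e.g. "s1,s2")
# is one of the VCF sample names; otherwise report the first missing one.
check_strains() {
  local vcf=$1 strains=$2 samples s
  local -a wanted
  samples=$(vcf_samples "$vcf")
  IFS=',' read -r -a wanted <<< "$strains"
  for s in "${wanted[@]}"; do
    # Compare against whole lines so "s1" does not match "s10".
    case $'\n'"$samples"$'\n' in
      *$'\n'"$s"$'\n'*) ;;
      *) echo "strain not found in VCF: $s" >&2; return 1 ;;
    esac
  done
}
```

Usage: `check_strains vcf s1,s2` before running makePairs; it prints the first strain that is missing from the VCF and returns a non-zero status.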

Differential Transcript Expression Analysis (mRNAseq)

Quantify and compare transcript expression between different conditions.

mRNAseq -i RNAseq -o analysis -m alignment-free --sampleSheet sampleSheet.csv mm10_gencodeM19
