Cookbook
Create Hybrid Genome with Spike-in Sequences (createIndices)
Generate a reference genome including both the native organism (referred to as host) and spike-in control sequences.
createIndices -o customIndices/GRch38_dm6 --tools bowtie2 \
    --genomeURL organisms/GRCh38_ensembl/genome_fasta/genome.fa \
    --gtfURL organisms/GRCh38_ensembl/gencode/release_31/genes.gtf \
    --blacklist organisms/GRCh38_ensembl/akundaje/blacklist.UseMe.bed \
    --spikeinGenomeURL organisms/dm6_ensembl/genome_fasta/genome.fa \
    --spikeinGtfURL organisms/dm6_ensembl/ensembl/release-96/genes.gtf \
    customIndices/GRch38_dm6/GRCh38_g31_dm6
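Because this command takes several input paths, it can help to confirm that they exist before starting a long index build. The following is a minimal sketch, assuming the inputs are local files; the helper name check_inputs and the demo file names are hypothetical and not part of snakePipes.

```shell
# Hypothetical helper, not part of snakePipes:
# report every input file that is missing and fail if any is absent.
check_inputs() {
    missing=0
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f" >&2
            missing=$((missing + 1))
        fi
    done
    # Succeed only if every file was found.
    [ "$missing" -eq 0 ]
}

# Demo with placeholder files standing in for the real FASTA and GTF.
demo_dir=$(mktemp -d)
touch "$demo_dir/genome.fa" "$demo_dir/genes.gtf"

if check_inputs "$demo_dir/genome.fa" "$demo_dir/genes.gtf"; then
    echo "all inputs present"
fi
```

In practice the arguments would be the host and spike-in FASTA, GTF and blacklist paths passed to createIndices.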
Cut&Tag Data Analysis (DNAmapping and ChIPseq)
Process and analyze Cut&Tag data for chromatin binding. FASTQ files are mapped to a host/spike-in hybrid genome, and the spike-in reads are used to normalize the BAM coverage tracks.
DNAmapping --cutntag --trim \
    --trimmerOptions ' -a nexteraF=CTGTCTCTTATA -A nexteraR=CTGTCTCTTATA ' \
    --fastqc --dedup --mapq 3 \
    -i $input_folder -o analysis_dedup \
    customIndices/GRch38_dm6/GRCh38_g31_dm6.yaml
ChIPseq -d analysis_ChIPseq --fromBAM analysis_dedup/filtered_bam --bamExt .filtered.bam --cutntag customIndices/GRch38_dm6/GRCh38_g31_dm6.yaml chip_seq_sample_config.yaml
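The second command reads BAM files from the first step through --fromBAM and --bamExt. The sketch below lists the sample names that result from stripping the extension, which can then be compared against the entries in chip_seq_sample_config.yaml. It assumes sample names are simply the file names without the --bamExt suffix; the demo directory, the file names and the helper list_samples are hypothetical.

```shell
# Demo directory standing in for analysis_dedup/filtered_bam (file names are hypothetical).
bam_dir=$(mktemp -d)
touch "$bam_dir/sampleA.filtered.bam" \
      "$bam_dir/sampleB.filtered.bam" \
      "$bam_dir/sampleA.filtered.bam.bai"

bam_ext=".filtered.bam"   # same value as passed to --bamExt

# Print one sample name per file ending in the given extension.
list_samples() {
    for f in "$1"/*"$2"; do
        [ -e "$f" ] || continue   # skip the unexpanded pattern when nothing matches
        basename "$f" "$2"
    done
}

list_samples "$bam_dir" "$bam_ext"
```

Index files such as .bai are ignored because they do not end in the chosen extension.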
Differential Binding on Target Regions (ChIPseq)
Identify which of the user-supplied regions (here rmsk.bed, passed via --externalBed) show significant changes in binding between the conditions defined in the sample sheet.
ChIPseq -d analysis_chipseq --fromBAM analysis_dna --bamExt .bam --externalBed rmsk.bed --sampleSheet sampleSheet.tsv mm10_gencodeM19 chip_seq_sample_config.yaml
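The file passed to --sampleSheet assigns each sample to a condition. A minimal sketch of such a file follows, assuming the tab-separated two-column layout (name, condition) described in the snakePipes documentation; the sample and condition names are hypothetical, and the sheet is written to a temporary file rather than sampleSheet.tsv.

```shell
# Write a small sample sheet: header line plus one row per sample.
sheet=$(mktemp)
printf 'name\tcondition\n' > "$sheet"
printf '%s\t%s\n' \
    ctrl_rep1 control \
    ctrl_rep2 control \
    treat_rep1 treatment \
    treat_rep2 treatment >> "$sheet"

cat "$sheet"

# Sanity check: every row should have exactly two tab-separated fields.
awk -F '\t' 'NF != 2 { bad++ } END { exit (bad > 0) }' "$sheet" && echo "sheet OK"
```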
Time-course Analysis of mRNAseq Data (mRNAseq)
Investigate temporal changes in transcript levels across multiple time points using a likelihood ratio test (LRT). For this analysis, the condition column in the sample sheet should contain groups corresponding to the time points.
mRNAseq -i RNAseq -o analysis --sampleSheet sampleSheet.csv --LRT mm10_gencodeM19
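To illustrate what "groups corresponding to time points" can look like, the sketch below builds a sample sheet with three time points and two replicates each. It assumes a tab-separated layout with name and condition columns; the time-point labels (t0h, t2h, t6h) and sample names are hypothetical, and the delimiter should be checked against the snakePipes documentation for your version.

```shell
# Build a sample sheet whose condition column holds the time point of each sample.
sheet=$(mktemp)
{
    printf 'name\tcondition\n'
    for tp in t0h t2h t6h; do
        for rep in 1 2; do
            printf '%s_rep%s\t%s\n' "$tp" "$rep" "$tp"
        done
    done
} > "$sheet"

cat "$sheet"

# Count the replicates per time point, skipping the header line.
tail -n +2 "$sheet" | cut -f 2 | sort | uniq -c
```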
Differential Expression of Transcribed Repetitive Elements (ncRNAseq)
Assess changes in expression of transcribed repeats. The organism YAML file must contain the key rmsk_file, pointing to the RepeatMasker txt file.
ncRNAseq -i RNAseq -o analysis --sampleSheet sampleSheet.csv mm10_gencodeM19
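Since the workflow depends on the rmsk_file key, a quick check of the organism YAML before launching can save a failed run. This is a minimal sketch, assuming rmsk_file is a top-level key on a single line; the temporary file stands in for the real organism YAML, whose location depends on the installation, and the helper rmsk_value and the placeholder paths are hypothetical.

```shell
# Hypothetical stand-in for the organism YAML (placeholder values).
org_yaml=$(mktemp)
cat > "$org_yaml" <<'EOF'
genome_fasta: "/path/to/genome.fa"
rmsk_file: "/path/to/rmsk.txt"
EOF

# Print the value of the top-level rmsk_file key with quotes and blanks removed.
# Prints nothing if the key is absent or its value is empty.
rmsk_value() {
    sed -n 's/^rmsk_file:[[:space:]]*//p' "$1" | tr -d "\"' \t"
}

value=$(rmsk_value "$org_yaml")
if [ -n "$value" ]; then
    echo "rmsk_file is set to $value"
else
    echo "rmsk_file is missing or empty" >&2
fi
```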
Hi-C Data Analysis with TAD Calling (HiC)
Analyze chromatin conformation and identify topologically associating domains (TADs).
HiC -i merged_fq -o analysis --fastqc --trim --enzyme DpnII --binSize 5000 mm10_gencodeM19
Allele-specific Hi-C Data Analysis (makePairs)
Distinguish allelic differences in chromatin interactions.
makePairs -i input-dir -o output-dir --VCFfile vcf --strains s1,s2 dm6
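Before running, it may be worth checking that the names given to --strains appear in the VCF. The sketch below assumes that strain names correspond to sample columns on the #CHROM header line of an uncompressed VCF, which is an assumption about the input rather than something stated here; the helper vcf_has_sample and the demo header are hypothetical.

```shell
# Minimal demo VCF header with two sample columns named s1 and s2.
vcf=$(mktemp)
printf '##fileformat=VCFv4.2\n' > "$vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2\n' >> "$vcf"

# Succeed if the given name is one of the sample columns (column 10 onward)
# on the #CHROM header line.
vcf_has_sample() {
    awk -F '\t' -v s="$2" '
        /^#CHROM/ { for (i = 10; i <= NF; i++) if ($i == s) found = 1; exit }
        END { exit !found }
    ' "$1"
}

strains="s1,s2"   # same value as passed to --strains
for s in $(echo "$strains" | tr ',' ' '); do
    if vcf_has_sample "$vcf" "$s"; then
        echo "$s: found"
    else
        echo "$s: not in VCF header"
    fi
done
```

For a compressed VCF, the header would first need to be decompressed, for example with zcat, before applying the same check.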
Differential Transcript Expression Analysis (mRNAseq)
Quantify and compare transcript expression between different conditions.
mRNAseq -i RNAseq -o analysis -m alignment-free --sampleSheet sampleSheet.csv mm10_gencodeM19