Loading...
 

Allele Specific Expression Analysis Using RNA-Seq

1. Data setup

1.1. Working directory

a. In your home directory, create a new folder called ‘asarptest’, using the following command;

mkdir asarptest

1.2. Input and Output

a. Link all the reference data and the input files from the asarp system folder to your ‘asarptest’ folder, using the following command;

cd asarptest
ln -s /usr/local/bioinf/ASARP/* .
ls

(make sure you can see all the asarp files and folder)

b. Create the following folder to store the results files;

mkdir preproc
mkdir preproc/R1
mkdir preproc/R2
mkdir preproc/merged

1.3. Test run

Run testR.pl, using the below command, to make sure you have the required packages to run the exercise

perl testR.pl

2. Removing PCR Duplicates

rmDup.pl -- Removing duplicates in a SAM file of a chromosome (Dr. JH Lee’s format), where the extra 12th attribute (mapped read blocks) are used to identify distinct reads (read pairs). Reads (read pairs) are considered as duplicates only if all of their mapped read blocks have the same coordinates. Only the read (pair) with the highest read quality will be kept.

a. Make sure you have all the input SAM files, using the following command;

ls demo/data/chr*.sam

 

b. Run rumDup.pl as below, with the input file name, output file name, is_paired_end and a file name to store the log

perl rmDup.pl demo/data/chr1.R1.sam preproc/R1/chr1.rmDup.sam 1 >> \
rmDup.R1.log


-- Parameters

is_paired_end 0: single-end; 1: paired-end

For paired-end reads, all reads should be paired up,

where pair-1 should be always followed by pair-2 in the next line.

input_sam_file should contain only 1 chromosome, and it should be in standard SAM format or Dr. Jae-Hyung Lee’s SAM format

---

c. Run the above command (#2b) for all the other sam files (use the “UP” arrow key to call the previous command and change the corresponding input, output, log file names), like below;

perl rmDup.pl demo/data/chr5.R1.sam preproc/R1/chr5.rmDup.sam 1 >> \ rmDup.R1.log
perl rmDup.pl demo/data/chr10.R1.sam preproc/R1/chr10.rmDup.sam 1 >> \ rmDup.R1.log
perl rmDup.pl demo/data/chr1.R2.sam preproc/R2/chr1.rmDup.sam 1 >> \ rmDup.R2.log
perl rmDup.pl demo/data/chr5.R2.sam preproc/R2/chr5.rmDup.sam 1 >> \ rmDup.R2.log
perl rmDup.pl demo/data/chr10.R2.sam preproc/R2/chr10.rmDup.sam 1 >> \ rmDup.R2.log

3. Merge SAM Replicate Files

mergeSam.pl -- Merging corresponding SAM files (e.g. all chr1 SAM files) from different independent replicates. Each replicate is indicated by its folder. The SAM files should have been subject to duplicate removal

a. Create the replicate folders list, using the below commands;

echo "preproc/R1" >> rep.demo.lst
echo "preproc/R2" >> rep.demo.lst

 

b. Run mergeSam.pl using the below command, specifying the replicate folder list, prefix of replicate file names, suffix of replicate file names, output directory and a file name for the log

perl mergeSam.pl rep.demo.lst chr rmDup.sam preproc/merged > \
preproc/mergeSam.log

 

-Parameter

folder_list a file containing a line-separated list of folders representing mutilple biological replicates

e.g. data/rep1

    data/rep2

    data/rep3

prefix the common chromosome file prefix in these folders

e.g. proc_chr for proc_chr1.rmdup.sam, proc_chr2.rmdup.sam, etc.

suffix the common file suffix in these folders

e.g. rmdup.sam for proc_chr1.rmdup.sam, proc_chr2.rmdup.sam, etc.

output_folder all the merged prefix*.suffix files will be put to that

specified folder (assume it exists)

4. Generate Allelic Reads

procReads.pl (processing SAM files to get SNV read counts)

4.1. SNV file

We are going to use the genome sequence SNV’s to generate the allelic reads. The dna snv file ‘dna.snv.demo.lst’  is available in the demo folder. Use the head command to look at the contents of this dna snv list;

head -n2 demo/dna.snv.demo.lst

4.2. SNV read counts

Generate the SNV read counts and the bedgraph file using the below command, specifying the chromosome number, merged sam input file, dna snv list file, output name for rna snv file, output name for the bedgraph file and isis_paired_end.

perl procReads.pl chr5 preproc/merged/chr5.sam \ 
demo/dna.snv.demo.lst preproc/chr5.snv preproc/chr5.bed 1 2
perl procReads.pl chr1 preproc/merged/chr1.sam \ 
demo/dna.snv.demo.lst preproc/chr1.snv preproc/chr1.bed 1 2
perl procReads.pl chr10 preproc/merged/chr10.sam \ 
demo/dna.snv.demo.lst preproc/chr10.snv preproc/chr10.bed 1 2

 

--Parameter

chr chromosome to be investigated (correspond to the input_sam_file)

input_sam_file SAM file input after duplicate removal (use rmDup.pl)

intput_snvs input SNV list (without read counts)

output_snvs output SNV candidates with read counts

output_bedgraph output bedgraph file, see below for the details:

genome.ucsc.edu

is_paired_end 0: single-end; 1: paired-end

For paired-end reads, all reads should be paired up,

where pair-1 should be always followed by pair-2 in the next line.

-----

4.3. Merge SNV read counts

Merge the SNV read counts and generate plots, using the following command;

perl mergeSnvs.pl preproc/ snv mono=0 preproc/rna.snv.demo.lst 1

---Parameter

prefix the folder path and prefix string for all chr* SNV files

e.g. “/home/N.A+/” for all chr* SNVs in that folder

suffix the suffix after chr1..22/X/Y/M

mono= options (no space):

“mono=0”: allow mono-allelic SNVs;

“mono=x”: discard SNVs with any allele < x reads;

useful when genotype is unknown and dbSNPs are used

where mono-allelic SNVs would be called

output the output file name for the merged SNVs

note that output.plus and output.minus are by-product

outputs for + and - strands for strand-specific cases

make sure the output and chr* SNV file names do not conflict

--

5. Allele Specific Expression Analysis

asarp.pl -- The new and improved ASARP pipeline to discover ASE/ASARP genes/SNVs, which now supports strand-specific RNA-Seq data

5.1. ASARP Run

a. Run the asarp.pl using the following command;

perl asarp.pl demo.results/asarp.results demo/demo.config \ 
demo/demo.param

 

b. Look at the output files;

Output results to demo.results/asarp.results.ase.prediction

Output results to demo.results/asarp.results.gene.prediction

Output results to demo.results/asarp.results.snv.prediction

Output results to demo.results/asarp.results.controlSNV.prediction