
RNA-Seq Analysis Using HISAT2

1. Data Preparation

a. Create a working directory and link in the data for this exercise from the example data folder on your system.

cd ~/Desktop
mkdir hisat
cd hisat
ln -s ~/Desktop/rnaseq_a/inputs/hisat/* .

 

b. Use a UNIX one-liner to write the sample names to samples.txt. The later steps loop over this file, so each command does not have to be typed once per sample.

ls chrX_data/samples/ | cut -d _ -f 1 | sort | uniq > samples.txt
cat samples.txt
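
To see what the one-liner does, here is a small sketch run against mock files in a scratch directory. The names sampleA and sampleB and the demo_dir variable are made up for illustration; the real files are in chrX_data/samples/.

```shell
# Build a scratch directory with mock paired-end file names (hypothetical samples).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/samples"
for s in sampleA sampleB
do
    touch "$demo_dir/samples/${s}_chrX_1.fastq.gz" "$demo_dir/samples/${s}_chrX_2.fastq.gz"
done

# cut keeps the text before the first underscore; sort | uniq collapses
# the two mates of each pair into a single sample name.
ls "$demo_dir/samples/" | cut -d _ -f 1 | sort | uniq > "$demo_dir/samples.txt"
cat "$demo_dir/samples.txt"
```

Note that this approach assumes the sample name itself contains no underscore, because cut splits at the first one.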

 

c. Each line in samples.txt serves as the loop variable for the steps of the workflow that follow.


2. Alignment Using HISAT2

for f in $(<samples.txt)      #enter
do      #enter
hisat2 -p 4 --dta -x chrX_data/indexes/chrX_tran -1 chrX_data/samples/${f}_chrX_1.fastq.gz -2 chrX_data/samples/${f}_chrX_2.fastq.gz -S ${f}_chrX.sam      #enter
done      #enter

 

The loop runs once per sample, replacing each $f with the corresponding line from samples.txt.
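
A dry run can make the substitution visible before any real work starts. The sketch below uses made-up sample names (sampleA, sampleB) in a scratch directory and prints the file names each pass would use instead of running an aligner. It uses $(cat file), which is the portable form of the bash shorthand $(<file).

```shell
# Hypothetical sample names, standing in for the contents of samples.txt.
demo_dir=$(mktemp -d)
printf 'sampleA\nsampleB\n' > "$demo_dir/samples.txt"

# Same loop shape as the workflow, but echo only prints what each pass
# would read and write; nothing is aligned.
for f in $(cat "$demo_dir/samples.txt")
do
    echo "reads: ${f}_chrX_1.fastq.gz ${f}_chrX_2.fastq.gz -> output: ${f}_chrX.sam"
done > "$demo_dir/dry_run.txt"
cat "$demo_dir/dry_run.txt"
```

Prefixing a real command with echo in the same way is a quick check that every path expands as intended.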


3. Alignment Sorting

Sort the alignment files with samtools, which also converts them from SAM to BAM format.

-------
for f in $(<samples.txt)
do
samtools sort -@ 8 -o ${f}_chrX.bam ${f}_chrX.sam
done
-------
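
Before moving on to assembly, it can help to confirm that every sample produced a non-empty BAM file. The sketch below is one way to do that; check_bams is a hypothetical helper written for this example, not part of samtools, and the demonstration runs on mock files with made-up sample names. It only tests that the files exist and are non-empty; samtools also has a quickcheck subcommand for a stronger integrity check.

```shell
# Report every sample in a list that lacks a non-empty sorted BAM file.
# Succeeds (exit status 0) only when nothing is missing.
check_bams() {
    list=$1      # file with one sample name per line
    dir=$2       # directory that should hold the BAM files
    missing=0
    for f in $(cat "$list")
    do
        if [ ! -s "$dir/${f}_chrX.bam" ]; then
            echo "missing or empty: ${f}_chrX.bam"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}

# Demonstration on mock data: sampleA has a BAM file, sampleB does not.
demo_dir=$(mktemp -d)
printf 'sampleA\nsampleB\n' > "$demo_dir/samples.txt"
echo "placeholder" > "$demo_dir/sampleA_chrX.bam"
report=$(check_bams "$demo_dir/samples.txt" "$demo_dir") || true
echo "$report"
```

In the working directory for this exercise the call would be check_bams samples.txt .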

4. Transcript Assembly Using StringTie

a. The next step is to run StringTie, a replacement for Cufflinks, to assemble transcripts for each sample.
-------
for f in $(<samples.txt)
do
stringtie -p 4 -G chrX_data/genes/chrX.gtf -o ${f}_chrX.gtf -l $f ${f}_chrX.bam
done
-------
b. Now we need to merge the per-sample assemblies into a single GTF file.

ls *.gtf > mergelist.txt
stringtie --merge -p 4 -G chrX_data/genes/chrX.gtf -o stringtie_merged.gtf mergelist.txt
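
Because ls *.gtf lists every GTF file in the directory, rerunning it after the merge would also pick up stringtie_merged.gtf. As an alternative sketch, the list can be built from samples.txt instead. The sample names and scratch directory here are made up for illustration.

```shell
# Hypothetical sample names, standing in for the contents of samples.txt.
demo_dir=$(mktemp -d)
printf 'sampleA\nsampleB\n' > "$demo_dir/samples.txt"

# Append the per-sample suffix to every name. No glob is involved, so only
# the per-sample assemblies end up in the list.
sed 's/$/_chrX.gtf/' "$demo_dir/samples.txt" > "$demo_dir/mergelist.txt"
cat "$demo_dir/mergelist.txt"
```

In the working directory the equivalent command would be sed 's/$/_chrX.gtf/' samples.txt > mergelist.txt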

5. Assembly Statistics

a. We can use gffcompare to see how the merged transcripts compare with the reference annotation.

gffcompare -r chrX_data/genes/chrX.gtf -G -o merged_stats stringtie_merged.gtf

 

b. Now we can look at the merged_stats.stats file.

cat merged_stats.stats

 

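c. Let's prepare for differential expression (DE) analysis by re-estimating transcript abundances and writing the table files that Ballgown reads.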
-----
for f in $(<samples.txt)
do
stringtie -e -B -p 4 -G stringtie_merged.gtf -o ballgown/${f}/${f}_chrX.gtf ${f}_chrX.bam
done
-----

Now we can use Ballgown, Cuffdiff, or DESeq2 for further analysis. See the appropriate tutorials for those tools.