Advanced RNA-Seq Analysis Using TopHat

1. Quality Analysis

a. Go into terminal and make a new directory on your desktop called “Trimming”.

b. Change to the Trimming director.

c. Copy the adapter file for trimming.

cd Desktop
mkdir Trimming
cd Trimming
cp ~/Desktop/rnaseq_a/input/Trimming/adapt.fasta .


d. Now we will run fastqc on some paired end data and inspect the results (REMEMBER TAB COMPLETION! It prevents typos)

fastqc ~/Desktop/rnaseq_a/input/Trimming/s_6_101.fq
fastqc ~/Desktop/rnaseq_a/input/Trimming/s_6_201.fq

2. Trimming

Based on the quality analysis using FastQC we can do some trimming. We are going to use Trimmomatic for this process. The manual for this can be found at this link: http://www.usadellab.org/cms/?page=trimmomatic This is a long command and its been separated to multiple lines using \ then enter without a space. You can type this on a single line as well.

java -jar /usr/bin/trimmomatic-0.36.jar PE -threads 4 \
s_6_101.fq s_6_201.fq \s_6_101_pe.fq s_6_101_se.fq \
s_6_201_pe.fq s_6_201_se.fq \
ILLUMINACLIP:adapt.fasta:2:30:10 \

3. Genome Indexing

a. Return to your desktop and make a new directory called “tophat”

cd ~/Desktop
mkdir tophat
cd tophat


b. Make an index of the genome from the genome fasta file. This allows for faster searching and mapping of the sequences to the genome.

bowtie2-build ~/Desktop/rnaseq_a/input/tophat/genome.fa genome

4. Alignment

 a.Run Tophat on first library of reads

tophat2 -I 1000 -i 20 library-type fr-firststrand \
-o tophat.Sp_ds.dir \
genome \
~/Desktop/rnaseq_a/input/tophat/Sp_ds.left.fq \


b. Rename output

mv tophat.Sp_ds.dir/accepted_hits.bam tophat.Sp_ds.dir/Sp_ds.bam

5. Aligment indexing

samtools index tophat.Sp_ds.dir/Sp_ds.bam

6. Transcript assembly

a. Use Cufflinks to assemble and quantiy transcripts

cufflinks --overlap-radius 1 --library-type fr-firststrand \
-o cufflinks.Sp_ds.dir \


b. Rename Cufflinks

mv cufflinks.Sp_ds.dir/transcripts.gtf cufflinks.Sp_ds.dir/Sp_ds.transcripts.gtf


c. Repeat steps 4-6 on the other 3 samples.




Hint you can replace a string using this syntax (note only works on previous command)


7. Merge assemblies

a. Create an assemblies file with the list of all the required files

echo cufflinks.Sp_ds.dir/Sp_ds.transcripts.gtf >> assemblies.txt
echo cufflinks.Sp_hs.dir/Sp_hs.transcripts.gtf >> assemblies.txt
echo cufflinks.Sp_log.dir/Sp_log.transcripts.gtf >> assemblies.txt
echo cufflinks.Sp_plat.dir/Sp_plat.transcripts.gtf >> assemblies.txt


b. Check assemblies.txt file:

cat assemblies.txt


c. Merge all assemblies together:

cuffmerge -s ~/Desktop/rnaseq_a/input/tophat/genome.fa assemblies.txt

8. Differential expression

a. Run Cuffdiff on these datasets:

cuffdiff library-type fr-firststrand  \
-o diff_out \
-b ~/Desktop/rnaseq_a/input/tophat/genome.fa \
-L Sp_ds,Sp_hs,Sp_log,Sp_plat \
-u merged_asm/merged.gtf \
tophat.Sp_ds.dir/Sp_ds.bam \
tophat.Sp_hs.dir/Sp_hs.bam \
tophat.Sp_log.dir/Sp_log.bam \


b. Inspect Cuffdiff results:

head diff_out/gene_exp.diff