1. Quality Analysis
a. Go into terminal and make a new directory on your desktop called “Trimming”.
b. Change to the Trimming director.
c. Copy the adapter file for trimming.
cd Desktop mkdir Trimming cd Trimming cp ~/Desktop/rnaseq_a/input/Trimming/adapt.fasta .
d. Now we will run fastqc on some paired end data and inspect the results (REMEMBER TAB COMPLETION! It prevents typos)
fastqc ~/Desktop/rnaseq_a/input/Trimming/s_6_101.fq fastqc ~/Desktop/rnaseq_a/input/Trimming/s_6_201.fq
2. Trimming
Based on the quality analysis using FastQC we can do some trimming. We are going to use Trimmomatic for this process. The manual for this can be found at this link: http://www.usadellab.org/cms/?page=trimmomatic This is a long command and its been separated to multiple lines using \ then enter without a space. You can type this on a single line as well.
java -jar /usr/bin/trimmomatic-0.36.jar PE -threads 4 \ s_6_101.fq s_6_201.fq \s_6_101_pe.fq s_6_101_se.fq \ s_6_201_pe.fq s_6_201_se.fq \ ILLUMINACLIP:adapt.fasta:2:30:10 \ HEADCROP:8 \ SLIDINGWINDOW:4:20 \ MINLEN:70
3. Genome Indexing
a. Return to your desktop and make a new directory called “tophat”
cd ~/Desktop mkdir tophat cd tophat
b. Make an index of the genome from the genome fasta file. This allows for faster searching and mapping of the sequences to the genome.
bowtie2-build ~/Desktop/rnaseq_a/input/tophat/genome.fa genome
4. Alignment
a.Run Tophat on first library of reads
tophat2 -I 1000 -i 20 library-type fr-firststrand \ -o tophat.Sp_ds.dir \ genome \ ~/Desktop/rnaseq_a/input/tophat/Sp_ds.left.fq \ ~/Desktop/rnaseq_a/input/tophat/Sp_ds.right.fq
b. Rename output
mv tophat.Sp_ds.dir/accepted_hits.bam tophat.Sp_ds.dir/Sp_ds.bam
5. Aligment indexing
samtools index tophat.Sp_ds.dir/Sp_ds.bam
6. Transcript assembly
a. Use Cufflinks to assemble and quantiy transcripts
cufflinks --overlap-radius 1 --library-type fr-firststrand \ -o cufflinks.Sp_ds.dir \ tophat.Sp_ds.dir/Sp_ds.bam
b. Rename Cufflinks
mv cufflinks.Sp_ds.dir/transcripts.gtf cufflinks.Sp_ds.dir/Sp_ds.transcripts.gtf
c. Repeat steps 4-6 on the other 3 samples.
Sp_log
Sp_plat
Sp_hs
Hint you can replace a string using this syntax (note only works on previous command)
!!:gs/old_text/new_text
7. Merge assemblies
a. Create an assemblies file with the list of all the required files
echo cufflinks.Sp_ds.dir/Sp_ds.transcripts.gtf >> assemblies.txt echo cufflinks.Sp_hs.dir/Sp_hs.transcripts.gtf >> assemblies.txt echo cufflinks.Sp_log.dir/Sp_log.transcripts.gtf >> assemblies.txt echo cufflinks.Sp_plat.dir/Sp_plat.transcripts.gtf >> assemblies.txt
b. Check assemblies.txt file:
cat assemblies.txt
c. Merge all assemblies together:
cuffmerge -s ~/Desktop/rnaseq_a/input/tophat/genome.fa assemblies.txt
8. Differential expression
a. Run Cuffdiff on these datasets:
cuffdiff library-type fr-firststrand \ -o diff_out \ -b ~/Desktop/rnaseq_a/input/tophat/genome.fa \ -L Sp_ds,Sp_hs,Sp_log,Sp_plat \ -u merged_asm/merged.gtf \ tophat.Sp_ds.dir/Sp_ds.bam \ tophat.Sp_hs.dir/Sp_hs.bam \ tophat.Sp_log.dir/Sp_log.bam \ tophat.Sp_plat.dir/Sp_plat.bam
b. Inspect Cuffdiff results:
head diff_out/gene_exp.diff