Fusion Transcript Detection Using TopHat-Fusion

1. Data Preparation

Link the appropriate example dataset into a new folder named fusion, using the commands below:

cd ~/Desktop
mkdir fusion
cd fusion
ln -s ~/Desktop/rnaseq_a/input/fusion/* .
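Before starting a long alignment, it can help to confirm that the two paired-end FASTQ files contain the same number of reads. The sketch below is only an illustration: it builds two tiny made-up files (toy_1.fastq and toy_2.fastq are hypothetical names, not part of the dataset) and assumes the usual four-lines-per-record FASTQ layout. The same check could be pointed at the real files.

```shell
# Toy paired FASTQ files (made-up reads) so the check can run anywhere.
printf '@r1/1\nACGT\n+\nIIII\n@r2/1\nTTGA\n+\nIIII\n' > toy_1.fastq
printf '@r1/2\nGGCA\n+\nIIII\n@r2/2\nCATG\n+\nIIII\n' > toy_2.fastq

# Assumes unwrapped FASTQ: one record = four lines, so reads = lines / 4.
n1=$(( $(wc -l < toy_1.fastq) / 4 ))
n2=$(( $(wc -l < toy_2.fastq) / 4 ))

if [ "$n1" -eq "$n2" ]; then
    echo "OK: $n1 read pairs"
else
    echo "Mismatch: $n1 vs $n2 reads" >&2
fi
```

If the counts differ, the paired files are out of sync and should be checked before they are passed to the aligner.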

2. Run TopHat

a. Run TopHat on the data, adding a few options that enable fusion detection.

tophat2 -o tophat_MCF7_test -p 4 --fusion-search --keep-fasta-order --bowtie1 --no-coverage-search -r 0 --mate-std-dev 80 --max-intron-length 100000 --fusion-min-dist 100000 --fusion-anchor-length 13 --fusion-ignore-chromosomes chrM hg18 SRR064286_1.fastq SRR064286_2.fastq

3. Detect Fusions

a. Run tophat-fusion-post to filter the candidate fusions reported by TopHat.

tophat-fusion-post -p 4 --num-fusion-reads 1 --num-fusion-pairs 2 --num-fusion-both 5 hg18


b. This job takes a few minutes and will report a few errors. Once it finishes, examine some of the outputs and do some text parsing with the following commands:

cd tophat_MCF7
samtools view accepted_hits.bam > fusion.sam
awk '{if($5 > 100) print}' fusions.out | sed 's/@\t/\n/g'
cut -f 1,18 fusion.sam | grep 'XF:' > fusion_potential.txt
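
The cut command above relies on the XF tag always sitting in column 18. SAM optional fields begin at column 12 and their order is not fixed, so a position-independent search may be more robust. The following is a minimal sketch on made-up records: toy.sam, toy_potential.txt, and the XF values are hypothetical and only illustrate the idea.

```shell
# Toy SAM records (made-up). Columns 1-11 are the mandatory SAM fields;
# optional TAG:TYPE:VALUE fields start at column 12.
printf 'readA\t0\tchrA\t100\t50\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0\tXF:Z:toy-fusion-1\n' >  toy.sam
printf 'readB\t0\tchrA\t200\t50\t4M\t*\t0\t0\tTTGA\tIIII\tNM:i:0\n'                    >> toy.sam
printf 'readC\t0\tchrB\t300\t50\t4M\t*\t0\t0\tCATG\tIIII\tXF:Z:toy-fusion-2\tNM:i:1\n' >> toy.sam

# Print the read name and its XF field, wherever that field appears.
awk -F'\t' '{ for (i = 12; i <= NF; i++) if ($i ~ /^XF:/) print $1 "\t" $i }' toy.sam > toy_potential.txt
cat toy_potential.txt
```

Splitting on tabs only (-F'\t') keeps an XF value intact even if it contains spaces. To try this on the real data, replace toy.sam with fusion.sam.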