1. Data Preparation
Link the example dataset into a new folder named fusion, using the commands below:
cd ~/Desktop
mkdir fusion
cd fusion
ln -s ~/Desktop/rnaseq_a/input/fusion/* .
2. Run TopHat
a. Run TopHat on the paired-end reads, adding the options needed for fusion gene detection.
tophat2 -o tophat_MCF7_test -p 4 --fusion-search --keep-fasta-order --bowtie1 \
    --no-coverage-search -r 0 --mate-std-dev 80 --max-intron-length 100000 \
    --fusion-min-dist 100000 --fusion-anchor-length 13 \
    --fusion-ignore-chromosomes chrM \
    hg18 SRR064286_1.fastq SRR064286_2.fastq
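Because the alignment is the slow step, it can help to confirm that the inputs are in place first. The following is only a minimal sketch: check_inputs is a hypothetical helper, not part of TopHat, and it assumes that the Bowtie 1 index files (hg18.1.ebwt and so on) and the FASTQ files sit in the current folder.

```shell
# check_inputs is a hypothetical helper, not part of TopHat.
# Usage: check_inputs <index_prefix> <reads_1> [<reads_2> ...]
check_inputs() {
    index_prefix=$1
    shift
    missing=0
    # Bowtie 1 stores its index as <prefix>.1.ebwt, <prefix>.2.ebwt, ...
    # (assumption: a standard, non-"large" index in the given location)
    if [ ! -e "${index_prefix}.1.ebwt" ]; then
        echo "missing Bowtie 1 index file: ${index_prefix}.1.ebwt" >&2
        missing=1
    fi
    # Each read file must exist and be non-empty.
    for fq in "$@"; do
        if [ ! -s "$fq" ]; then
            echo "missing or empty read file: $fq" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call with the file names used in this step.
if check_inputs hg18 SRR064286_1.fastq SRR064286_2.fastq; then
    echo "inputs found"
else
    echo "fix the inputs listed above before running tophat2"
fi
```

If the index is stored elsewhere, pass its full path prefix as the first argument.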
3. Detect Fusions
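a. Run tophat-fusion-post to filter the candidate fusions reported by TopHat. The three num-fusion options set the minimum numbers of supporting reads, supporting mate pairs, and both combined.
tophat-fusion-post -p 4 --num-fusion-reads 1 --num-fusion-pairs 2 --num-fusion-both 5 hg18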
b. This job takes a few minutes and reports a few errors along the way. Next, examine some of the outputs and do some text parsing with the following commands:
cd tophat_MCF7
samtools view accepted_hits.bam > fusion.sam
awk '{if($5 > 100) print}' fusions.out | sed 's/@\t/\n/g'
cut -f 1,18 fusion.sam | grep 'XF:' > fusion_potential.txt
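To see what the two filters do, they can be tried on toy data. This is a sketch under stated assumptions: every value below is made up, field 5 of fusions.out is taken to be the number of reads spanning the fusion, and XF is taken to be the tag TopHat attaches to fusion alignments. Optional SAM tags are not guaranteed to land in column 18, so the second filter scans all optional fields instead of cutting a fixed column.

```shell
# Toy stand-ins for the real files; every value below is made up.
workdir=$(mktemp -d)

# fusions.out-like rows (tab-separated); field 5 is assumed to be the
# number of reads spanning the fusion.
printf 'chrA-chrB\t100\t200\tff\t150\n'  > "$workdir/fusions_toy.out"
printf 'chrC-chrD\t300\t400\tfr\t7\n'   >> "$workdir/fusions_toy.out"

# Same threshold as the command above, in awk's shorter pattern form.
strong=$(awk '$5 > 100' "$workdir/fusions_toy.out")

# SAM-like rows: 11 mandatory fields, then optional tags in any order.
printf 'read1\t0\tchrA\t100\t50\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0\tXF:Z:toy\n'  > "$workdir/toy.sam"
printf 'read2\t0\tchrC\t300\t50\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0\n'           >> "$workdir/toy.sam"

# Look through every optional field (12 onward) for an XF tag and print
# the read name with that tag.
tagged=$(awk -F'\t' '{
    for (i = 12; i <= NF; i++)
        if ($i ~ /^XF:/) { print $1 "\t" $i; break }
}' "$workdir/toy.sam")

echo "$strong"
echo "$tagged"
rm -rf "$workdir"
```

Only the first toy fusion row passes the threshold, and only read1 carries an XF tag. On real data, the same awk scan can be pointed at fusion.sam in place of the cut and grep pipeline.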