Loading...
 

Visualization of RNA-Seq Data Using CummeRbund

cummeRbund is an R package that is handy for producing useful graphs and figures. It is also (relatively) easily tailored for individual needs. We will study transcript expression and analyze DE using cummeRbund on the results from Cuffdiff above:

1. RStudio

1.1. Start RStudio

RStudio is available through your Applications - Programming menu option from your desktop.

Type all the commands at the RStudio command line;

1.2. Load cummerbund

Load cummerbund library into the R session with the following command;

library(cummeRbund)

1.3. Import Cuffdiff results

use setwd OR use the full path to your diff_out folder

cuff = readCufflinks('diff_out')

1.4. View CuffDiff data

cuff

1.5. Distribution of expression values

Examine the distribution of expression values for the reconstructed transcripts

csDensity(cuff)

 

pTIuB1i5zp6mKOuH3r8opwqlrMbzMGkynaEXrJr7nz6l8Pc704qVGeVdGNlFPCJYw4ti2bm7gYwo4nMkzvUQqphu5sDKBebRPtUX3hwOjkvKcDvwBysm3hoLMZ1YrtpOAK7NncsI4djDs1FZeQ

1.6. Sample DE genes marix

We can look at a matix of DE genes between the different sampels

sigMatrix(cuff)

 

D5OGzurmRTED0cc1fecRGs9rJKFO9V5X56rArMPFs-BOthtQtpI1FA0Mghw1hGund9A2H9s2e3IzkUj_STl9HmJmtENJScKlCLf4GTYQs_Xr3hHfVfvwQzG-4XwQK_GecJh8Rc829JHVnKTHhQ

1.7. Distance heatmaps

We can look at distance heatmaps or MDSplots to get and idea of the differences between the samples

csDistHeat(cuff)
MDSplot(cuff)

1.8. Gene level data

1.8.1. Expression data

Retrieve the gene-level differential expression data

gene_diff_data = diffData(cuff)

1.8.2. Number of Genes

How many ‘genes’ are there?

nrow(gene_diff_data)

1.9. Extract significant genes

a. From the gene-level differential expression data, extract those that are labeled as significantly different.

Note, normally just set criteria as “significant=’yes’”, but we’re adding an additional p-value filter just to capture some additional transcripts for demonstration purposes only.

sig_gene_data = subset(gene_diff_data,(significant=='yes' | p_value < 0.01))

The pipe | above is interpreted as OR by R

b. How many genes are significantly DE according to these criteria?

nrow(sig_gene_data)

c. Examine the entries at the top of the unsorted data table:

head(sig_gene_data)

 

gene_id sample_1 sample_2 status   value_1  value_2 log2_fold_change

56  XLOC_000056    Sp_ds    Sp_hs     OK 33560.500 117.6900         -8.15563

136 XLOC_000136    Sp_ds    Sp_hs     OK 30094.800 246.0650         -6.93433

146 XLOC_000146    Sp_ds    Sp_hs     OK  1125.700  70.9957         -3.98694

269 XLOC_000056    Sp_ds   Sp_log     OK 33560.500 108.8130         -8.26877

315 XLOC_000102    Sp_ds   Sp_log     OK     0.000 440.3590              Inf

336 XLOC_000123    Sp_ds   Sp_log     OK   753.187   0.0000             -Inf

   test_stat p_value  q_value significant

56   -4.86421 0.00360 0.257929          no

136  -4.49008 0.00795 0.291731          no

146  -3.99117 0.00640 0.291731          no

269  -4.76096 0.00450 0.291731          no

315        NA 0.00160 0.143550          no

336        NA 0.00005 0.008700         yes

1.10. Save gene list

You can write the list of significantly differentially expressed genes to a file like so:

write.table(sig_gene_data, 'sig_diff_genes.txt', sep = '\t', quote = F)

2. Visualize Individual Data

Examine the expression values for one of your genes that’s diff. expressed. To do this, select expression info for the one gene by its gene identifier. We will do this by grabbing the first “1” gene identifier in our sig_gene_data table:

a. First, get the gene_id:

ex_gene_id = sig_gene_data$gene_id1

 

b. Print its value to the screen:

ex_gene_id

 

c. Get that gene ‘object’ from cummeRbund and assign it to variable ‘ex_gene’:

ex_gene = getGene(cuff, ex_gene_id)

3. Visualize group data

Plot the expression values for the gene under each condition.  The error bars are only turned off here because this data set is both simulated and hugely underpowered to have reasonable confidence levels)

expressionBarplot(ex_gene, logMode=T, showErrorbars=F)

goACxes8D3vHrIeEOGQ4YnCM3_CxqAdJoXfvdYdc2jcYi9yAfjH655vgntCSHDNP5JviAE7iWV6rvPKn87giItasZNn6ODLempPhp3YPSP7h9wJrhNSy8R80vlBx-CbyjuxDAuFcAl_X_hYSYg

4. DE genes heatmap

Draw a heatmap showing the differentially expressed genes

a. First retrieve the ‘genes’ from the ‘cuff’ data set by providing a list of gene identifiers like so:

sig_genes = getGenes(cuff, sig_gene_data$gene_id)

b. Draw the heatmap

csHeatmap(sig_genes, cluster='both')

 

sTD46Gx8wIZR3YDuucATTjQUoyL19xdAt3zrhs4hHg_hhpEwDxW3Y5L5PW-2VV9kxZHd-82jZRVrg8Rdgvfk-i4IuJ7TXGvsI1mkLoe9vkTt7nxVd9M8XBgg_v7lJgTwUfikcBbqXZWXZpxqhw