
Introduction to R

1. What is R?

R is a free, open-source statistical platform and programming/scripting language, available for Windows, Mac and Linux. It can handle most biomedical data analyses, from small tables to large datasets.

https://www.r-project.org/

2. Why R?

  • Open Source/Free

  • Looks, feels and works the same on every platform

  • Powerful statistical capabilities

  • Powerful visualization capabilities

  • Powerful data management and analysis capabilities

  • Huge literature/documentation

  • CRAN - about 10,000 available packages

    • https://cran.r-project.org/

  • Bioconductor - thousands of biology-specific software packages

3. R Interface

R can be accessed in several ways:

  • a graphical user interface

  • a command line

  • a development environment

  • programmatic access

R GUI

R Command Line

RStudio

4. Introduction to RStudio

RStudio is an Open Source, Free Integrated Development Environment for R.

4.1 Interface

Type the following commands in the console and hit Enter after each one:

help("graphics")
demo("graphics")

  • Console - runs the code interactively



  • History - all executed commands stored


  • Environment - all objects, parameters, variables used


  • Plots - Current plot, export plot


  • Packages - List of installed packages


  • Packages - Search, install and load packages

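The Packages pane has console equivalents. The lines below are a minimal sketch: the stats package is used only because it ships with every R installation, and the commented install.packages line assumes an internet connection and uses ggplot2 purely as an example name.

```r
# check whether a package is installed, without loading it
stats_installed <- requireNamespace("stats", quietly = TRUE)
stats_installed

# load an installed package so that its functions can be called directly
library(stats)

# list the names of the packages installed on this machine
installed <- rownames(installed.packages())
head(installed)

# install a package from CRAN (needs an internet connection);
# "ggplot2" is only an example name
# install.packages("ggplot2")
```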
4.2 Data management (import, view, sort, edit, export)

Set the working directory using the “Session”, “Set Working Directory” menu option.
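The working directory can also be set from the console. A minimal sketch, in which tempdir() stands in for your own project folder only because it always exists:

```r
# remember the current working directory
old_dir <- getwd()

# switch to another directory; replace tempdir() with your own folder
setwd(tempdir())
getwd()

# list the files in the working directory
list.files()

# switch back to the original directory
setwd(old_dir)
```

Relative file names passed to functions such as read.csv or write.csv are resolved against the working directory.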

4.2.1 Use the “Import Dataset” option

Use this option to select and import the interaction data for STAT1 (stat1_interactions.tsv). Set the appropriate parameters (in particular “Delimiter”, which should be Tab for a .tsv file) and click “Import” to import the data.


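The same kind of import can be done from the console or a script. The sketch below is self-contained: it first writes a three-row tab-separated file to a temporary location (demo rows with simplified column names) and then reads it back with read.delim. The names demo_file and demo_interactions are hypothetical; for the real data, pass the path to stat1_interactions.tsv instead.

```r
# write a tiny tab-separated file to a temporary location (demo data only)
demo_file <- tempfile(fileext = ".tsv")
writeLines(c("node1\tnode2\tcombined_score",
             "IRF9\tSTAT2\t0.999",
             "STAT1\tPRKCD\t0.999",
             "JAK1\tSTAT2\t0.999"),
           demo_file)

# read.delim reads tab-separated text and treats the first line as the header
demo_interactions <- read.delim(demo_file, stringsAsFactors = FALSE)

# check what was imported
dim(demo_interactions)
names(demo_interactions)
```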
4.2.2 View, Sort and Filter Data - in the viewer


4.2.3 Edit Data

Type the following command and hit Enter to bring up the data editor window, where you can edit the data. Edits are not saved automatically, so assign the result to a new variable to keep them:

stat1_interactions_edited <- edit(stat1_interactions)




4.2.4 Export data

In the viewer, select the data by dragging from the bottom-right corner to the top-left corner, copy it, and paste it into your favorite spreadsheet application.

You can also export the data with the following command:

write.csv(stat1_interactions, "string.csv")
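By default write.csv also writes the row names as an extra first column. A minimal sketch of an export without that column, using a small hand-made data frame (demo_export and out_file are hypothetical names) and a temporary folder so that nothing in your project is overwritten:

```r
# a small data frame standing in for the imported interactions (demo data only)
demo_export <- data.frame(node1 = c("IRF9", "STAT1"),
                          node2 = c("STAT2", "PRKCD"),
                          combined_score = c(0.999, 0.999))

# row.names = FALSE leaves out the extra column of row numbers
out_file <- file.path(tempdir(), "demo_export.csv")
write.csv(demo_export, out_file, row.names = FALSE)

# read the file back to confirm what was written
read_back <- read.csv(out_file, stringsAsFactors = FALSE)
read_back
```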

4.3 Getting help

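?plot - open the help page for plot in the installed documentation

??plot - search all installed help pages for the term “plot”

example("plot") - run the examples from a function's help page

vignette("plotexample") - open a package vignette (a longer PDF or HTML guide); “plotexample” is a placeholder, and vignette() with no arguments lists the vignettes available

data() - lists available example datasets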

4.4 Saving options

.Rhistory

Stores the commands executed in a session. It can be saved as a file.

.RData

Stores the objects and functions available in a session. It can be saved as a file.
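Both kinds of file can also be written from the console. A minimal sketch, using temporary file names and made-up objects so that nothing in your project is overwritten; savehistory is left commented out because it only works in an interactive session.

```r
# two objects to save (demo values only)
score_cutoff <- 0.5
gene_names <- c("STAT1", "STAT2")

# save selected objects to a file; save.image() would save the whole workspace
rdata_file <- file.path(tempdir(), "demo_session.RData")
save(score_cutoff, gene_names, file = rdata_file)

# remove the objects, then restore them from the file
rm(score_cutoff, gene_names)
load(rdata_file)
score_cutoff

# save the command history of an interactive session
# savehistory(file.path(tempdir(), "demo_session.Rhistory"))
```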

5. Programming with R

Programming in R involves the following basic components:

  • Variables

  • Operations

  • Conditions

  • Loops

  • Functions

  • Scripts

5.1. Variables (objects)

A variable, or object, is a symbol that holds data. Variables can hold numeric (1, 2, 3 etc.) or character (a, b, c, ball etc.) data. There are 4 basic types of variables in R:

  • scalars

  • vectors

  • matrices

  • data frames

5.1.1. Scalars

A variable that holds a single value.

#anything after the symbol # is a comment and ignored by R
#assign value 5 to scalar variable s1
s1 <- 5
s2 <- 6
#view the contents of the scalar variable s1
s1

[1] 5

s2

[1] 6

#view what kind of variable is s1
mode(s1)

[1] "numeric"
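Scalars are not limited to numbers. A short sketch with a character and a logical value (s_name and s_flag are hypothetical names):

```r
# a character scalar and a logical scalar
s_name <- "STAT1"
s_flag <- TRUE

mode(s_name)   # "character"
mode(s_flag)   # "logical"

# a scalar is simply a vector of length 1
length(s_name)
```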

5.1.2. Vectors

A variable that holds a sequence of values of the same type, such as numbers or character strings.

#c combines the values into a vector
v1 <- c(1,2,3,4,5)
v2 <- c(10,20,30,40,50)
#v3 is a character vector
v3 <- c("stat1","stat2","stat3","stat4","stat4")
v1

[1] 1 2 3 4 5

v2

[1] 10 20 30 40 50

#summary statistics of v2
summary(v2)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
     10      20      30      30      40      50 

#read/view a specific element in vector v3
v3[2]

[1] "stat2"
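Square brackets can select more than one element at a time. A minimal sketch that redefines v1 and v2 so it runs on its own:

```r
v1 <- c(1, 2, 3, 4, 5)
v2 <- c(10, 20, 30, 40, 50)

# number of elements
length(v2)     # 5

# elements 2 to 4
v2[2:4]        # 20 30 40

# elements that satisfy a condition
v2[v2 > 20]    # 30 40 50

# all elements except the first
v1[-1]         # 2 3 4 5
```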

5.1.3. Matrices

A matrix consists of rows and columns, as a two-way table. The data points are accessed using a row and a column index.

#use the provided numbers to create a matrix with 3 rows and 3 columns
#matrix is the function, nrow and ncol are arguments with values 3
m1 <- matrix(c(1,2,3,4,5,6,7,8,9),nrow = 3, ncol = 3)
m1

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

#what kind of object is m1 (R 4.0.0 and later print "matrix" "array")
class(m1)

[1] "matrix"

#what are the dimensions of the matrix m1
dim(m1)

[1] 3 3

#get the value of the element from the second row, second column
m1[2,2]

[1] 5

#get all the values of 2nd column
m1[,2]

[1] 4 5 6

#get the values in row 2 of columns 2 and 3
m1[2,2:3]

[1] 5 8
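The printed matrix shows that matrix() fills its values column by column. The sketch below (m_col and m_row are hypothetical names) contrasts that default with the byrow argument:

```r
# default: values fill the matrix column by column
m_col <- matrix(1:9, nrow = 3, ncol = 3)

# byrow = TRUE fills the matrix row by row instead
m_row <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)

m_col[2, 3]   # 8
m_row[2, 3]   # 6

# t() transposes a matrix, so the two layouts are transposes of each other
identical(t(m_col), m_row)
```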

5.1.4. Data Frames

A data frame consists of rows and columns like a matrix, but each column can hold a different type of data (numbers, characters etc.). Most of the time you will be working with data as data frames.

#The 'stat1_interactions' data is loaded as a data frame.
class(stat1_interactions)

[1] "tbl_df"     "tbl"        "data.frame"

#dimensions of the data frame
dim(stat1_interactions)

[1] 187  11

#extract columns 1, 2 and 11 and store them in a new variable statsmall
statsmall <- stat1_interactions[c(1,2,11)]
dim(statsmall)

[1] 187   3

#view the column headers
names(statsmall)

[1] "#node1"         "node2"          "combined_score"

#view statsmall
statsmall #also try statsmall$node2

# A tibble: 187 × 3
   #node1 node2  combined_score
   <chr>  <chr>           <dbl>
 1 IRF9   STAT2           0.999
 2 STAT1  PRKCD           0.999
 3 JAK1   STAT2           0.999
 4 JAK2   STAT1           0.999
 5 STAT1  PIAS1           0.999
 6 JAK2   STAT3           0.999
 7 PTPN11 KIT             0.999
 8 STAT3  EP300           0.999
 9 JAK1   STAT3           0.999
10 PTPN11 PDGFRB          0.999
# ... with 177 more rows

#plot the values of combined_score
plot(statsmall$combined_score)

 

#create a subset of the interactions that have score >= 0.5
statsmallfiltered <- subset(statsmall, combined_score >= 0.5)
dim(statsmall)

[1] 187   3

dim(statsmallfiltered)

[1] 136   3
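A data frame can also be built by hand, which is handy for trying commands without the full dataset. A minimal sketch: demo_df is a hypothetical name, the first two rows come from the STAT1 data, and the GENEA/GENEB row is made up so that the filter has something to remove.

```r
# build a small data frame by hand (third row is made up for illustration)
demo_df <- data.frame(node1 = c("IRF9", "STAT1", "GENEA"),
                      node2 = c("STAT2", "PRKCD", "GENEB"),
                      combined_score = c(0.999, 0.999, 0.42),
                      stringsAsFactors = FALSE)

# structure: column names, types and first values
str(demo_df)

# a single column as a vector
demo_df$node2

# rows that pass the score cutoff
subset(demo_df, combined_score >= 0.5)

# rows sorted by score, lowest first
demo_df[order(demo_df$combined_score), ]
```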

5.2. Operations

R can perform basic operations like addition, multiplication, division etc., along with a host of more advanced operations. Several operations work similarly on different types of variables.

#add two numbers
20+30

[1] 50

#add two numbers stored in variables s1, s2
s1+s2

[1] 11

#compare two variables
s1+s2 > 20

[1] FALSE

#add two vectors v1,v2
v1+v2

[1] 11 22 33 44 55

#element wise operation
sqrt(v1)

[1] 1.000000 1.414214 1.732051 2.000000 2.236068

#sum of elements (also try min, max, median, mean, sd, var etc)
sum(v1)

[1] 15

#sum of each row of the matrix m1
apply(m1,1,sum)

[1] 12 15 18

#mean of the combined_score in statsmall
mean(statsmall$combined_score)

[1] 0.748984
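In apply, the second argument selects the direction: 1 works along rows and 2 along columns. The sketch below redefines m1 and v1 so it runs on its own, and compares apply with the built-in shortcuts rowSums and colSums.

```r
m1 <- matrix(c(1,2,3,4,5,6,7,8,9), nrow = 3, ncol = 3)
v1 <- c(1, 2, 3, 4, 5)

# second argument of apply: 1 for rows, 2 for columns
apply(m1, 1, sum)   # 12 15 18
apply(m1, 2, sum)   # 6 15 24

# rowSums and colSums give the same totals
rowSums(m1)
colSums(m1)

# comparisons are element wise too and return TRUE/FALSE for each element
v1 > 2

# count how many elements pass the comparison
sum(v1 > 2)         # 3
```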

5.3. Conditions

Conditions test if a statement is true or false

#check if s1 is greater than s2
#if the condition is true, assign a value for s3 as s2-1
#if the condition is false, assign a value for s3 as s2+1
if (s1 > s2) {
  s3 <- s2 - 1
} else {
  s3 <- s2 + 1
}
s3

[1] 7
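An if statement tests a single value. To test every element of a vector at once, ifelse can be used instead. A minimal sketch with made-up scores and hypothetical labels:

```r
scores <- c(0.999, 0.42, 0.75)   # demo values only

# if / else if / else checks one value at a time
s_first <- scores[1]
if (s_first >= 0.9) {
  label_first <- "high"
} else if (s_first >= 0.5) {
  label_first <- "medium"
} else {
  label_first <- "low"
}
label_first    # "high"

# ifelse applies the test to every element of a vector
labels_all <- ifelse(scores >= 0.5, "keep", "drop")
labels_all     # "keep" "drop" "keep"
```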

5.4. Loops

Loops repeat an operation over a set of data, so that you do not have to perform the same action manually multiple times.

#the for loop goes over each number from 1 to 10, assigning the current number to i in every iteration
#a print statement is then executed in each iteration
for(i in 1:10)
{
 print(i)
}

[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10

#same for loop but with an addition statement
for(i in 1:10)
{
 xfor=i+1
 print(xfor)
}

[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
[1] 11
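Loops are often used to walk over the elements of a vector. The sketch below adds up the values of v2 one at a time and then shows sapply, which applies a function to each element without an explicit loop; running_total and doubled are hypothetical names.

```r
v2 <- c(10, 20, 30, 40, 50)

# seq_along(v2) gives the positions 1, 2, ..., length(v2)
running_total <- 0
for (i in seq_along(v2)) {
  running_total <- running_total + v2[i]
}
running_total               # 150

# the loop result matches the built-in sum
running_total == sum(v2)

# sapply runs a function on each element and collects the results
doubled <- sapply(v2, function(x) x * 2)
doubled                     # 20 40 60 80 100
```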

5.5. Functions

Functions are a central part of R. Built-in functions are named commands that perform standard tasks. For example, ‘print’ is a function that prints the value of a variable, and ‘sum’ is a function that adds up the values passed to it. Functions already used in this tutorial include c, matrix, class, dim, names, summary, subset, sqrt, apply, mean and plot.

You can also write your own functions for chunks of code that you use repeatedly. Instead of typing all of the lines every time, you call the function by name, and a single word runs the whole chunk.

#write a function avecomscore, to calculate the percent average of the combined_score in any three-column small interaction dataset

avecomscore = function(x){
 comscoremean=mean(x$combined_score)
 comscoremeanpercent=comscoremean*100
 return(comscoremeanpercent)
}

#now use the function to calculate the percent average of the combined_score in statsmall dataset
avecomscore(statsmall)

[1] 74.8984
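Functions can take more than one argument, and arguments can have default values. The sketch below is a hypothetical variant that works on a plain numeric vector rather than a data frame; the name scalepercent, the digits argument and the demo scores are assumptions made for illustration.

```r
# scalepercent is a hypothetical name; digits has a default value of 1
scalepercent <- function(x, digits = 1) {
  # stop early with a clear message if the input is not numeric
  if (!is.numeric(x)) {
    stop("x must be numeric")
  }
  round(mean(x) * 100, digits)
}

demo_scores <- c(0.999, 0.42, 0.75)   # demo values only

scalepercent(demo_scores)              # uses digits = 1
scalepercent(demo_scores, digits = 0)  # overrides the default
```

The value of the last expression in the function body is returned, so an explicit return() is optional.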

6. Scripting

Scripting means placing a series of commands, each performing an individual action, in a single file and executing all of them automatically by running that one file.

Below is an example script that reads the interactions file, extracts a subset of its columns, filters the rows by score, writes the filtered interactions to a new file and plots the scores.

Place all of these commands in a single text file (using a text editor like Notepad or TextEdit) and save it as a script (use the .r extension or no extension).

#read the input data
stat1_interactions_new <- read.csv("~/stat1_interactions.tsv",sep="\t")

#extract the first two and the last column
statsmallnew <- stat1_interactions_new[c(1,2,ncol(stat1_interactions_new))]

#extract the interactions with a score of 0.5 or more
statsmallnewfiltered <- subset(statsmallnew, combined_score >= 0.5)

#write the filtered interactions into a file
write.table(statsmallnewfiltered, "statsmallfiltered.txt", sep = "\t", quote = FALSE, row.names = FALSE)

#save the score distribution graph as a PNG file
png('statsmallnew.png')
plot(statsmallnew$combined_score)
dev.off()

## png
##   2

message("Script ran successfully, all outputs are in the working directory")

## Script ran successfully, all outputs are in the working directory

If the above script is saved as interaction-data-process-script.r in your home directory, run it from the console with the following command:

source('~/interaction-data-process-script.r')
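A script can also be run outside RStudio with the Rscript program that ships with R, for example by typing Rscript followed by the script name and an input path in a terminal. The sketch below shows one way a script could pick up that input path; using the first argument as the input file, and the fallback to ~/stat1_interactions.tsv, are assumptions made for illustration.

```r
# arguments typed after the script name; empty when run from the console
args <- commandArgs(trailingOnly = TRUE)

# use the first argument as the input path, otherwise fall back to a default
input_file <- if (length(args) >= 1) args[1] else "~/stat1_interactions.tsv"

message("Input file: ", input_file)
```

With these lines at the top of the script, the path passed to read.csv can be replaced by input_file, so the same script can process different files.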