This lesson is still being designed and assembled (Pre-Alpha version)

Workshop Title

Setup

Overview

Time: min
Objectives

Software setup

Please install R and RStudio before this workshop, or log in to the UIC virtual lab to use the software required for the workshop. See the instructions below for both options.

R

R is a programming language that is especially powerful for data exploration, visualization, and statistical analysis. To interact with R, we use RStudio.

Install R by downloading and running this .exe file from CRAN. Also, please install the RStudio IDE. Note that if you have separate user and admin accounts, you should run the installers as administrator (right-click on .exe file and select "Run as administrator" instead of double-clicking). Otherwise problems may occur later, for example when installing R packages.

Video Tutorial

Instructions for R installation on various Linux platforms (Debian, Fedora, Red Hat, and Ubuntu) can be found at <https://cran.r-project.org/bin/linux/>. These will instruct you to use your package manager (e.g. for Fedora run sudo dnf install R, and for Debian/Ubuntu add a PPA repository and then run sudo apt-get install r-base). Also, please install the RStudio IDE.

Virtual Lab

If you would prefer not to install the software for this workshop on your computer, you may use the Virtual Lab service run by Technology Services. This allows you to use a virtual machine either from your web browser or from a desktop app installed on your computer. Overall you may have a better experience using the desktop app, but the browser should suffice for most workshops.

See browser instructions here
See desktop instructions here

Install the videoconferencing client

If you haven't used Zoom before, go to the official website to download and install the Zoom client for your computer.

Set up your workspace

Like other Carpentries workshops, you will be learning by "coding along" with the Instructors. To do this, you will need to have both the window for the tool you will be learning about (a terminal, RStudio, your web browser, etc.) and the window for the Zoom video conference client open. In order to see both at once, we recommend using one of the following setup options:

This blog post includes detailed information on how to set up your screen to follow along during the workshop.

Setup files:

Please download the files emailed to you to participate in the workshop:

About the Data Used in this Workshop:

This workshop uses an adapted version of the data paper: Nitsch, F. J., Sellitto, M., & Kalenscher, T. (2021). The effects of acute and chronic stress on choice consistency. Psychoneuroendocrinology, 131, 105289. https://doi.org/10.1016/j.psyneuen.2021.105289.

The data paper, along with its underlying data (publicly available at https://osf.io/6mvq7), was adapted and used for educational purposes with the authors’ permission.

Key Points


Introduction to R and RStudio

Overview

Time: min
Objectives
  • Understand the basics of R and RStudio

  • Learn about the RStudio interface

R

R is a specialized language most commonly used for statistical computing, data analysis, and graphics. It is a free, open-source program used worldwide by statisticians and data miners. Compared to a spreadsheet, it makes data wrangling, analysis, and visualization more efficient and precise, especially for larger datasets.

Why use R

Based on the 2021 survey conducted by Kaggle, R was the third most used programming language among data professionals.

Programming language use chart

Image Source: Business Broadway, 2021

Understanding RStudio and the Console

RStudio is the integrated development environment (IDE) for the basic R software. It is available in two versions:

RStudio

Script Area: Write code (scripts) and run it separately. This pane can also show a document outline (located at the top right of the script area) that lists all the code headers in one place.

Console: Write and run code directly here. It also displays the history of commands and any error message when code fails.

Environment: Lists the objects and variables created in the current session, and shows the current project file name at the top right of the pane.

Graphics: Displays plots and packages, and includes an important Files tab. The Files tab helps us navigate through the folders of the current project, which makes organizing and sorting things much easier.

The preferences tab in the toolbar helps customize the margins, display, and font sizes in RStudio.

Help and Cheatsheets in RStudio

help(function_name) – Provides a detailed description of a function in the help window (bottom right). For example, run the command help(sort) in the console.

Help Rstudio

You will now get a complete description of the “sort” function in the help window.

Points to note:

Cheatsheet – In the wild and woolly world of R there are many packages, and cheat sheets come in handy for summarizing the functions of a package. These cheat sheets are invaluable as learning tools. RStudio has created a large number of cheat sheets, including the one-page R Markdown cheat sheet, which is freely available here.

Cheatsheet

Key Points


R syntax and operators

Overview

Time: 0 min
Objectives
  • Perform basic arithmetic operations

  • Understand the basic logical operators

R syntax

Before learning how to code in R, it is necessary to understand the fundamentals of R. As an example, the code below performs basic arithmetic operations such as Addition (+), Subtraction (-), Multiplication (*), Division (/), and Modulo (%%), providing us with output similar to that of a basic calculator. Apart from these simple equations, R is capable of many additional functions for which the following operators are required.

Type each operation directly in the console.

> 2+2
[1] 4
> 2-2
[1] 0
> 2*2
[1] 4
> 2/2
[1] 1
> 3%%2
[1] 1

R Console essentials

Loading a script in R - R provides a variety of options for running a script. In the Script pane you can copy and paste a script, type a new one, or select File → Open File from the menu to load an existing R script.

Run and options commands in R - Running a script shows the code and its output in the console pane. To run code, click the Run button on the script pane or press Ctrl + Enter. To change the appearance, accessibility, and other settings of RStudio, open the options menu from the menu bar via Tools » Options.

Rstudio run command

Assigning values in R - Values are assigned to variables with the “<-” symbol. The variable is written on the left and receives the value on the right. For example, to assign the value 3 to x, type x <- 3. Assigning values to variables is especially useful when the values will be used again. As in the previous examples, operations can be performed on variables to print the output directly, or the output can be stored in another variable. Once a variable is created, it is visible in the Environment pane.

> x <- 3
> y <- 5
> x+y
[1] 8
> z <- x+y
> z
[1] 8

Commenting in R - Comments improve the readability of your code. They are meant only for the reader, so the interpreter ignores them. A comment starts with #, and when executing code R ignores everything from the # to the end of the line. R supports only single-line comments; for a multi-line comment, start each line with #. Wrapping text in quotes is sometimes used as a workaround, but R still evaluates it as a character string rather than ignoring it. Example: # This is a comment
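As a small illustration of the commenting rules above, the sketch below uses two made-up variables (radius and area); only the lines without a leading # are executed.

```r
# A full-line comment: R skips this line entirely.
radius <- 2  # An inline comment after code is skipped as well.

# R has no block-comment syntax, so a longer note
# is written as several lines that each start with '#'.
area <- pi * radius^2
area
```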

Functions in R - A key feature of R is functions. Functions are “self-contained” modules of code that accomplish a specific task. Functions usually take in some sort of data structure (value, vector, data frame, etc.), process it, and return a result. The general usage for a function is the name of the function followed by parentheses: function_name(input)
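The function_name(input) pattern can be tried with a few built-in functions. The sketch below also defines one small function of its own; the names scores and to_percent are invented for this illustration and are not part of the workshop files.

```r
# Calling built-in functions: name(input, optional arguments)
scores <- c(84.8, 98.4, 74.6)
round(mean(scores), 1)           # mean() averages, round() trims to 1 decimal
sort(scores, decreasing = TRUE)  # 'decreasing' is a named argument of sort()

# Defining a small function of our own (to_percent is a made-up name)
to_percent <- function(value, total) {
  100 * value / total
}
to_percent(42, 56)
```

Named arguments such as decreasing = TRUE make a call easier to read than relying on argument position alone.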

One thing to be aware of is that R is case-sensitive. Hence the variable “a” is different from “A”.

LOGICAL OPERATORS

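Logical operators return Boolean results (TRUE or FALSE) based on the comparison performed.

Please note that in R the Boolean values TRUE and FALSE can also be written as T and F. However, T and F are ordinary variables that can be overwritten, so the full names are safer.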
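A minimal sketch of the common comparison and logical operators follows. The variables a and b are placeholders chosen for this example.

```r
a <- 7
b <- 10

a < b            # less than
a >= b           # greater than or equal to
a != b           # not equal to
a < b & b < 20   # AND: TRUE only if both sides are TRUE
a > b | b == 10  # OR: TRUE if at least one side is TRUE
!(a == b)        # NOT: flips TRUE and FALSE

# Operators work element by element on vectors
c(1, 5, 9) > 4
```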

Key Points


Variables and datatypes

Overview

Time: min
Objectives
  • understand where to access packages for different functions

  • learn about the data types permitted for analysis in RStudio

Variables

A variable is a storage location that holds a value, and that value can be changed as needed. A variable is also known as an identifier, because the variable name identifies the value stored in memory (RAM). Because R is case-sensitive, the variables ABC = 15 and Abc = 32 are distinct and can hold different values.
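The case-sensitivity example from the paragraph above can be checked directly in the console. This is only a short sketch using the same two names.

```r
ABC <- 15
Abc <- 32
ABC          # 15
Abc          # 32
ABC == Abc   # FALSE: the two names refer to different values

# A variable's value can be replaced at any time
ABC <- ABC + 1
ABC
```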

Naming Variables

Data Types

Data type in R specifies the size and type of information the variable will store.

R language has five main data types

R Datatype

Checking data type in R

There are several functions that can show you the data type of an R object, such as typeof, mode, storage.mode, class and str. However, the main purpose of some of them is not simply to check the data type. For instance, the class of an R object can differ from its data type, and the str function is designed to show the full structure of an object. If you want to print the R data type, we recommend using the typeof function. There are other functions that check whether an object belongs to a given data type, returning TRUE or FALSE. As a rule, these functions start with is. followed by the data type.

Example: is.numeric(4)  # TRUE
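To see how the storage type and the class can differ, the sketch below compares a plain number with a date. The variable names value and today are chosen only for this illustration.

```r
value <- 4
typeof(value)        # storage type
class(value)         # class, which can differ from the storage type
is.numeric(value)
is.character(value)

today <- Sys.Date()
typeof(today)        # a Date is stored as a double ...
class(today)         # ... but its class is "Date"
```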

Data type coercion

You can coerce data types in R with the functions starting with as., summarized below:

as.numeric   == numeric
as.integer   == integer
as.double    == double
as.character == character
as.logical   == Boolean
as.raw       == raw
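A few of the as. functions in action are sketched below. The input values are arbitrary examples; note that a conversion R cannot perform yields NA rather than an error.

```r
as.numeric("3.14")     # character to numeric
as.integer(7.9)        # decimals are dropped, not rounded
as.character(42)       # numeric to character
as.logical(c("TRUE", "false", "yes"))  # unrecognised text becomes NA
as.logical(0:2)        # 0 is FALSE, any other number is TRUE

# Text that is not a number cannot be converted: R returns NA with a warning
suppressWarnings(as.numeric("abc"))
```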

Character data type

The character data type stores text values (strings), which can contain letters, numbers, and symbols. A character value is written within single (' ') or double (" ") quotes. Examples: "A", "2.21", "skill@".

# input code
# Declaring character value with double quotes ""
charac <- "Abcd"
charac
class(charac)

#Convert values to character data type.
pi_value <- 3.14
x <- as.character(pi_value)
x
class(x)

# Concatenation of Character
first_name <- "Kasturi"
last_name <- "Acharya"

# Character Value Concatenation
# Paste function is used to concatenate characters
full_name <- paste(first_name, last_name)
full_name

# output
# Declaring character value with double quotes ""
> charac <- "Abcd"
> charac
[1] "Abcd"
> class(charac)
[1] "character"

> #Convert values to character data type.
> pi_value <- 3.14
> x <- as.character(pi_value)
> x
[1] "3.14"
> class(x)
[1] "character"
> 
> # Concatenation of Character
> first_name <- "Kasturi"
> last_name <- "Acharya"
> 
> # Character Value Concatenation
> # Paste function is used to concatenate characters
> full_name <- paste (first_name,last_name)
> full_name
[1] "Kasturi Acharya"

Complex data type


# input code
# Assign complex value to x
x <- 10 + 6i + 20
x
class(x)
z <- 6i
z
class(z)

#Using as.complex() function to convert value to complex.
as.complex(5)
as.complex(7i)

#Performing Addition on Complex Numbers
y1 <- 7+3i
y2 <- 8+9i
sum_y <- y1+y2
sum_y
class(sum_y)

# output
> # Assign complex value to x
> x <- 10 + 6i + 20
> x
[1] 30+6i
> class(x)
[1] "complex"
> z <- 6i
> z
[1] 0+6i
> class(z)
[1] "complex"

> #Using as.complex() function to convert value to complex.
> as.complex(5)
[1] 5+0i
> as.complex(7i)
[1] 0+7i

> #Performing Addition on Complex Numbers
> y1 <- 7+3i
> y2 <- 8+9i
> sum_y <- y1+y2
> sum_y
[1] 15+12i
> class(sum_y)
[1] "complex"
> 

Numeric Data Type


# input code
# Assigning a decimal value to variable x
x <- 15.6
x
class(x)
typeof(x)

x1 <- 20
x1
class(x1)
typeof(x1)


# Converting an integer value to numeric type
x2 <- 22L
class(x2)
typeof(x2)
x3 <- as.numeric(x2)
x3
class(x3)
typeof(x3)

# output
> # Assigning a decimal value to variable x
> x <- 15.6
> x
[1] 15.6
> class(x)
[1] "numeric"
> typeof(x)
[1] "double"
> 
> x1 <- 20
> x1
[1] 20
> class(x1)
[1] "numeric"
> typeof(x1)
[1] "double"
> 
> # Converting an integer value to numeric type
> x2 <- 22L
> class(x2)
[1] "integer"
> typeof(x2)
[1] "integer"
> x3 <- as.numeric(x2)
> x3
[1] 22
> class(x3)
[1] "numeric"
> typeof(x3)
[1] "double"

Integer Data Type


# input code
x <-  18L # putting capital 'L' after a value forces it to be
# stored as Integer.
class(x)

y <-  9
class(y)

x1 <-  23.0L
x1 <-  23L
class(x1)

# Using integer function to declare an Integer type value 
y1 <-  as.integer(44)
class(y1)

#coerce a numeric value into integer
y2 <-  as.integer(45.2)
y2


#Convert Logical States to Integer
Logic_True <- as.integer(TRUE)
Logic_True


# To check if the value is integer type:
is.integer(x)
is.integer(y)
is.integer(y1)

#Creating integer vector from 1 to 5
m = 1:5
m
class(m)

# output
> 
> x <-  18L # putting capital 'L' after a value forces it to be
> # stored as Integer.
> class(x)
[1] "integer"
> 
> 
> y <-  9
> class(y)
[1] "numeric"
> 
> 
> x1 <-  23.0L
Warning message:
integer literal 23.0L contains unnecessary decimal point 
> x1 <-  23L
> class(x1)
[1] "integer"
> 
> 
> # Using integer function to declare an Integer type value 
> y1 <-  as.integer(44)
> class(y1)
[1] "integer"
> 
> #coerce a numeric value into integer
> y2 <-  as.integer(45.2)
> y2
[1] 45

> #Convert Logical States to Integer
> Logic_True <- as.integer(TRUE)
> Logic_True
[1] 1
 
> # To check if the value is integer type:
> is.integer(x)
[1] TRUE
> is.integer(y)
[1] FALSE
> is.integer(y1)
[1] TRUE
> 
> 
> #Creating integer vector from 1 to 5
> m = 1:5
> m
[1] 1 2 3 4 5
> class(m)
[1] "integer"
> 

# input code
# BONUS
# An integer value can be at most 2147483647 (about 2 billion)
.Machine$integer.max

# A double value can be at most 1.797693e+308 (far larger than 2 billion)
.Machine$double.xmax

Logical Data Type

# input code
x <- TRUE
y<- FALSE

x1 <- T
y1 <- F

typeof(x1)
mode(x1)

####################
# Value Comparison #
####################

# Less Than and Greater Than Comparison
32 < 98  # TRUE Statement
37 > 52  # FALSE Statement
# Equal To Comparison
57 == 34  # FALSE Statement
80 == 80  # TRUE Statement

# output
> x <- TRUE
> y<- FALSE
> 
> x1 <- T
> y1 <- F
> typeof(x1)
[1] "logical"
> mode(x1)
[1] "logical"

> # Value Comparison #
>
> # Less Than and Greater Than Comparison
> 32 < 98  # TRUE Statement
[1] TRUE
> 37 > 52  # FALSE Statement
[1] FALSE
> # Equal TO Comparison
> 57 == 34  # FALSE Statement
[1] FALSE
> 80 == 80  # TRUE Statement
[1] TRUE

Key Points

  • importance of using packages in R studio for efficient data analysis


Introduction to strings and data structures

Overview

Time: min
Objectives
  • understand the basics of strings and string manipulation

  • understand the different data structures

  • learn about the functions of each data structure

Strings

Rule for String in R

# input code: string concatenation
# Count number of characters
x1 <- "Olivia"
x2 <- "Jhon"
x3 <- "William"

#checking number of characters
nchar(x1)
nchar(x2)
nchar(x3)

# Letters using vector function in R
# Check the sequence of letters
letters
letters[4]
letters[1:5]

# String Concatenation
# Paste function is used with syntax below:

x <- paste("Hello","World","!",sep = " ")
x

y <- paste(x1,x2,x3,"is happy.")
y

z<- paste("Hello","everyone","!", sep =" ")
z

# Vectors

c() # concatenate function

x4 <- c("Olivia","Jhon","William")
y1 <- paste(x4,"is happy.")
y1

z1 <- c("Please bring me","a few ")
z2 <- c("some vegetables","fruits")
z <- paste(z1,z2,collapse = " and ")
z

# output
> # input code: string concatenation
> # Count number of characters
> x1 <- "Olivia"
> x2 <- "Jhon"
> x3 <- "William"
> 
> #checking number of characters
> nchar(x1)
[1] 6
> nchar(x2)
[1] 4
> nchar(x3)
[1] 7
> 
> # Letters using vector function in R
> # Check the sequence of letters
> letters
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x"
[25] "y" "z"
> letters[4]
[1] "d"
> letters[1:5]
[1] "a" "b" "c" "d" "e"
> 
> # String Concatenation
> # Paste function is used with syntax below:
> 
> x <- paste("Hello","World","!",sep = " ")
> x
[1] "Hello World !"
> 
> y <- paste(x1,x2,x3,"is happy.")
> y
[1] "Olivia Jhon William is happy."
> 
> z<- paste("Hello","everyone","!", sep =" ")
> z
[1] "Hello everyone !"
> 
> x4 <- c("Olivia","Jhon","William")
> y1 <- paste(x4,"is happy.")
> y1
[1] "Olivia is happy."  "Jhon is happy."    "William is happy."
> 
> z1 <- c("Please bring me","a few ")
> z2 <- c("some vegetables","fruits")
> z <- paste(z1,z2,collapse = " and ")
> z
[1] "Please bring me some vegetables and a few  fruits"

String Manipulation

String manipulation is the process of coercing, slicing, pasting, or analyzing strings.

x <- "William is happy today"
x

# Converting all words to upper case using toupper() function
toupper(x)

# Converting all words to lower case using tolower() function
tolower(x)

x1 <- "Henry is A hardworker. He owns A house and A car."
x1
chartr("A", "a", x1)

z <- "I widd gq tq market tqmqrrqw."
chartr("dq","lo", z)

x2 <- "Henry puts in all his good efforts"
x2
substr(x2, start = 22, stop = 27)

#split function

x4 <- "Henry puts in all his good efforts"
class(x4)
y1 <- strsplit(x4, split = " ")
y1
class(y1)
#either create a variable like y1 or direct use the function in case of Mason 
strsplit("Mason", split ="") 
x4
y2 <- unlist(strsplit(x4, split = " "))
y2
class(y2)

#output 
> x <- "William is happy today"
> x
[1] "William is happy today"
> 
> # Converting all words to upper case using toupper() function
> toupper(x)
[1] "WILLIAM IS HAPPY TODAY"
> 
> # Converting all words to lower case using tolower() function
> tolower(x)
[1] "william is happy today"
> 
> x1 <- "Henry is A hardworker. He owns A house and A car."
> x1
[1] "Henry is A hardworker. He owns A house and A car."
> chartr("A", "a", x1)
[1] "Henry is a hardworker. He owns a house and a car."
> 
> z <- "I widd gq tq market tqmqrrqw."
> chartr("dq","lo", z)
[1] "I will go to market tomorrow."
> 
> x2 <- "Henry puts in all his good efforts"
> x2
[1] "Henry puts in all his good efforts"
> substr(x2, start = 22, stop = 27)
[1] " good "
> 
> #split function
> 
> x4 <- "Henry puts in all his good efforts"
> class(x4)
[1] "character"
> y1 <- strsplit(x4, split = " ")
> y1
[[1]]
[1] "Henry"   "puts"    "in"      "all"     "his"     "good"    "efforts"

> class(y1)
[1] "list"
> #either create a variable like y1 or direct use the function in case of Mason 
> strsplit("Mason", split ="") 
[[1]]
[1] "M" "a" "s" "o" "n"

> x4
[1] "Henry puts in all his good efforts"
> y2 <- unlist(strsplit(x4, split = " "))
> y2
[1] "Henry"   "puts"    "in"      "all"     "his"     "good"    "efforts"
> class(y2)
[1] "character"

Data Structure

A data structure is a way to organize data in a system so that it can be used effectively. Data structures are the objects that are manipulated regularly in R. They store data in an organized fashion to make data manipulation and other data operations more efficient. R has several data structures, described below.

Vector

Vectors are the basic data structure of R. A vector can hold multiple values, combined with the concatenate function c(). The type of data inside a vector can be determined with the typeof() function, and the length (number of elements) of a vector can be found with the length() function. Unlike Python, R uses one-based indexing, so the first element of a vector is accessed with vector_name[1]. A vector always contains data of a single data type. If values of several data types are combined, the vector converts all of them to the most flexible type present, in the following order of precedence: logical, integer, numeric (double), complex, character.
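The points about typeof(), length(), one-based indexing, and automatic type conversion can be tried with a short sketch. The vectors ages and mixed are made up for this example.

```r
ages <- c(23, 35, 41)
typeof(ages)
length(ages)
ages[1]          # first element: indexing starts at 1
ages[2:3]        # second and third elements

# Mixing types forces every element to one common type
mixed <- c(1, TRUE, "a")
mixed
typeof(mixed)
```

Because one element of mixed is text, the number and the logical value are converted to text as well.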

Technically, vectors can be one of two types:

Matrices and Arrays

Adding a dim attribute to an atomic vector allows it to behave like a multi-dimensional array. A special case of the array is the matrix, which has two dimensions. Matrices are used commonly as part of the mathematical machinery of statistics. Arrays are much rarer, but worth being aware of. Matrices and arrays are created with matrix() and array(), or by using the assignment form of dim().
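The following sketch shows both routes to a matrix, matrix() and the assignment form of dim(), plus a small three-dimensional array. The object names values, m, and arr are illustrative only.

```r
values <- 1:6

# matrix() fills column by column unless byrow = TRUE
m <- matrix(values, nrow = 2)
m
dim(m)

# Assigning to dim() turns the same vector into a matrix in place
dim(values) <- c(2, 3)
identical(m, values)

# A three-dimensional array: 2 rows, 3 columns, 2 layers
arr <- array(1:12, dim = c(2, 3, 2))
arr[1, 2, 2]   # row 1, column 2 of the second layer
```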

Data Frames

A data frame is a very important data type in R. It is the de facto data structure for most tabular data and what we use for statistics (explained in detail in the next section).

Some additional information on data frames:

  • They are usually created by read.csv() and read.table(), i.e. when importing data into R.

  • Assuming all columns in a data frame are of the same type, a data frame can be converted to a matrix with data.matrix() (preferred) or as.matrix(). Otherwise type coercion will be enforced and the results may not always be what you expect.

  • You can also create a new data frame with the data.frame() function.

  • Find the number of rows and columns with nrow(dat) and ncol(dat), respectively.

  • Row names are often automatically generated and look like 1, 2, …, n. Consistency in numbering of row names may not be honored when rows are reshuffled or subset.
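A minimal sketch of these data frame facts follows, using a tiny invented data set (dat, with made-up height and weight values).

```r
dat <- data.frame(
  height = c(1.62, 1.75, 1.80),
  weight = c(58, 72, 81)
)
nrow(dat)
ncol(dat)

# All columns are numeric, so the conversion keeps the values as numbers
data.matrix(dat)

# Row names stay attached to their rows when the data frame is subset
subset_dat <- dat[dat$weight > 60, ]
rownames(subset_dat)
```

After subsetting, the row names no longer run from 1 to n, which is the inconsistency in numbering mentioned in the notes on data frames.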

Lists

In R, lists act as containers. Unlike atomic vectors, the contents of a list are not restricted to a single mode and can encompass any mixture of data types. Lists are sometimes called generic vectors, because the elements of a list can be any type of R object, even lists containing further lists. You can create lists using list() or coerce other objects using as.list().
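As a brief sketch of a list holding mixed types, the example below builds a hypothetical person list and shows the difference between single and double brackets when accessing its elements.

```r
person <- list(name = "Olivia", scores = c(84.8, 98.4), passed = TRUE)
str(person)

person$name          # access an element by name
person[["scores"]]   # double brackets return the element itself
person["scores"]     # single brackets return a list of length one

# as.list() turns each element of a vector into a list element
as.list(1:3)
```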


# DATA STRUCTURES ##########################################

## Vector ##################################################

v1 <- c(1, 2, 3, 4, 5)
v1
is.vector(v1)

v2 <- c("a", "b", "c")
v2
is.vector(v2)

v3 <- c(TRUE, TRUE, FALSE, FALSE, TRUE)
v3
is.vector(v3)

## Matrix ##################################################

m1 <- matrix(c(T, T, F, F, T, F), nrow = 2)
m1

m2 <- matrix(c("a", "b", 
               "c", "d"), 
               nrow = 2,
               byrow = T)
m2

## Array ###################################################

# Give data, then dimensions (rows, columns, tables)
a1 <- array(c( 1:24), c(4, 3, 2))
a1

## Data frame ##############################################

# Can combine vectors of the same length

vNumeric   <- c(1, 2, 3)
vCharacter <- c("a", "b", "c")
vLogical   <- c(T, F, T)

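dfa <- cbind(vNumeric, vCharacter, vLogical)
dfa  # Matrix of one data type (everything coerced to character)

# data.frame() keeps each column's own type; wrapping cbind() in
# as.data.frame() would inherit the all-character matrix instead
df <- data.frame(vNumeric, vCharacter, vLogical)
df  # Makes a data frame with three different data types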

## List ####################################################

o1 <- c(1, 2, 3)
o2 <- c("a", "b", "c", "d")
o3 <- c(T, F, T, T, F)

list1 <- list(o1, o2, o3)
list1

list2 <- list(o1, o2, o3, list1)  # Lists within lists!
list2

# COERCING TYPES ###########################################

## Automatic coercion ######################################

# Goes to "least restrictive" data type

(coerce1 <- c(1, "b", TRUE))
# coerce1  # Parentheses around the command above make this line redundant
typeof(coerce1)

## Coerce numeric to integer ###############################

(coerce2 <- 5)
typeof(coerce2)

(coerce3 <- as.integer(5))
typeof(coerce3)

## Coerce character to numeric #############################

(coerce4 <- c("1", "2", "3"))
typeof(coerce4)

(coerce5 <- as.numeric(c("1", "2", "3")))
typeof(coerce5)

## Coerce matrix to data frame #############################

(coerce6 <- matrix(1:9, nrow= 3))
is.matrix(coerce6)

(coerce7 <- as.data.frame(matrix(1:9, nrow= 3)))
is.data.frame(coerce7)

# CLEAN UP #################################################

# Clear environment
rm(list = ls()) 

# Clear console
cat("\014")  # ctrl+L

# Clear mind :)


Key Points

  • understanding strings and vectors


Introduction to data frame

Overview

Time: 0 min
Objectives
  • learn how to create and access a data frame

  • learn data frame transformation and operations

Data Frames

Data frames are used for storing data tables in R. They are two-dimensional structures, similar to tables, in which each column represents one variable. The main features to note about a data frame are:

Data frames in R can be created in two ways:

data.frame() FUNCTION:

When using the command, follow this syntax: data.frame(column_1, column_2, column_3, ...). Make sure that the names of the columns are unique and that all columns are of the same length.

Creating a data frame

# input code

# Student ID, names and their marks.
student.data <- data.frame(
   std_id = c(001:005),
   std_name = c("William", "James", "Olivia", "Steve", "David"),
   std_marks = c(84.8, 98.4, 74.6, 80, 95)
)

# Display the dataframe student.data
student.data

# Check the structure of the dataframe student.data
str(student.data)

#check the head and tail of the dataframe student.data
head(student.data, 3)

tail(student.data, 3)


# Check the summary, length and dimension of the dataframe student.data
summary(student.data)

length(student.data)

dim(student.data)

# Check number of row/columns individually.
ncol(student.data)
nrow(student.data)

#output 
> # Student ID, names and their marks.
> student.data <- data.frame(
+    std_id = c(001:005),
+    std_name = c("William", "James", "Olivia", "Steve", "David"),
+    std_marks = c(84.8, 98.4, 74.6, 80, 95)
+ )
> 
> # Display the dataframe student.data
> student.data
  std_id std_name std_marks
1      1  William      84.8
2      2    James      98.4
3      3   Olivia      74.6
4      4    Steve      80.0
5      5    David      95.0
> 
> # Check the structure of the dataframe student.data
> str(student.data)
'data.frame':	5 obs. of  3 variables:
 $ std_id   : int  1 2 3 4 5
 $ std_name : chr  "William" "James" "Olivia" "Steve" ...
 $ std_marks: num  84.8 98.4 74.6 80 95
> 
> #check the head and tail of the dataframe student.data
> head(student.data, 3)
  std_id std_name std_marks
1      1  William      84.8
2      2    James      98.4
3      3   Olivia      74.6
> 
> tail(student.data, 3)
  std_id std_name std_marks
3      3   Olivia      74.6
4      4    Steve      80.0
5      5    David      95.0
> 
> 
> # Check the summary, length and dimension of the dataframe student.data
> summary(student.data)
     std_id    std_name           std_marks    
 Min.   :1   Length:5           Min.   :74.60  
 1st Qu.:2   Class :character   1st Qu.:80.00  
 Median :3   Mode  :character   Median :84.80  
 Mean   :3                      Mean   :86.56  
 3rd Qu.:4                      3rd Qu.:95.00  
 Max.   :5                      Max.   :98.40  
> 
> length(student.data)
[1] 3
> 
> dim(student.data)
[1] 5 3
> 
> # Check number of row/columns individually.
> ncol(student.data)
[1] 3
> nrow(student.data)
[1] 5
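Once a data frame exists, its columns, rows, and cells can be accessed in several ways. The sketch below rebuilds the same student.data data frame so that it runs on its own; the added column name passed and the cut-off of 80 are made up for illustration.

```r
student.data <- data.frame(
  std_id = 1:5,
  std_name = c("William", "James", "Olivia", "Steve", "David"),
  std_marks = c(84.8, 98.4, 74.6, 80, 95)
)

student.data$std_name            # one column as a vector
student.data[2, ]                # second row, all columns
student.data[2, "std_marks"]     # a single cell
student.data[student.data$std_marks > 90, ]  # rows that meet a condition

# Add a column computed from an existing one (passed is a made-up name)
student.data$passed <- student.data$std_marks >= 80
student.data
```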

Importing data

There are multiple commands with various arguments to import data from different file formats into the R environment. The simplest command to import a CSV file as a data frame is:

data_frame_name <- read.csv(file.choose(), header = T)

Here, file.choose() allows you to choose a .csv file stored on your computer, and header = T indicates that the first row in the file contains the column names.

importing data

Double-click the desired file (or click it once and select Open) to import it. Once the data has been imported successfully, the data frame is visible under its name in the Environment pane on the top right.
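When the file location is already known, the path can be passed to read.csv() directly instead of using file.choose(). The sketch below first writes a small invented table to a temporary file so that it runs without any workshop files; csv_path and visits are hypothetical names.

```r
# Write a small example file to a temporary location
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(city = c("Chicago", "Peoria"), visits = c(12, 7)),
  csv_path,
  row.names = FALSE
)

# header = TRUE tells R that the first row holds the column names
visits <- read.csv(csv_path, header = TRUE)
str(visits)
```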

Packages

install.packages("package_name") – Installs the package from the CRAN repository.

install.packages(c("package_1", "package_2", "package_3")) – Installs multiple packages.

library("package_name") – Loads the package in the current R session.
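A common pattern is to install a package only when it is missing and then load it. The sketch below assumes the package of interest is "tools", chosen only because it ships with R, so nothing is downloaded; replace package_name with the package you need.

```r
# "tools" ships with R, so this sketch runs without downloading anything
package_name <- "tools"

# Install only if the package is not available yet
if (!requireNamespace(package_name, quietly = TRUE)) {
  install.packages(package_name)
}

# character.only = TRUE lets library() accept the name stored in a variable
library(package_name, character.only = TRUE)

# Confirm that the package is attached to the search path
paste0("package:", package_name) %in% search()
```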

Importing dataset and Packages

getwd()
# INSTALL PACKAGES ##########################################
# I recommend "pacman" for managing add-on packages. It will
# install packages, if needed, and then load the packages.
install.packages("pacman")

# Then load the package by using either of the following:
require(pacman)  # Gives a confirmation message.
library(pacman)  # No message.

# Or, by using "pacman::p_load" you can use the p_load
# function from pacman without actually loading pacman.
# These are packages I load every time.
pacman::p_load(pacman, dplyr, GGally, ggplot2, ggthemes, 
  ggvis, httr, lubridate, plotly, rio, rmarkdown, shiny, 
  stringr, tidyr) 

library(datasets)  # Load/unload base packages manually

# USING TIDYVERSE ############################################
install.packages("tidyverse")
library (tidyverse)

# IMPORTING DATA #############################################
df <- read.csv("StateData.csv")
df
head(df)
str(df) 
summary (df)


df[c("State", "governor")]
head(df[c("State", "governor")])
summary(df[c("State", "governor")])

# sum() needs numeric input. State is a character column, so
# sum(df$State) raises an error. For row-wise totals of numeric
# columns, use rowSums():
rowSums(df[c("instagram", "facebook")])
df1 <- c(sum(df$instagram), sum(df$facebook))
df1
c(sum(df$instagram), sum(df$retweet))
mean(df$instagram)
"1" %in% df$instagram

Key Points

  • basic statistical knowledge and formulas


Sample dataset and importing data in RStudio

Overview

Time: min
Objectives
  • learning how to use the sample dataset

  • understanding how to import data in RStudio

Sample Dataset


# INSTALL AND LOAD PACKAGES ################################

# Load base packages manually
library(datasets) # For example datasets
?datasets
library(help = "datasets")

# SOME SAMPLE DATASETS #####################################

iris
?iris

cars <-cars

head(cars)

iris <- iris
head(iris)

tail(iris,20)

iris[,c(1,2)]

iris[,c('Sepal.Length')]

str(iris)

rm(list = ls())

iris

# CLEAN UP #################################################

# Clear environment
rm(list = ls()) 

# Clear packages
detach("package:datasets", unload = TRUE)  # For base

# Clear plots
dev.off()  # But only if there IS a plot

# Clear console
cat("\014")  # ctrl+L

Data Visualization using the basic Plot function in R

The most used plotting function in R programming is the plot() function. It is a generic function, meaning it has many methods, which are called according to the type of object passed to plot(). In the simplest case, we can pass in a vector and get a scatter plot (the default) of magnitude vs. index. But generally, we pass in two vectors and a scatter plot of these points is plotted. For example, the command plot(c(1,2),c(3,5)) would plot the points (1,3) and (2,5). In the exercise below we will see how to create a generic plot, a bar chart, and a histogram in R using the sample datasets that ship with R.

PLOT FUNCTION IN R

# LOAD DATASETS PACKAGES ###################################

library(datasets)  # Load/unload base packages manually

# LOAD DATA ################################################

head(iris)

# PLOT DATA WITH PLOT() ####################################

?plot  # Help for plot()

plot(iris$Species)  # Categorical variable
plot(iris$Petal.Length)  # Quantitative variable
plot(iris$Species, iris$Petal.Width)  # Cat x quant
plot(iris$Petal.Length, iris$Petal.Width)  # Quant pair
plot(iris)  # Entire data frame

# Plot with options
plot(iris$Petal.Length, iris$Petal.Width,
  col = "#cc0000",  # Hex code for datalab.cc red
  pch = 19,         # Use solid circles for points
  main = "Iris: Petal Length vs. Petal Width",
  xlab = "Petal Length",
  ylab = "Petal Width")

# PLOT FORMULAS WITH PLOT() ################################

plot(cos, 0, 2*pi)
plot(exp, 1, 5)
plot(dnorm, -3, +3)

# Formula plot with options
plot(dnorm, -3, +3,
  col = "#cc0000",
  lwd = 5,
  main = "Standard Normal Distribution",
  xlab = "z-scores",
  ylab = "Density")

Plotting Bar Chart in R

 
library(datasets)

# LOAD DATA ###############################################
?mtcars
head(mtcars)

# Sort the rows by a column (mpg is chosen here as an example);
# sort() on its own does not order the rows of a data frame
mtcars[order(mtcars$mpg, decreasing = TRUE), ]

# BAR CHARTS ###############################################

barplot(mtcars$cyl)             # Doesn't work as intended: one bar per car

# Need a table with frequencies for each category
cylinders <- table(mtcars$cyl)  # Create table
barplot(cylinders)              # Bar chart
plot(cylinders)                 # Default X-Y plot (lines)

# CLEAN UP #################################################

# Clear environment
rm(list = ls()) 

# Clear packages
detach("package:datasets", unload = TRUE)  # For base

# Clear plots
dev.off()  # But only if there IS a plot

# Clear console
cat("\014")  # ctrl+L

Plotting Histogram in R


# LOAD PACKAGES ############################################

library(datasets)

# LOAD DATA ################################################

?iris
head(iris)

# BASIC HISTOGRAMS #########################################

hist(iris$Sepal.Length)
hist(iris$Sepal.Width)
hist(iris$Petal.Length)
hist(iris$Petal.Width)

# HISTOGRAM BY GROUP #######################################

# Put graphs in 3 rows and 1 column
par(mfrow = c(3, 1))

# Histograms for each species using options
hist(iris$Petal.Width[iris$Species == "setosa"],
  xlim = c(0, 3),
  breaks = 9,
  main = "Petal Width for Setosa",
  xlab = "",
  col = "red")

hist(iris$Petal.Width[iris$Species == "versicolor"],
  xlim = c(0, 3),
  breaks = 9,
  main = "Petal Width for Versicolor",
  xlab = "",
  col = "purple")

hist(iris$Petal.Width[iris$Species == "virginica"],
  xlim = c(0, 3),
  breaks = 9,
  main = "Petal Width for Virginica",
  xlab = "",
  col = "blue")

# Restore graphic parameter
par(mfrow = c(1, 1))

# CLEAN UP #################################################

# Clear packages
detach("package:datasets", unload = TRUE)  # For base

# Clear plots
dev.off()  # But only if there IS a plot

# Clear console
cat("\014")  # ctrl+L

# Clear mind :)

This brings us to the end of this workshop. The reference section provides links to all the materials used in this workshop, along with links that offer a more detailed understanding of each package in R.

Key Points