2. Operating on Data in R
Data Types in R
2.1 Variables: Numerics, Characters, Factors, Logicals
A variable allows you to store a value in R. You can later use the variable's name to access the value stored in it.
# Assign the value 50 to x
x <- 50
# Print out the value of the variable x
print(x)
# Output: [1] 50
Decimal values like 8.22 are called numerics.
Natural numbers like 86 are called integers. Integers are also numerics.
Boolean values (TRUE or FALSE) are called logical.
Text values are called characters.
# Change my_apple to be 42
my_apple <- 42
# Change my_character to be "Wow"
my_character <- "Wow"
# Change my_logical to be FALSE
my_logical <- FALSE
If we are given variables and do not know their data types, one way to find out is to call class() on each variable.
Taking the above variables:
# Check class of my_apple
class(my_apple)
# Check class of my_character
class(my_character)
# Check class of my_logical
class(my_logical)
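As a small illustration of the type names above, the sketch below asks R for the class of a few literal values. The values are examples only, and the factor is included because factors appear in the section title. Note that a bare whole number such as 86 is stored as a numeric unless it is written with the L suffix.

```r
# Example values only; none of these come from a real data set
values <- list(8.22, 86, 86L, TRUE, "Wow", factor(c("low", "high")))

# Ask R for the class of each value
types <- sapply(values, class)
print(types)
# [1] "numeric"   "numeric"   "integer"   "logical"   "character" "factor"

# Integers also count as numerics
print(is.numeric(86L))
# [1] TRUE
```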
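NOTE: Values of different data types generally cannot be combined in one operation (for example, adding a number to a character string fails). One value has to be converted to the other's type first to get a result.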
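A minimal sketch of such a conversion, assuming a number that arrived as text. The variable names price_text and quantity are made up for this illustration.

```r
# Hypothetical values for illustration
price_text <- "42"   # a character string that happens to look like a number
quantity <- 3        # a numeric

# price_text * quantity would stop with an error, because one operand is a character.
# Convert the character value to numeric first:
price_num <- as.numeric(price_text)
total <- price_num * quantity

print(class(price_text))
# [1] "character"
print(class(price_num))
# [1] "numeric"
print(total)
# [1] 126
```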
2.2 Vectors
# Sales of Apple
Sale_Apple <- c(2, 4, 5, 6)
# Sales of Orange
Sale_Orange <- c(4, 5, 8, 7)
# Calculate total sales
Total_Sales <- Sale_Apple + Sale_Orange
# Store the day names in a vector
Days <- c("Monday", "Tuesday", "Wednesday", "Thursday")
# Assign the day names to the vectors Sale_Apple and Sale_Orange
names(Sale_Apple) <- Days
names(Sale_Orange) <- Days
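Once the vectors carry names, individual days can be looked up by name instead of by position. The sketch below repeats the sales figures from above so that it runs on its own, and adds the two vectors after naming them so that the total keeps the day names.

```r
# Same figures as in the example above
Sale_Apple <- c(2, 4, 5, 6)
Sale_Orange <- c(4, 5, 8, 7)
Days <- c("Monday", "Tuesday", "Wednesday", "Thursday")
names(Sale_Apple) <- Days
names(Sale_Orange) <- Days

# Vectors are added element by element; the names are carried over
Total_Sales <- Sale_Apple + Sale_Orange
print(Total_Sales)

# Look up one day by name
print(Total_Sales["Wednesday"])

# Total over all four days
print(sum(Total_Sales))
# [1] 41
```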
2.3 Data Frames
A data frame has the variables of a data set as columns and the observations as rows.
Suppose we have loaded the "mtcars" data set (https://vincentarelbundock.github.io/Rdatasets/datasets.html).
To look at the first or last observations of a data set, use head() and tail(): head() shows the first observations of a data frame, and tail() prints out the last observations.
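A short sketch of both functions on mtcars, which also ships with R as a built-in data set. The variable names first_rows and last_rows are chosen here for illustration; the n argument controls how many rows are returned (6 by default).

```r
# First six rows (the default)
first_rows <- head(mtcars)
print(first_rows)

# Last three rows
last_rows <- tail(mtcars, n = 3)
print(last_rows)

# Number of rows and columns in the full data frame
print(dim(mtcars))
# [1] 32 11
```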
Data frames, as we now know, can hold numeric, character or logical values. Within a column all elements have the same data type, but different columns can have different data types.
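To see this in practice, the sketch below builds a small data frame by hand and checks the type of each column. The data frame fruit_df and its values are invented for this example.

```r
# Hypothetical data: one character, one numeric and one logical column
fruit_df <- data.frame(
  fruit = c("Apple", "Orange", "Banana"),
  units_sold = c(17, 24, 9),
  in_season = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE   # keep text as character rather than factor
)

# Each column has exactly one type, but the types differ between columns
col_types <- sapply(fruit_df, class)
print(col_types)
```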
A list allows you to gather a variety of objects (matrices, vectors, data frames, even other lists) under one name. These objects do not need to be related to each other in any way.
# Vector with numerics from 1 up to 70
my_vector <- 1:70
# First 10 rows of the built-in data frame mtcars (it has only 32 rows, so 1:50 would add rows of NA)
my_df <- mtcars[1:10, ]
# Adapt list() call to give the components names
my_list <- list(vec = my_vector, df = my_df)
# Print out my_list
print(my_list)
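Named components can be pulled back out of a list with $ or [[ ]]. The sketch below is self-contained; the component names vec and df are illustrative choices, and the row range 1:10 is picked to stay within the 32 rows of mtcars.

```r
my_vector <- 1:70
my_df <- mtcars[1:10, ]

# Give each component a name (names chosen for this example)
my_list <- list(vec = my_vector, df = my_df)

# Access a component with $ ...
vec_length <- length(my_list$vec)
print(vec_length)
# [1] 70

# ... or with double square brackets
first_mpg <- my_list[["df"]]$mpg[1]
print(first_mpg)
# [1] 21

# List the component names
print(names(my_list))
# [1] "vec" "df"
```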
3. Simple Linear Regression
Linear regression analysis is one of the most widely used statistical techniques: it is the study of linear, additive relationships between variables. Let Y denote the "dependent" variable whose values you wish to predict, and let X denote the "independent" variable used to predict it.
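Written as an equation, the simple linear model is usually expressed as below. The symbols are notation assumed here rather than taken from the text: \beta_0 is the intercept, \beta_1 is the slope, and \varepsilon is the error term.

```latex
Y = \beta_0 + \beta_1 X + \varepsilon
```

Fitting the regression means estimating \beta_0 and \beta_1 from the data.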
We use the LungCapData data set to understand regression in R.
# Import data into R (Screenshot 1)
LungCapData <- read.csv(file.choose(), header = TRUE)
# Attach the data
attach(LungCapData)
# Correlation between Age and LungCap: 0.8196
cor(Age, LungCap)
# Run the regression
mod <- lm(LungCap ~ Age)
# Ask for a summary (Screenshot 2)
summary(mod)
# Scatterplot of Age and LungCap with the fitted line (Screenshot 3)
plot(Age, LungCap, main = "scatterplot")
abline(mod)
# Coefficients (Screenshot 4)
coef(mod)
# Confidence intervals (Screenshot 5)
confint(mod)
# Change the confidence level (Screenshot 6)
confint(mod, level = .99)
# Build the ANOVA table (Screenshot 7)
anova(mod)
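If the LungCapData file is not at hand, the same steps can be tried on simulated data. The sketch below is only a stand-in: the numbers are randomly generated with a made-up intercept of 1 and slope of 0.5, so its results will not match the screenshots. It also passes the data frame through the data argument of lm() instead of using attach(), which avoids accidentally masking other variables.

```r
set.seed(42)  # make the simulation reproducible

# Simulated stand-in for the real data (relationship is invented)
Age <- sample(3:19, size = 100, replace = TRUE)
LungCap <- 1 + 0.5 * Age + rnorm(100, sd = 1)
sim_data <- data.frame(Age, LungCap)

# Correlation between the two variables
print(cor(sim_data$Age, sim_data$LungCap))

# Fit the regression, taking the variables from sim_data
mod <- lm(LungCap ~ Age, data = sim_data)

print(coef(mod))                   # estimated intercept and slope
print(confint(mod, level = 0.99))  # 99% confidence intervals
print(anova(mod))                  # ANOVA table
```

The estimated slope should land close to the 0.5 used to generate the data.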
Assumptions of Linear Model
- Y values can be expressed as a linear function of the X variable.
- The error terms are independent.
- The variation of the observations around the regression line is constant.
- For a given value of X, the Y values are normally distributed.
We have run a regression. Calling plot(mod) produces 4 diagnostic charts.
To get all 4 plots on a single page, call par(mfrow=c(2,2)) before plot(mod). (Screenshot 8)
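A runnable sketch of the four-panel layout, using a model fitted on the built-in mtcars data as a stand-in for the LungCap model (the formula mpg ~ wt is chosen only for illustration). Saving the old graphics settings makes it easy to return to one plot per page afterwards.

```r
# Stand-in model on built-in data; not the LungCap model from the text
mod <- lm(mpg ~ wt, data = mtcars)

# Switch to a 2 x 2 grid and remember the previous settings
old_par <- par(mfrow = c(2, 2))

# Draw the four diagnostic plots on one page
plot(mod)

# Restore the previous layout (one plot per page)
par(old_par)
```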
In the first plot (Residuals vs Fitted): for linearity, the line should be fairly flat, and for homoskedasticity, the spread of the points should be constant across the plot.
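The first plot can also be drawn by hand, which makes clear what is on each axis. This is a sketch on a stand-in model fitted to the built-in mtcars data, not the LungCap model.

```r
# Stand-in model on built-in data
mod <- lm(mpg ~ wt, data = mtcars)

fitted_vals <- fitted(mod)  # predicted values on the X axis
resid_vals <- resid(mod)    # residuals on the Y axis

plot(fitted_vals, resid_vals,
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs Fitted")
abline(h = 0, lty = 2)      # reference line at zero
```

The same chart, with a smoothed trend line added, is available as plot(mod, which = 1).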
Second plot: QQ plot (Quantile-Quantile plot). The Y axis shows the ordered, observed, standardized residuals; the X axis shows the ordered theoretical quantiles. If the Y values or error terms are normally distributed, the points lie on the diagonal line.
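A QQ plot of the standardized residuals can be built directly with qqnorm() and qqline(). As before, this sketch uses a stand-in model on the built-in mtcars data rather than the LungCap model.

```r
# Stand-in model on built-in data
mod <- lm(mpg ~ wt, data = mtcars)

# Standardized residuals of the fitted model
std_res <- rstandard(mod)

# Observed quantiles against theoretical normal quantiles
qqnorm(std_res, main = "Normal Q-Q plot of standardized residuals")
qqline(std_res)  # reference line; points near it suggest normal errors
```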