2. Operating on Data in R
Data Types in R
- Variables
- Tables
2.1 Variables: Numerics, Characters, Factors, Logicals
A variable allows you to store a value in R. You can later use the variable’s name to access the value stored in it.
# Assign the value 50 to x
x <- 50
# Print out the value of the variable x
print(x)
# Output: [1] 50
Decimal values like 8.22 are called numerics.
Natural numbers like 86 are called integers. Integers are also numerics.
Boolean values (TRUE or FALSE) are called logical.
Text values are called characters.
# Set my_apple to 42
my_apple <- 42
# Set my_character to "Wow"
my_character <- "Wow"
# Set my_logical to FALSE
my_logical <- FALSE

If you are given a variable and do not know its data type, one way to find out is class(variable).
Taking the above variables:
# Check class of my_apple
class(my_apple)
# [1] "numeric"
# Check class of my_character
class(my_character)
# [1] "character"
# Check class of my_logical
class(my_logical)
# [1] "logical"
NOTE: You cannot do arithmetic on values of incompatible data types (for example, a numeric and a character); convert one of them to the matching type first to get a result.
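A minimal sketch of the note above, assuming a number that arrived as text. The names price_text, quantity, price, total and quantity_text are made up for illustration.

```r
# Hypothetical values: a number stored as text and a true numeric
price_text <- "8.22"
quantity <- 3

# price_text * quantity would stop with an error, because price_text is a character
class(price_text)

# Convert the character value to a numeric before doing arithmetic
price <- as.numeric(price_text)
total <- price * quantity
print(total)

# Going the other way: turn a numeric into a character
quantity_text <- as.character(quantity)
class(quantity_text)
```

as.numeric() returns NA (with a warning) when the text does not look like a number, so it is worth checking the result with is.na() on real data.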
2.2 Vectors
# Sales of apples
Sale_Apple <- c(2, 4, 5, 6)
# Sales of oranges
Sale_Orange <- c(4, 5, 8, 7)
# Calculate total sales (vectors are added element by element)
Total_Sales <- Sale_Apple + Sale_Orange
print(Total_Sales)
# [1]  6  9 13 13
# Store the names of the days in a vector
Days <- c("Monday", "Tuesday", "Wednesday", "Thursday")
# Assign the names of the days to the vectors Sale_Apple and Sale_Orange
names(Sale_Apple) <- Days
names(Sale_Orange) <- Days
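Once a vector has names, you can pick out elements by name as well as by position. The sketch below repeats the sales figures from the example above so that it runs on its own; the name Total_Sales is written with an underscore because R variable names cannot contain spaces.

```r
# Same sales figures as in the example above
Sale_Apple <- c(2, 4, 5, 6)
Sale_Orange <- c(4, 5, 8, 7)
Days <- c("Monday", "Tuesday", "Wednesday", "Thursday")
names(Sale_Apple) <- Days
names(Sale_Orange) <- Days

# Element-wise addition; the day names carry over to the result
Total_Sales <- Sale_Apple + Sale_Orange

# Select one day by name, or several days by position
Total_Sales["Wednesday"]
Total_Sales[c(1, 2)]

# Total and average sales over the four days
sum(Total_Sales)
mean(Total_Sales)
```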
2.3 Data Frames
A data frame has the variables of a data set as columns and the observations as rows.
Suppose we have loaded the “mtcars” data set (https://vincentarelbundock.github.io/Rdatasets/datasets.html).
To look at the first or last observations of a data set, use the head() and tail() commands.
head() shows the first observations of a data frame (six rows by default). Similarly, tail() prints out the last observations in your data set.
Data frames, as we now know, can hold numeric, character, or logical values. Within a column all elements have the same data type, but different columns can have different data types.
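A short sketch of these commands on mtcars, which ships with R. The copy mtcars_typed and its two extra columns (model, automatic) are added here only to illustrate mixed column types; they are not part of the original data set.

```r
# mtcars is built into R, so nothing has to be downloaded
first_rows <- head(mtcars)     # first 6 rows by default
last_rows <- tail(mtcars, 3)   # last 3 rows

print(first_rows)
print(last_rows)

# str() lists every column together with its data type
str(mtcars)

# Columns of one data frame can hold different types
mtcars_typed <- mtcars
mtcars_typed$model <- rownames(mtcars)     # character column
mtcars_typed$automatic <- mtcars$am == 0   # logical column (am: 0 = automatic)
sapply(mtcars_typed[c("mpg", "model", "automatic")], class)
```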
2.4 Lists
A list allows you to gather a variety of objects (matrices, vectors, data frames, even other lists) in a single object. These objects do not need to be related to each other in any way.
# Vector with numerics from 1 up to 70
my_vector <- 1:70
# First 10 rows of the built-in data frame mtcars (it has only 32 rows, so 1:50 would add empty NA rows)
my_df <- mtcars[1:10, ]
# Adapt list() call to give the components names
my_list <- list(vec = my_vector, df = my_df)
# Print out my_list
my_list
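Giving the components names makes them easy to retrieve later. The sketch below is one way to do it; the component names vec and df are chosen here for illustration, and only the first 10 rows of mtcars are used because the data set has 32 rows in total.

```r
my_vector <- 1:70
my_df <- mtcars[1:10, ]

# Name the components so they can be retrieved by name
my_list <- list(vec = my_vector, df = my_df)

# Three equivalent ways to pull out a single component
my_list$vec
my_list[["vec"]]
my_list[[1]]

# Single brackets return a shorter list, not the component itself
class(my_list["df"])
class(my_list[["df"]])

# Component names and number of components
names(my_list)
length(my_list)
```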
3. Simple Linear Regression
Linear regression analysis is the most widely used of all statistical techniques: it is the study of linear, additive relationships between variables. Let Y denote the “dependent” variable whose values you wish to predict, and let X denote the “independent” variable used to predict it.
We use the LungCapData data set to understand regression in R.
# Import the data into R (Screenshot 1)
LungCapData <- read.csv(file.choose(), header = TRUE)
# Attach the data so that its columns can be used by name
attach(LungCapData)
# Correlation between Age and LungCap
cor(Age, LungCap)
# 0.8196
# Run the regression
mod <- lm(LungCap ~ Age)
# Ask for a summary (Screenshot 2)
summary(mod)
# Scatterplot of Age and LungCap with the fitted regression line (Screenshot 3)
plot(Age, LungCap, main = "scatterplot")
abline(mod)
# Coefficients (Screenshot 4)
coef(mod)
# Confidence intervals, 95% by default (Screenshot 5)
confint(mod)
# Change the confidence level to 99% (Screenshot 6)
confint(mod, level = 0.99)
# Build the ANOVA table (Screenshot 7)
anova(mod)
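If you do not have the LungCapData file at hand, the same steps can be tried on a data set that ships with R. The sketch below is only a stand-in: it assumes mtcars with mpg as Y and wt as X, and the names mod_cars and new_cars are made up for this example.

```r
# Stand-in data (assumption): built-in mtcars instead of LungCapData
# Y = mpg (miles per gallon), X = wt (weight in 1000 lbs)
mod_cars <- lm(mpg ~ wt, data = mtcars)

# Correlation between X and Y
cor(mtcars$wt, mtcars$mpg)

# Intercept and slope of the fitted line
coef(mod_cars)

# 95% (default) and 99% confidence intervals for the coefficients
ci_95 <- confint(mod_cars)
ci_99 <- confint(mod_cars, level = 0.99)
print(ci_95)
print(ci_99)

# ANOVA table for the model
anova(mod_cars)

# Predict Y for two hypothetical X values
new_cars <- data.frame(wt = c(2.5, 3.5))
predict(mod_cars, newdata = new_cars)
```

Passing data = mtcars to lm() does the same job as attach() here, without placing the columns on the search path where they could mask other objects.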
Assumptions of the Linear Model
- Y values can be expressed as a linear function of the X variable.
- The error terms are independent.
- The variation of the observations around the regression line is constant (homoskedasticity).
- For a given value of X, the Y values are normally distributed.
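How do we check these assumptions?
Steps
We have already run a regression and stored it in mod.
plot(mod) produces 4 diagnostic charts.
To get all 4 plots on a single page, call par(mfrow=c(2,2)) before plot(mod); par(mfrow=c(1,1)) restores one plot per page. (Screenshot 8)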
First plot (Residuals vs Fitted): for linearity, the line should be fairly flat, and for homoskedasticity, the spread of the points should stay roughly constant.
Second plot (Q-Q plot, short for quantile-quantile plot): the Y axis shows the ordered, observed, standardized residuals and the X axis shows the ordered theoretical residuals, that is, the quantiles expected under a normal distribution. If the Y values (or error terms) are normally distributed, the points lie on the diagonal line.
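The diagnostic steps can be sketched as follows. As a stand-in for the LungCapData model, this assumes a regression of mpg on wt from the built-in mtcars data; the names mod_cars, old_par, fitted_values and std_residuals are made up for this example.

```r
# Stand-in model (assumption): built-in mtcars instead of LungCapData
mod_cars <- lm(mpg ~ wt, data = mtcars)

# When run as a script, send the plots to a null device instead of a file
if (!interactive()) pdf(NULL)

# Split the plotting area into 2 rows x 2 columns, then draw all 4 charts
old_par <- par(mfrow = c(2, 2))
plot(mod_cars)

# Restore the previous layout
par(old_par)

# The numbers behind the first two plots
fitted_values <- fitted(mod_cars)      # X axis of the first plot
std_residuals <- rstandard(mod_cars)   # Y axis of the Q-Q plot
head(fitted_values)
head(std_residuals)
```

Saving the old settings in old_par and restoring them afterwards keeps later plots from being squeezed into the 2 x 2 grid.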