2. Operating on Data in R

Data Types in R

  • Variable
  • Tables

2.1 Variables: Numerics, Characters, Factors, Logicals

A variable lets you store a value in R. You can later use the variable's name to access the value stored in it.

# Assign the value 50 to x
x <- 50
# Print out the value of the variable x
print(x)
# Output: [1] 50

Decimal values like 8.22 are called numerics.

Natural numbers like 86 are called integers. Integers are also numerics.
Boolean values (TRUE or FALSE) are called logical.
Text values are called characters.
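
To see these types side by side, here is a small sketch. One detail it assumes beyond the text above: R stores a whole number typed without a suffix as a numeric, and the L suffix (as in 86L) asks for a true integer.

```r
# Decimal values are stored as "numeric"
decimal_value <- 8.22
class(decimal_value)      # "numeric"

# A whole number typed without a suffix is also stored as "numeric"
plain_whole <- 86
class(plain_whole)        # "numeric"

# The L suffix asks R to store the value as an integer
whole_value <- 86L
class(whole_value)        # "integer"

# Integers still count as numerics
is.numeric(whole_value)   # TRUE

# Logical and character values
class(TRUE)               # "logical"
class("some text")        # "character"
```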

# Set my_apple to 42
my_apple <- 42

# Change my_character to be "Wow"

my_character <- "Wow"

# Change my_logical to be FALSE

my_logical <- FALSE

 

If we are given some variables and do not know their data types, one way to find out is class(variable).

Taking the above variables:

# Check class of my_apple
class(my_apple)

[1] "numeric"

# Check class of my_character

class(my_character)

[1] "character"

# Check class of my_logical

class(my_logical)

[1] "logical"

NOTE: Values of different data types generally cannot be combined in one operation (for example, adding a number to a character string fails); convert them to a common type first to get a result.
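
The note above can be illustrated with a short sketch. The variable my_text and its value are hypothetical, chosen only to show a number that happens to be stored as text.

```r
my_number <- 42
my_text <- "8"   # hypothetical value: a number stored as a character string

# Adding them directly fails with "non-numeric argument to binary operator":
# my_number + my_text

# Convert the text to a numeric first, then add
total <- my_number + as.numeric(my_text)
print(total)     # [1] 50

# Or go the other way and combine both as text
label <- paste(my_number, my_text)
print(label)     # [1] "42 8"
```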

2.2 Vectors

# Sales of Apple
Sale_Apple <- c(2, 4, 5, 6)

# Sales of Orange

Sale_Orange <- c(4, 5, 8, 7)

# Calculate Total Sales (element-wise sum of the two vectors)

Total_Sales <- Sale_Apple + Sale_Orange

print(Total_Sales)
# Output: [1]  6  9 13 13

# Store the names of the days in a vector

Days <- c("Monday", "Tuesday", "Wednesday", "Thursday")

# Assign the names of the days to the vectors Sale_Apple and Sale_Orange

names(Sale_Apple) <- Days

names(Sale_Orange) <- Days
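
Once a vector has names, its elements can be looked up by name instead of by position. A minimal sketch, repeating the apple sales from above so that it runs on its own:

```r
Sale_Apple <- c(2, 4, 5, 6)
Days <- c("Monday", "Tuesday", "Wednesday", "Thursday")
names(Sale_Apple) <- Days

# Look up a single day by name
tuesday_sales <- Sale_Apple["Tuesday"]
print(tuesday_sales)

# Select several days at once
print(Sale_Apple[c("Monday", "Thursday")])

# Total apple sales over the four days
apple_total <- sum(Sale_Apple)
print(apple_total)   # [1] 17
```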

 

2.3 Data Frames

A data frame has the variables of a data set as columns and the observations as rows.

Suppose we have loaded the "mtcars" data set (it is listed at https://vincentarelbundock.github.io/Rdatasets/datasets.html).

To look at the first or last observations of a data set, use the head() and tail() functions.

head() shows the first observations of a data frame (six rows by default). Similarly, tail() prints the last observations of your data set.
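
As a quick sketch of both functions on mtcars (which ships with R, so nothing needs to be downloaded for this example); the variable names first_rows and last_rows are only illustrative:

```r
# First observations: 6 rows unless n is given
first_rows <- head(mtcars)
print(first_rows)

# Last observations: here only the final 3 rows
last_rows <- tail(mtcars, n = 3)
print(last_rows)
```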

As we have seen, data frames can hold numeric, character or logical values. Within a column all elements have the same data type, but different columns can be of different data types.
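
To make the column rule concrete, here is a small hand-built data frame. The data frame fruit_sales and its sold_out column are made up for illustration; only the apple figures come from the vectors section.

```r
# One column per data type (sold_out is a hypothetical column)
fruit_sales <- data.frame(
  day      = c("Monday", "Tuesday", "Wednesday", "Thursday"),
  apples   = c(2, 4, 5, 6),
  sold_out = c(FALSE, FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE   # keep day as character in older R versions too
)

# Each column has a single type, but the types differ between columns
col_types <- sapply(fruit_sales, class)
print(col_types)

# str() gives a compact overview of the columns and their types
str(fruit_sales)
```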

2.4 Lists

A list allows you to gather a variety of objects: matrices, vectors, data frames, even other lists. These objects do not need to be related to each other in any way.

# Vector with numerics from 1 up to 70
my_vector <- 1:70

# First 10 rows of the built-in data frame mtcars
# (mtcars has only 32 rows, so mtcars[1:50, ] would pad the result with NA rows)

my_df <- mtcars[1:10, ]

# Give the list components names in the list() call

my_list <- list(vec = my_vector, df = my_df)

# Print out my_list

my_list
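
Named components make a list easier to work with. The sketch below assumes the components are called vec and df (names picked here for illustration) and uses the first 10 rows of mtcars.

```r
my_vector <- 1:70
my_df <- mtcars[1:10, ]

# Name the components when building the list
my_list <- list(vec = my_vector, df = my_df)

# Component names and number of components
print(names(my_list))     # [1] "vec" "df"
print(length(my_list))    # [1] 2

# Access a component with $ or with [[ ]]
first_five <- my_list$vec[1:5]
print(first_five)         # [1] 1 2 3 4 5

first_mpg <- my_list[["df"]]$mpg[1]
print(first_mpg)          # [1] 21
```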

3. Simple Linear Regression

Linear regression analysis is one of the most widely used statistical techniques: it is the study of linear, additive relationships between variables. Let Y denote the "dependent" variable whose values you wish to predict, and let X denote the "independent" variable used to predict it.

We use the LungCapData data set to illustrate regression in R.

# Import the data into R and store it as LungCapData (Screenshot 1)
LungCapData <- read.csv(file.choose(), header = TRUE)

# Attach the data so that its columns can be used by name
attach(LungCapData)

# Correlation between Age and LungCap
cor(Age, LungCap)
# Output: 0.8196

# Run the regression
mod <- lm(LungCap ~ Age)

# Ask for a summary (Screenshot 2)
summary(mod)

# Scatterplot of Age and LungCap with the fitted regression line (Screenshot 3)
plot(Age, LungCap, main = "scatterplot")
abline(mod)

# Coefficients (Screenshot 4)
coef(mod)

# Confidence intervals, 95% by default (Screenshot 5)
confint(mod)

# Change the confidence level to 99% (Screenshot 6)
confint(mod, level = 0.99)

# Build the ANOVA table (Screenshot 7)
anova(mod)
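
If you do not have the LungCapData file at hand, the same steps can be tried on a data set that ships with R. The sketch below is a stand-in, not the analysis above: it assumes mtcars, with mpg as Y and wt as X, and passes the data through the data argument of lm() instead of calling attach().

```r
# Stand-in data: mtcars, predicting mpg (Y) from wt (X)
r_value <- cor(mtcars$wt, mtcars$mpg)
print(r_value)

# Fit the model; data = mtcars avoids the need for attach()
fit <- lm(mpg ~ wt, data = mtcars)

# Summary, coefficients, confidence intervals and ANOVA table
print(summary(fit))
print(coef(fit))
print(confint(fit))                # 95% by default
print(confint(fit, level = 0.99))  # wider 99% intervals
print(anova(fit))

# Scatterplot with the fitted line
plot(mtcars$wt, mtcars$mpg, main = "scatterplot")
abline(fit)
```

Using the data argument keeps the column names out of the global search path, which avoids name clashes when several data sets are in use.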

Assumptions of Linear Model

  • Y values can be expressed as a linear function of the X variable.
  • The error terms are independent.
  • The variation of the observations around the regression line is constant (homoskedasticity).
  • For a given value of X, the Y values are normally distributed.

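How can we check these assumptions?

Steps

We have already run a regression, so we can plot the model:

plot(mod): this produces 4 diagnostic charts

To get all 4 plots on a single page, call par(mfrow=c(2,2)) before plot(mod); par(mfrow=c(1,1)) restores one plot per page. (Screenshot 8)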

First plot (residuals against fitted values): for linearity, the line should be fairly flat; for homoskedasticity, the spread of the points should be roughly constant across the plot.

Second plot (QQ plot, i.e. quantile-quantile plot): the Y axis shows the ordered, observed, standardized residuals and the X axis shows the ordered theoretical quantiles. If the Y values (or error terms) are normally distributed, the points lie on the diagonal line.
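
The diagnostic steps can be sketched end to end as follows. As an assumption for the example, the model is fitted on the built-in mtcars data (mpg against wt) rather than on LungCapData, so that the code runs without any external file.

```r
# Stand-in model on built-in data (an assumption for this sketch)
fit <- lm(mpg ~ wt, data = mtcars)

# Arrange the 4 diagnostic plots in a 2 x 2 grid on one page
old_par <- par(mfrow = c(2, 2))
plot(fit)

# Restore the previous layout (one plot per page)
par(old_par)

# The QQ plot can also be drawn by hand from the standardized residuals
res <- rstandard(fit)
qqnorm(res)
qqline(res)
```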
