1. VISUALIZE YOUR DATA USING THE GGPLOT2 PACKAGE

1.1 Scatter Plot

In my previous tutorials, scatter plots were built without using any packages to present the data points in a sample. Here we discuss how to do the same with ggplot2, which makes it simple and easy.

Note: Assume the X and Y variables are continuous random variables.

  • First, install the ggplot2 package in R:


#install ggplot2 package

install.packages("ggplot2")

 

 

  • Now let's get some scatter plots done, starting with “The Basic One!”
  • I have downloaded the diamonds data from https://vincentarelbundock.github.io/Rdatasets/datasets.html
  • The next step is to load the data in R, then load the library and continue with plotting.

 

 

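#import data (stored as "diamonds", the name used in the plots below;
#this masks the copy of the dataset that ships with ggplot2)

diamonds <- read.csv(file.choose(), header = TRUE, stringsAsFactors = FALSE)

#load library

library("ggplot2")

#Scatter plot

ggplot(diamonds, aes(x = carat, y = price)) + geom_point()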

 

Scatter Plot

The scatter plot above shows a positive relationship between carat and price.
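
The direction of this relationship can also be checked numerically. The sketch below assumes the copy of the diamonds data that ships with ggplot2 (a downloaded CSV may differ slightly) and uses base R's cor(). A positive value is consistent with the upward trend in the plot.

```r
library(ggplot2)  # assumption: use the diamonds dataset bundled with ggplot2

# Pearson correlation between carat and price;
# a value above 0 indicates a positive relationship
carat_price_cor <- cor(diamonds$carat, diamonds$price)
print(carat_price_cor)
```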

Now let's include another variable that accompanies this relationship: the clarity of the diamonds.

#take color=clarity in aesthetic

ggplot(diamonds,aes(x=carat,y=price,color= clarity))+geom_point()

Rplot.png

In this diagram, clarity is mapped to the color of the dots: each color marks one clarity grade, so you can see how the relationship between the X and Y variables differs by clarity. The red points (low clarity) show a weak positive relationship between the two variables, while the blue points show a relatively stronger one.

Now add one more variable to this relationship by mapping the size of the scatter points to the cut of the diamonds.

 

#map size of dots to cut

ggplot(diamonds,aes(x=carat,y=price,color= clarity,size=cut))+geom_point()

 

Rplot01

Here each point is sized according to the cut of the diamond, so the diagram shows the relationship between price and carat while taking cut and clarity into consideration.

A scatter plot is a layer, so to add another layer, say a curve that shows the general trend between the X and Y variables, we use geom_smooth.
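
A minimal sketch of adding that layer is given here, assuming the diamonds data bundled with ggplot2. With no method given, geom_smooth() picks a smoother itself and prints a message saying which one it used; se = FALSE hides the confidence band. The name p_smooth is only illustrative.

```r
library(ggplot2)  # assumption: use the diamonds dataset bundled with ggplot2

# Scatter layer plus a smoothing layer;
# ggplot2 chooses the smoothing method when none is specified
p_smooth <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point() +
  geom_smooth(se = FALSE)

# Type p_smooth (or call print(p_smooth)) to draw the plot
```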

Rplot03

To show the line of best fit from a linear model instead, set method = lm:

#Line of best fit (linear model)

ggplot(diamonds,aes(x=carat,y=price))+geom_point()+geom_smooth(se=FALSE,method = lm)

Rplot04.png

The line shows the linear relationship between the two variables.
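
The line drawn with method = lm corresponds to a least-squares fit of price on carat. As a rough sketch, again assuming the diamonds data bundled with ggplot2, the same kind of fit can be obtained with base R's lm() to read off the intercept and slope.

```r
library(ggplot2)  # assumption: use the diamonds dataset bundled with ggplot2

# Fit price as a linear function of carat
fit <- lm(price ~ carat, data = diamonds)

# Named vector holding the intercept and the slope for carat
line_coefs <- coef(fit)
print(line_coefs)
```

A positive slope for carat matches the upward-sloping line in the plot.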

Faceting makes it easier to understand the relationship while taking a third variable into account, by drawing one panel per value of that variable.

 

#Faceting

ggplot(diamonds,aes(x=carat,y=price))+geom_point()+facet_wrap(~clarity)

Rplot05

1.2 Histogram

Now let's look at histograms with ggplot2.

Sometimes you need to look at a single dimension of the data and observe its distribution; for that we use a histogram.

 

#Histogram

ggplot(diamonds,aes(x=price))+ geom_histogram()

 

Rplot06.png

The count axis shows the frequency in each bin, and the histogram shows the distribution of price.

To change the width of the bins, we simply pass the binwidth argument to geom_histogram.

 

#Histogram bin width

ggplot(diamonds,aes(x=price))+ geom_histogram(binwidth = 3000)

 

Rplot07.png
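
To see the numbers behind the bars, the sketch below uses ggplot2's layer_data() helper to pull out the binned data for a histogram with binwidth = 3000, assuming the diamonds data bundled with ggplot2. Each row is one bin, with its limits (xmin, xmax) and its count. The names p_hist and bins are only illustrative.

```r
library(ggplot2)  # assumption: use the diamonds dataset bundled with ggplot2

p_hist <- ggplot(diamonds, aes(x = price)) +
  geom_histogram(binwidth = 3000)

# One row per bin: lower limit, upper limit and number of diamonds in the bin
bins <- layer_data(p_hist, 1)[, c("xmin", "xmax", "count")]
print(bins)
```

The counts add up to the number of rows in the data, since every diamond falls in exactly one bin.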

Let's now use the fill option, so that the histogram shows the clarity of the diamonds within each price bin.

Rplot08.png

#Histogram fill with clarity

ggplot(diamonds,aes(x=price,fill=clarity))+ geom_histogram()

1.3 Boxplot

A basic method in statistics for comparing distributions across groups is the boxplot.

A boxplot, as I have mentioned before, is a graphical representation of data that shows the highest, lowest and median values.

#Boxplot

ggplot(diamonds,aes(x=color,y=price))+geom_boxplot()

Rplot09.png

The dark middle line in each box shows the median, and the lower and upper edges of the box mark the 25th and 75th percentiles.

The dark line above each box is made up of outliers, points that lie beyond the whiskers.
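
The values that each box summarizes can also be computed directly. A minimal sketch, assuming the diamonds data bundled with ggplot2: base R's quantile() gives the 25th, 50th (median) and 75th percentiles of price for each color. The name price_quartiles is only illustrative.

```r
library(ggplot2)  # assumption: use the diamonds dataset bundled with ggplot2

# For each color, compute the quartiles behind the box edges and the middle line
price_quartiles <- sapply(
  split(diamonds$price, diamonds$color),
  quantile, probs = c(0.25, 0.5, 0.75)
)

# Matrix with one column per color and rows "25%", "50%", "75%"
print(price_quartiles)
```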

To get a better picture of the distribution, we put price on a log scale.

#Boxplot taking log values

ggplot(diamonds,aes(x=color,y=price))+geom_boxplot()+scale_y_log10()

Rplot10.png

These are the most basic forms of data visualization, and they are a good starting point for exploring your data.
