1.1 Scatter Plot

In my previous tutorials, scatter plots were built without any packages to present the data points in a sample. Here we will do the same with ggplot2, which makes it simple and easy.

Note: Assume the X and Y variables are continuous random variables.

  • First, install the ggplot2 package in R:

#install ggplot2 package

install.packages("ggplot2")


  • Now let's get some scatter plots done - “The Basic One!”
  • I have downloaded the diamonds data from https://vincentarelbundock.github.io/Rdatasets/datasets.html
  • The next step is to load the data in R, then load the library and continue with plotting. ggplot2 also ships this dataset as diamonds, which is the data frame the plotting calls below use.



#import data

Data <- read.csv(file.choose(), header = TRUE, stringsAsFactors = FALSE)

#load library

library(ggplot2)

#Scatter plot
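
A minimal sketch of the basic scatter plot, assuming the diamonds data frame bundled with ggplot2 and the same carat and price mapping used in the later examples (the name p_basic is only for illustration):

```r
library(ggplot2)

# Assumption: diamonds is the data frame that ships with ggplot2,
# with carat on the x axis and price on the y axis.
p_basic <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point()

p_basic  # display the plot
```

Storing the plot in a variable is optional; it just makes it easy to add more layers to the same plot later.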



Scatter Plot

The scatter plot above shows a positive relationship between carat and price.

Now include another variable that accompanies this relationship; let's take the clarity of the diamonds.

#take color=clarity in aesthetic

ggplot(diamonds,aes(x=carat,y=price,color= clarity))+geom_point()


In this diagram, clarity is shown by colored dots: each color represents a clarity grade, so the relationship between the X and Y variables can be read for each grade. The red points mark low clarity and show a weaker positive relationship between the two variables, while the blue points show a relatively stronger one.
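
One way to check this reading of the colors with numbers is to compute the correlation between carat and price inside each clarity grade. This is only a sketch, assuming the diamonds data frame bundled with ggplot2; cor_by_clarity is a name made up for this example:

```r
library(ggplot2)

# Pearson correlation between carat and price within each clarity grade.
cor_by_clarity <- sapply(
  split(diamonds, diamonds$clarity),
  function(d) cor(d$carat, d$price)
)

round(cor_by_clarity, 3)
```

A value closer to 1 means the points of that grade lie closer to a rising straight line.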

Now add one more variable to this relationship by mapping the size of the scatter points to the diamond cut.


#take size of dots-cuts

ggplot(diamonds,aes(x=carat,y=price,color= clarity,size=cut))+geom_point()



Here each point is sized by the cut of the diamond, so the diagram shows the relationship between price and carat while taking cut and clarity into consideration.

A scatter plot is a layer, so to add another layer, say a curve that shows the general trend between the X and Y variables, we use geom_smooth.


To show the line of best fit from a linear model:

#Curve to show general trend

ggplot(diamonds,aes(x=carat,y=price))+geom_point()+geom_smooth(se=FALSE,method = lm)


The line shows the linear relationship between the two variables.
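
To see the numbers behind that line, the same model can be fitted directly with lm. This is a small sketch, assuming the diamonds data frame bundled with ggplot2; fit is just an illustrative name:

```r
library(ggplot2)

# Fit the straight line that geom_smooth(method = lm) draws.
fit <- lm(price ~ carat, data = diamonds)

coef(fit)               # intercept and slope of the line
summary(fit)$r.squared  # share of the price variance explained by carat
```

The slope is the estimated change in price for one additional carat.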

Faceting makes it easier to understand the relationship while taking a third variable into account.



ggplot(diamonds,aes(x=carat,y=price))+geom_point()+facet_wrap(~clarity)
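
Faceting can also be combined with the trend line from before, so that each clarity panel gets its own line of best fit. A sketch, again assuming the diamonds data frame bundled with ggplot2 (p_facet is an illustrative name):

```r
library(ggplot2)

# One panel per clarity grade, each with its own linear trend line.
p_facet <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point() +
  geom_smooth(se = FALSE, method = "lm") +
  facet_wrap(~clarity)

p_facet  # display the plot
```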


1.2 Histogram

Now let's look at histograms with ggplot2.

Sometimes you need only one dimension of the data and want to observe its distribution; for that we use a histogram.



ggplot(diamonds,aes(x=price))+ geom_histogram()



Count shows the frequency in each bin, and the histogram shows the distribution of price.

To change the width of the bins, we simply set the binwidth argument.


#Histogram width

ggplot(diamonds,aes(x=price))+ geom_histogram(binwidth = 3000)
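
To make the idea of a count per bin concrete, the binning can be sketched by hand in base R. This assumes the diamonds data frame bundled with ggplot2 and bin edges that start at 0, which may not be where geom_histogram places its edges by default, so the counts can differ slightly from the plot; breaks and bin_counts are illustrative names:

```r
library(ggplot2)

# Bin edges every 3000 price units, starting at 0 (an assumption);
# diamonds$price runs from 326 to 18823, so 0 to 21000 covers it.
breaks <- seq(0, 21000, by = 3000)

# Count how many diamonds fall into each price bin.
bin_counts <- table(cut(diamonds$price, breaks = breaks))
bin_counts
```

Each entry of the table is the height one bar would have with these edges.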



Let's now use the fill option, so that the histogram shows the distribution of price broken down by the clarity of the diamonds.


#Histogram fill with clarity

ggplot(diamonds,aes(x=price,fill=clarity))+ geom_histogram()
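
By default the colored segments are stacked on top of each other. If the share of each clarity grade per bin is of more interest than the raw count, one option is position = "fill". This is a sketch only, assuming the diamonds data frame bundled with ggplot2 and reusing the binwidth of 3000 from above (p_fill is an illustrative name):

```r
library(ggplot2)

# Each bar is rescaled to height 1, so the colors show proportions.
p_fill <- ggplot(diamonds, aes(x = price, fill = clarity)) +
  geom_histogram(binwidth = 3000, position = "fill")

p_fill  # display the plot
```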

1.3 Boxplot

A basic way in statistics to compare distributions is the boxplot.

A boxplot, as I have mentioned before, is a graphical representation of data that shows the highest, lowest and median values.
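
A minimal sketch of such a boxplot with ggplot2. The grouping variable is an assumption: here price is compared across the cut of the diamonds, using the diamonds data frame bundled with ggplot2 (p_box is an illustrative name):

```r
library(ggplot2)

# Assumption: one box per cut, with price on the y axis.
p_box <- ggplot(diamonds, aes(x = cut, y = price)) +
  geom_boxplot()

p_box  # display the plot
```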




The dark middle line in the first boxplot shows the median, and the box spans the 25th to the 75th percentile.

The dark line of points above each box is made up of outliers, values that go beyond the expected range.
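
The quantities a boxplot draws can also be computed by hand. The sketch below assumes the diamonds data frame bundled with ggplot2 and looks at a single group, the "Fair" cut, chosen only for illustration; it uses the usual rule that points more than 1.5 times the interquartile range away from the box count as outliers:

```r
library(ggplot2)

# Assumption: look at one group only, the "Fair" cut.
fair <- diamonds$price[diamonds$cut == "Fair"]

# Lower box edge, median and upper box edge.
q <- quantile(fair, c(0.25, 0.5, 0.75))
iqr <- q[["75%"]] - q[["25%"]]

# Points beyond 1.5 * IQR from the box are treated as outliers.
upper <- q[["75%"]] + 1.5 * iqr
lower <- q[["25%"]] - 1.5 * iqr
outliers <- fair[fair > upper | fair < lower]

q
length(outliers)
```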

To get a better picture of the distribution, we take the log of price.

#Boxplot taking log values
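
A minimal sketch of the log version, assuming the same grouping by cut and the diamonds data frame bundled with ggplot2 (both are assumptions; p_box_log is an illustrative name):

```r
library(ggplot2)

# Same boxplot, but with the natural log of price on the y axis.
p_box_log <- ggplot(diamonds, aes(x = cut, y = log(price))) +
  geom_boxplot()

p_box_log  # display the plot
```

Taking the log compresses the long upper tail of price, so the boxes are easier to compare.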



These are very basic forms of data visualization that help to present the data clearly.

