Python-II

Better decisions depend on fast statistics and data analysis, and data analysis lies at the intersection of programming, statistics, and business analysis. Various data tools have come into being to help us understand data in a systematic and scientific way.

One such tool is Python. Before working with Python, some programs and packages need to be installed.

  1. Download and Install Anaconda from https://www.continuum.io/downloads.
  2. Download and install the Jupyter Notebook interface from http://jupyter.readthedocs.org/en/latest/install.html
  3. We can use pip or easy_install to install packages. Python packages can be browsed by topic at https://pypi.python.org/pypi?%3Aaction=browse .
  4. Install the pandas package from Anaconda with "conda install pandas".
  5. Install the pandasql, seaborn, ggplot, and SQLAlchemy packages from Anaconda.
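Once the installation steps are done, it can help to confirm that the packages are visible to Python. The following is a minimal sketch using only the standard library; the helper name check_packages is hypothetical, and the list of import names is an assumption based on the packages mentioned above.

```python
import importlib.util


def check_packages(names):
    """Return a dict mapping each top-level package name to True if Python can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Import names assumed for the packages listed in the steps above.
status = check_packages(["pandas", "pandasql", "seaborn", "ggplot", "sqlalchemy"])
for name, found in status.items():
    print(f"{name}: {'installed' if found else 'missing'}")
```

Any package reported as missing can then be installed with conda or pip as described in the list.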

 

Install Packages from within Jupyter Notebook.

Steps:

  1. Open Jupyter Notebook. Click on New and choose the Python Root kernel.
  2. Import the package: import pandas as pd
  3. Import data (CSV file): if the file is stored locally, we can use the os Python library: import os
  4. Read data: variable = pd.read_csv("filename", header=None). Here the data set is Shooting.csv, so the command is

    shooting = pd.read_csv("Shooting.csv", encoding='cp1252')

    In order to get full details of the data set we use the following command in Python

    shooting.info()

    Output:

    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 1512 entries, 0 to 1511
    Data columns (total 14 columns):
    id                         1512 non-null int64
    name                       1512 non-null object
    date                       1512 non-null object
    manner_of_death            1512 non-null object
    armed                      1511 non-null object
    age                        1479 non-null float64
    gender                     1512 non-null object
    race                       1434 non-null object
    city                       1512 non-null object
    state                      1512 non-null object
    signs_of_mental_illness    1512 non-null bool
    threat_level               1512 non-null object
    flee                       1498 non-null object
    body_camera                1512 non-null bool
    dtypes: bool(2), float64(1), int64(1), object(10)
    memory usage: 144.8+ KB
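To try the read-and-inspect step without the Shooting.csv file, the same calls can be run on a small in-memory CSV. This is only a sketch: the three rows below are placeholder values, not records from the real data set.

```python
import io

import pandas as pd

# Placeholder rows standing in for a CSV file; the second row has no age.
csv_text = """id,name,age,gender
1,Person A,30,M
2,Person B,,F
3,Person C,45,M
"""

sample = pd.read_csv(io.StringIO(csv_text))

# info() prints column names, non-null counts and dtypes, as in the output above.
sample.info()

# Count missing values per column to see which columns are incomplete.
missing = sample.isnull().sum()
print(missing)
```

A column with missing numbers is read as float64, which is why age appears as float64 in the info() output while id is int64.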

    Dropping a particular variable: shooting2 = shooting.drop('Unnamed: 0', axis=1)

print(shooting2)

Output:

        id                              name       date   manner_of_death  \
0        3                        Tim Elliot   1/2/2015              shot   
1        4                  Lewis Lee Lembke   1/2/2015              shot   
2        5                John Paul Quintero   1/3/2015  shot and Tasered   
3        8                   Matthew Hoffman   1/4/2015              shot   
4        9                 Michael Rodriguez   1/4/2015              shot   
5       11                 Kenneth Joe Brown   1/4/2015              shot   
6       13               Kenneth Arnold Buck   1/5/2015              shot   
7       15                     Brock Nichols   1/6/2015              shot   
8       16                     Autumn Steele   1/6/2015              shot   
9       17                   Leslie Sapp III   1/6/2015              shot   
10      19                    Patrick Wetter   1/6/2015  shot and Tasered   
11      21                         Ron Sneed   1/7/2015              shot   
12      22    Hashim Hanif Ibn Abdul-Rasheed   1/7/2015              shot   
13      25            Nicholas Ryan Brickman   1/7/2015              shot   
14      27  Omarr Julian Maximillian Jackson   1/7/2015              shot   
15      29                     Loren Simpson   1/8/2015              shot   
16      32               James Dudley Barker   1/8/2015              shot   
17      36               Artago Damon Howard   1/8/2015              shot   
18      37                      Thomas Hamby   1/8/2015              shot   
19      38                     Jimmy Foreman   1/9/2015              shot   
20     325                     Andy Martinez   1/9/2015              shot   
21      42                       Tommy Smith  1/11/2015              shot   
22      43                     Brian Barbosa  1/11/2015              shot   
23      45                 Salvador Figueroa  1/11/2015  shot and Tasered   
24      46               John Edward O'Keefe  1/13/2015              shot   
25      48                 Richard McClendon  1/13/2015              shot   
26      49                     Marcus Golden  1/14/2015              shot   
27      50                    Michael Goebel  1/14/2015              shot   
28      51                      Mario Jordan  1/14/2015              shot   
29      52                  Talbot Schroeder  1/14/2015              shot
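A column named 'Unnamed: 0' typically appears when a CSV was saved together with its index, so the first header cell is empty. The sketch below reproduces that on placeholder data and drops the column; using errors="ignore" is an optional choice made here so the call does not fail when the column is absent.

```python
import io

import pandas as pd

# The header starts with an empty cell, as happens when an index is written to CSV.
# The values are placeholders.
csv_text = ",id,age\n0,3,30\n1,4,40\n"
frame = pd.read_csv(io.StringIO(csv_text))
print(list(frame.columns))

# columns=[...] is equivalent to passing the label with axis=1.
# errors="ignore" skips labels that are not present instead of raising KeyError.
trimmed = frame.drop(columns=["Unnamed: 0"], errors="ignore")
print(list(trimmed.columns))
```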

Ranging (selecting a range of rows)

Input: shooting4 = shooting[0:12]

print (shooting4)
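The slice above selects rows by position, with the end position excluded. As a small sketch on a placeholder data frame, the same twelve rows can be selected in three equivalent ways when the index is the default 0, 1, 2, ...:

```python
import pandas as pd

frame = pd.DataFrame({"id": range(100, 120)})  # placeholder ids, 20 rows

first_twelve = frame[0:12]     # positions 0..11, end excluded
same_rows = frame.iloc[0:12]   # explicit positional slicing, end excluded
by_label = frame.loc[0:11]     # label-based slicing, end included

print(len(first_twelve), len(same_rows), len(by_label))
```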

To get summary statistics for a variable: shooting.age.describe()

Output :

count    1479.000000
mean       36.379310
std        12.730798
min         6.000000
25%              NaN
50%              NaN
75%              NaN
max        86.000000
Name: age, dtype: float64
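The quartile rows in the output above are NaN, and age has missing values. A minimal sketch, on placeholder numbers, of computing the quartiles explicitly after dropping the missing values:

```python
import numpy as np
import pandas as pd

# Placeholder ages with one missing value.
ages = pd.Series([20.0, 30.0, np.nan, 40.0, 50.0], name="age")

# describe() counts only the non-missing values.
summary = ages.describe()
print(summary)

# Quartiles computed on the non-missing values only.
quartiles = ages.dropna().quantile([0.25, 0.5, 0.75])
print(quartiles)
```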

To get correlation between variables

                               id       age  signs_of_mental_illness  body_camera
id                       1.000000 -0.035646                -0.062726     0.079815
age                     -0.035646  1.000000                 0.107638    -0.005177
signs_of_mental_illness -0.062726  0.107638                 1.000000     0.022973
body_camera              0.079815 -0.005177                 0.022973     1.000000
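The command that produced this table is not shown; a correlation matrix like it is usually obtained with the corr() method of a data frame. The sketch below uses placeholder data and first keeps only numeric and boolean columns, which is an assumption made here so that text columns do not cause an error in newer pandas versions.

```python
import pandas as pd

# Placeholder data: one numeric, one boolean and one text column.
frame = pd.DataFrame({
    "age": [20, 30, 40, 50],
    "body_camera": [False, False, True, True],
    "name": ["a", "b", "c", "d"],
})

# Keep numeric and boolean columns, then compute pairwise Pearson correlations.
numeric = frame.select_dtypes(include=["number", "bool"])
corr = numeric.corr()
print(corr)
```

The matrix is symmetric with ones on the diagonal, as in the table above.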

 

Using SQL: Python has the pandasql package, which runs SQL queries against data frames.

from pandasql import sqldf

pysqldf = lambda q: sqldf(q, globals())

pysqldf("SELECT * FROM shooting2 LIMIT 5;")

Ranging using pysqldf

pysqldf("SELECT * FROM shooting2 WHERE age > 41;")
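If pandasql is not available, a similar query can be run with the sqlite3 module from the standard library. This is a swapped-in alternative, not the pandasql approach shown above, and the rows are placeholders:

```python
import sqlite3

import pandas as pd

frame = pd.DataFrame({"id": [1, 2, 3], "age": [35, 42, 60]})  # placeholder rows

# Copy the data frame into an in-memory SQLite table, then query it with SQL.
conn = sqlite3.connect(":memory:")
frame.to_sql("shooting2", conn, index=False)
older = pd.read_sql_query("SELECT * FROM shooting2 WHERE age > 41;", conn)
conn.close()

print(older)
```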

For data visualization we use the excellent seaborn package from http://stanford.edu/~mwaskom/software/seaborn/index.html. Histograms, boxplots, scatterplots, and jointplots are very easily plotted using seaborn.

Making displot

Input : import seaborn as sns

import matplotlib.pyplot as plt

g = sns.displot(data=shooting, x="age")

Making boxplot

Input : ax = sns.boxplot(x="gender", y="age", data=shooting)
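A boxplot of age by gender draws, for each group, the median and the quartiles. As a rough sketch on placeholder data, the same numbers can be read off without plotting by grouping and calling describe():

```python
import pandas as pd

# Placeholder data with two groups.
frame = pd.DataFrame({
    "gender": ["M", "M", "M", "F", "F", "F"],
    "age": [20, 30, 40, 25, 35, 45],
})

# One row per gender; the 25%, 50% and 75% columns are the box edges and median.
box_stats = frame.groupby("gender")["age"].describe()
print(box_stats[["min", "25%", "50%", "75%", "max"]])
```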

Making Jointplot

Input : sns.jointplot(x='age', y='gender', data=shooting)

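Making factor plot (in newer seaborn versions factorplot is named catplot and the size argument is named height)

Input : sns.catplot(x="gender", y="age",
col="manner_of_death", data=shooting, kind="box", height=4, aspect=.5)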

For data visualization, we can also use the ggplot package created by Yhat:

from ggplot import *

p = ggplot(aes(x='age', y='gender', color='race'), data=shooting)
p + geom_point()

Regression models: regression is a widely used data science technique in business, and for it we can use the statsmodels package.

Input: import statsmodels.formula.api as sm

boston = pd.read_csv("http://vincentarelbundock.github.io/Rdatasets/csv/MASS/Boston.csv")

boston = boston.drop('Unnamed: 0', axis=1)

boston.head()

result = sm.ols(formula="medv ~ crim + zn + nox + ptratio + black + rm", data=boston).fit()
result.summary()

Dep. Variable:      medv               R-squared:            0.631
Model:              OLS                Adj. R-squared:       0.626
Method:             Least Squares      F-statistic:          142.0
Date:               Tue, 19 Jul 2016   Prob (F-statistic):   1.49e-104
Time:               03:04:33           Log-Likelihood:       -1588.2
No. Observations:   506                AIC:                  3190.
Df Residuals:       499                BIC:                  3220.
Df Model:           6
Covariance Type:    nonrobust

               coef   std err         t   P>|t|   [95.0% Conf. Int.]
Intercept   -0.3594     4.863    -0.074   0.941    -9.915     9.196
crim        -0.0991     0.034    -2.890   0.004    -0.167    -0.032
zn          -0.0064     0.014    -0.470   0.638    -0.033     0.020
nox        -10.8653     2.865    -3.793   0.000   -16.494    -5.237
ptratio     -1.0519     0.135    -7.796   0.000    -1.317    -0.787
black        0.0137     0.003     4.453   0.000     0.008     0.020
rm           6.9796     0.396    17.612   0.000     6.201     7.758

Omnibus:          298.859   Durbin-Watson:       0.808
Prob(Omnibus):      0.000   Jarque-Bera (JB):    3305.426
Skew:               2.385   Prob(JB):            0.00
Kurtosis:          14.577   Cond. No.            7.66e+03

Get the results:

result.params

Intercept    -0.359432
crim         -0.099122
zn           -0.006364
nox         -10.865295
ptratio      -1.051937
black         0.013737
rm            6.979587
dtype: float64
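The params shown above are the least-squares coefficients, with the intercept as the first entry. To illustrate what that fit computes, here is a minimal sketch that uses NumPy's least-squares solver instead of statsmodels, on synthetic data. The coefficients in true_coef are arbitrary values chosen for the illustration and have nothing to do with the Boston data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic predictors and arbitrary "true" coefficients: intercept, slope 1, slope 2.
x = rng.normal(size=(200, 2))
true_coef = np.array([1.5, -2.0, 0.5])

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones(len(x)), x])
y = X @ true_coef + rng.normal(scale=0.1, size=len(x))

# Ordinary least squares: the coefficients that minimize the sum of squared residuals.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)
```

With little noise and 200 observations, the estimated coefficients land close to the values used to generate the data.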

 
