**Machine Learning Algorithms With Sklearn**

by Kaustubh M H


**About this Roadmap**

We have learned the basic concepts behind some of the machine learning algorithms. In this section you will learn how to implement those algorithms using scikit-learn.

**Milestone 1 :** Implement Regression Algorithms using Sklearn

**Milestone 2 :** Building Models For Classification Problems In Four Steps

**Milestone 3 :** Using Sklearn For Clustering

Related resources:

- Essential Math for Data Science — ‘Why’ and ‘How’
- Introduction to Python — a resource for beginners who want to learn Python
- A roadmap giving a brief overview of the tools used in ML
- Basic implementation of Machine Learning Algorithms from Scratch

After completion of this roadmap you will:

- Be able to implement Regression algorithms using built-in sklearn libraries.
- Learn how to implement different classification algorithms in four simple steps.
- Be able to build models for unsupervised machine learning algorithms using sklearn libraries.

Milestone: Implement Regression Algorithms using Sklearn

Since you have already studied Linear Regression, I won't explain it again in depth; I'll go straight to the code.

First, as usual, we import the required libraries and the data set.

`In [ 1 ]:`

```
# Importing required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Importing the data set
from sklearn.datasets import load_boston
# Code to plot the graphs without using plt.show()
%matplotlib inline
```

`In [ 2 ]:`

```
boston = load_boston()
print(boston.DESCR)
Boston House Prices dataset
===========================
Notes
------
Data Set Characteristics:
:Number of Instances: 506
:Number of Attributes: 13 numeric/categorical predictive
:Median Value (attribute 14) is usually the target
:Attribute Information (in order):
- CRIM per capita crime rate by town
- ZN proportion of residential land zoned for lots over 25,000 sq.ft.
- INDUS proportion of non-retail business acres per town
- CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
- NOX nitric oxides concentration (parts per 10 million)
- RM average number of rooms per dwelling
- AGE proportion of owner-occupied units built prior to 1940
- DIS weighted distances to five Boston employment centres
- RAD index of accessibility to radial highways
- TAX full-value property-tax rate per $10,000
- PTRATIO pupil-teacher ratio by town
- B 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
- LSTAT % lower status of the population
- MEDV Median value of owner-occupied homes in $1000's
:Missing Attribute Values: None
:Creator: Harrison, D. and Rubinfeld, D.L.
This is a copy of UCI ML housing dataset.
<http://archive.ics.uci.edu/ml/datasets/Housing>
This dataset was taken from the StatLib library which is maintained at Carnegie Mellon University.
The Boston house-price data of Harrison, D. and Rubinfeld, D.L. 'Hedonic
prices and the demand for clean air', J. Environ. Economics & Management,
vol.5, 81-102, 1978. Used in Belsley, Kuh & Welsch, 'Regression diagnostics
...', Wiley, 1980. N.B. Various transformations are used in the table on
pages 244-261 of the latter.
The Boston house-price data has been used in many machine learning papers that address regression
problems.
**References**
- Belsley, Kuh & Welsch, 'Regression diagnostics: Identifying Influential Data and Sources of Collinearity', Wiley, 1980. 244-261.
- Quinlan,R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings on the Tenth International Conference of Machine Learning, 236-243, University of Massachusetts, Amherst. Morgan Kaufmann.
- many more! (see <http://archive.ics.uci.edu/ml/datasets/Housing>)
```

We first have to pre-process the data before feeding it to the algorithm to train the model.

`In [ 3 ]:`

```
features = pd.DataFrame(boston.data, columns=boston.feature_names)
target = pd.DataFrame(boston.target, columns=['target'])
df = pd.concat([features, target], axis=1)
```

`In [ 4 ]:`

```
x = df['RM'].values
y = df['target'].values
```

The data we have selected is 1-dimensional. To perform matrix multiplication, scikit-learn needs the data to be 2-dimensional, so we reshape it with the operation below.

`In [ 5 ]:`

```
x = x.reshape(-1,1)
y = y.reshape(-1,1)
```

MinMaxScaler() scales each value to the range 0 to 1. With everything on the same 0–1 scale, plotting all the data on the same plot becomes very simple.

`In [ 6 ]:`

```
scaler = MinMaxScaler()
X = scaler.fit_transform(x)
Y = scaler.fit_transform(y)
```
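Concretely, MinMaxScaler maps each value to (x − min) / (max − min) per column. A quick sanity check with made-up numbers:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Three made-up values; min = 2, max = 10, so the range is 8
values = np.array([[2.0], [4.0], [10.0]])

scaled = MinMaxScaler().fit_transform(values)
# The same result computed by hand
manual = (values - values.min()) / (values.max() - values.min())
```

Note that fitting the scaler separately on x and y, as above, learns a different min and max for each array.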

After scaling the values, you split the data into training data and test data.

`In [ 7 ]:`

```
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size = 0.2)
```

Next you initialize the regressor object to train the model.

`In [ 8 ]:`

```
regressor = LinearRegression()
regressor.fit(x_train, y_train)
```

`Out [ 8 ]:`

```
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
```

When you call regressor.fit(), the model gets trained. The output shows the parameters with which the model was trained.

**copy_X = True** : A copy of the X data is used to train the model. If it is False, the original data might be overwritten.

**fit_intercept** : This parameter is also Boolean. If it is True, the intercept of the model is calculated; if False, the intercept is not calculated.

**n_jobs** : This specifies how many concurrent processes or threads are used for parallel computation.

**normalize** : This specifies whether the data is normalized before the model is trained. If the value is True, the data gets normalized; otherwise it does not. If the fit_intercept parameter is False, this parameter is ignored.
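To see what fit_intercept actually controls, here is a small sketch on synthetic data (the data and variable names are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
x = rng.rand(50, 1)
y = 3.0 * x[:, 0] + 5.0          # an exact line: slope 3, intercept 5

with_icpt = LinearRegression(fit_intercept=True).fit(x, y)
no_icpt = LinearRegression(fit_intercept=False).fit(x, y)
# With fit_intercept=True the model recovers slope 3 and intercept 5;
# with False the line is forced through the origin, so the slope has
# to absorb the offset and the fit is worse.
```

With the intercept disabled the line must pass through the origin, so unless your data is already centred the fit degrades.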

`In [ 9 ]:`

```
y_pred = regressor.predict(x_test)
```

When you execute this command, the output values are predicted based on the model.

`In [ 10 ]:`

```
plt.scatter(x_train, y_train, color = 'red')
plt.plot(x_train, regressor.predict(x_train), color = 'blue')
plt.xlabel("RM")
plt.ylabel("Target")
plt.title("Linear Regression (Train)")
plt.show()
```

`In [ 11 ]:`

```
plt.scatter(x_test, y_test, color = 'red')
plt.plot(x_train, regressor.predict(x_train), color = 'blue')
plt.xlabel("RM")
plt.ylabel("Target")
plt.title("Linear Regression (Test)")
plt.show()
```

`In [ 12 ]:`

```
print(mean_squared_error(y_test, y_pred))
print(r2_score(y_test, y_pred))
```
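mean_squared_error and r2_score implement the standard formulas: MSE is the mean of the squared residuals, and R² is 1 − SS_res / SS_tot. A quick hand computation on made-up numbers shows this:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.0, 7.5, 9.5])

# MSE: the mean of the squared residuals
mse = np.mean((y_true - y_hat) ** 2)

# R^2: 1 - residual sum of squares / total sum of squares
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

# These match sklearn's implementations
sk_mse = mean_squared_error(y_true, y_hat)
sk_r2 = r2_score(y_true, y_hat)
```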

Milestone: Implement Regression Algorithms using Sklearn

Now that you have understood how scikit-learn's LinearRegression works, you are going to execute the code for multivariate linear regression.

`In [ 1 ]:`

```
# Importing required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Importing the data set
from sklearn.datasets import load_boston
# Code to plot the graphs without using plt.show()
%matplotlib inline
```

`In [ 2 ]:`

```
boston = load_boston()
print(boston.DESCR)
# Output: prints the same Boston dataset description shown in the previous lesson
```

`In [ 3 ]:`

```
features = pd.DataFrame(boston.data, columns=boston.feature_names)
target = pd.DataFrame(boston.target, columns=['target'])
df = pd.concat([features, target], axis=1)
```

`In [ 4 ]:`

```
x = df.iloc[:,:-1].values
y = df.iloc[:,-1].values
```

`In [ 5 ]:`

```
y = y.reshape(-1,1)
```

`In [ 6 ]:`

```
scaler = MinMaxScaler()
X = scaler.fit_transform(x)
Y = scaler.fit_transform(y)
```

`In [ 7 ]:`

```
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size = 0.2)
```

`In [ 8 ]:`

```
regressor = LinearRegression()
regressor.fit(x_train, y_train)
```

`Out [ 8 ]:`

```
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
```

`In [ 9 ]:`

```
y_pred = regressor.predict(x_test)
```

`In [ 10 ]:`

```
print(mean_squared_error(y_test, y_pred))
print(r2_score(y_test, y_pred))
```

Milestone: Implement Regression Algorithms using Sklearn

Now, we are going to work on Polynomial Regression.

`In [ 1 ]:`

```
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import MinMaxScaler
from sklearn.datasets import load_boston
%matplotlib inline
```

`In [ 2 ]:`

```
boston = load_boston()
print(boston.DESCR)
# Output: prints the same Boston dataset description shown in the previous lessons
```

`In [ 3 ]:`

```
features = pd.DataFrame(boston.data, columns=boston.feature_names)
target = pd.DataFrame(boston.target, columns=['target'])
df = pd.concat([features, target], axis=1)
```

`In [ 4 ]:`

```
x = df['LSTAT'].values
y = df['target'].values
```

`In [ 5 ]:`

```
x = x.reshape(-1, 1)
y = y.reshape(-1, 1)
```

`In [ 6 ]:`

```
scaler = MinMaxScaler()
x = scaler.fit_transform(x)
y = scaler.fit_transform(y)
```

`In [ 7 ]:`

```
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
```

`In [ 8 ]:`

```
poly = PolynomialFeatures(degree=3)
# Converting the data in x_train to polynomials of degrees 1, 2 and 3
poly_x = poly.fit_transform(x_train)
# Applying the same polynomial transformation to x_test
poly_x_test = poly.transform(x_test)
```

`In [ 9 ]:`

```
regressor = LinearRegression()
regressor.fit(poly_x, y_train)
```

`Out [ 9 ]:`

```
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
```

`In [ 10 ]:`

```
y_pred = regressor.predict(poly_x_test)
```

`In [ 11 ]:`

```
plt.scatter(x_train,y_train,color='red')
plt.plot(x_train,regressor.predict(poly.fit_transform(x_train)),'b.')
```

`In [ 12 ]:`

```
plt.plot(x_test, y_test, 'r.', x_test, y_pred, 'b.')
```

`In [ 13 ]:`

```
print(mean_squared_error(y_test, y_pred))
print(r2_score(y_test, y_pred))
```

Milestone: Implement Regression Algorithms using Sklearn

Support Vector Regression (SVR) is not the same as Support Vector Machines. As the name suggests, you use SVR for continuous data. There are some terms you must know before we start discussing SVR.

Before you start learning what SVR is all about, you should know what support vectors are.

To understand what support vectors are, imagine a scatter plot of points belonging to two different classes.

Looking at such a plot, you can see that the data can be separated into the two classes with a straight line. If you observe, these points lie in a two-dimensional plane.

If the points were in a 3-D space, you would have to separate the data using a plane.

For now, let us assume the data lies in a 2-D plane. You learnt that you can draw a straight line to separate the data into two classes. But there are infinitely many possible lines you could draw. These separating lines are called hyperplanes. How can you choose the best hyperplane?

There are two intuitions that will help you to lead to choosing the best hyper plane.

Confidence in making the best prediction: the functional margin of the hyperplane wᵀx + b = 0, with respect to a training example (x(i), y(i)), is defined as

γ̂(i) = y(i)(wᵀx(i) + b)

This expression is built on the familiar straight-line equation. If y(i) = 1, then for a large functional margin we want wᵀx(i) + b to be a large positive number. If y(i) = -1, then for a large functional margin we want wᵀx(i) + b to be a large negative number. Either way, you want the functional margin to be large.

Margin: the other intuition is to choose the hyperplane which is at a maximum distance from the training points. You can formulate this with the help of the geometric margin, defined as

γ(i) = y(i)((w/‖w‖)ᵀx(i) + b/‖w‖)

which is simply the functional margin scaled by the length of w.

With these two intuitions, you can get the best hyperplane.
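To make the two margins concrete, here is a small numeric sketch (the hyperplane, point, and label are made-up numbers for illustration):

```python
import numpy as np

# Illustrative hyperplane w.x + b = 0 with w = (3, 4), b = -5
w = np.array([3.0, 4.0])
b = -5.0
x_i = np.array([3.0, 4.0])   # one training point
y_i = 1                      # its label

functional = y_i * (w @ x_i + b)              # depends on the scale of (w, b)
geometric = functional / np.linalg.norm(w)    # actual distance to the hyperplane

# Rescaling (w, b) by 2 doubles the functional margin
# but leaves the geometric margin unchanged
functional_2 = y_i * (2 * w @ x_i + 2 * b)
geometric_2 = functional_2 / np.linalg.norm(2 * w)
```

This scale invariance is why the geometric margin, not the functional one, is what gets maximised.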

Now, coming back to what support vectors are: picture the separating hyperplane with two parallel lines drawn through the closest points on either side.

These two parallel lines pass through the support vectors, and the line between them is the hyperplane. The distance between a support vector line and the hyperplane is called the margin.

Now before we go on to understand what SVR is about, here are a few terms which you should know about.

- Kernel - The function used to map the lower dimensional data into a higher dimensional data.
- Hyper Plane - It is a line that will help you to predict the continuous output value.
- Boundary line - There are two lines other than the Hyper Plane. It creates a margin.
- Support Vectors - These are the data points closest to the boundary; their distance from the hyperplane is the minimum.

So, Support Vector Regression means finding the hyperplane that fits the continuous data, together with the support vectors of the hyperplane that gets generated.

Here's the code for generating a hyperplane and the support vectors for the continuous data set.

`In [ 1 ]:`

```
# Importing the required packages
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
# Importing the dataset
from sklearn.datasets import load_boston
%matplotlib inline
```

`In [ 2 ]:`

```
# Loading the data set to a variable
boston=load_boston()
```

`In [ 3 ]:`

```
# Obtaining the features and the target values. Concatenating the values to get a single data frame.
features=pd.DataFrame(boston.data,columns=boston.feature_names)
target=pd.DataFrame(boston.target,columns=['target'])
data=pd.concat([features,target],axis=1)
```

`In [ 4 ]:`

```
# Loading the data
X=data['RM']
Y = data['target']
```

`In [ 5 ]:`

```
# Converting 1-D data to 2-D data (a Series has no reshape, so go through .values)
X = X.values.reshape(-1, 1)
Y = Y.values.reshape(-1, 1)
```

`In [ 6 ]:`

```
# Splitting the data into train data and test data
xtrain,xtest,ytrain,ytest=train_test_split(X,Y,test_size=0.2)
```

`In [ 7 ]:`

```
# Performing the standard scaling of the dataset.
sc_X = StandardScaler()
sc_y = StandardScaler()
x_train = sc_X.fit_transform(xtrain)
x_test = sc_X.transform(xtest)
y_train = sc_y.fit_transform(ytrain)
y_test = sc_y.transform(ytest)
```

`In [ 8 ]:`

```
#Loading the algorithm to train and training the model.
regressor = SVR(kernel = 'rbf')
regressor.fit(x_train, y_train)
```

`In [ 9 ]:`

```
# Predicting the output values for the test data set.
y_pred = regressor.predict(x_test)
y_pred = sc_y.inverse_transform(y_pred)
```

`In [ 10 ]:`

```
plt.scatter(x_train, y_train, color = 'red')
plt.plot(x_train, regressor.predict(x_train), 'b.')
plt.show()
```

`In [ 11 ]:`

```
X_grid = np.arange(min(x_train), max(x_train), 0.01) # choice of 0.01 instead of 0.1 step because the data is feature scaled
X_grid = X_grid.reshape((len(X_grid), 1))
plt.scatter(x_train, y_train, color = 'red')
plt.plot(X_grid, regressor.predict(X_grid), color = 'blue')
plt.show()
```

`In [ 12 ]:`

```
plt.scatter(x_test, y_test, color = 'red')
plt.plot(x_test, regressor.predict(x_test), 'b.')
plt.show()
```

Milestone: Implement Regression Algorithms using Sklearn

Decision trees are broadly classified into two types:

- Regression trees
- Classification trees

As you know, we are going to study regression trees in this section. I won't lie to you: regression trees are a bit more complex than classification trees, but I'll help you understand them in a simple manner.

To start with, imagine an example data set scattered on a graph plotted against two features, feature 1 and feature 2. The target is the third dimension; if you can imagine it, it's sticking out of your screen.

What a decision tree does is split the scatter plot into multiple regions. But you don't yet know how that happens, so let's see.

The splits the algorithm makes are driven by a measure called information entropy. Loosely speaking, it checks how much information one group of points provides, and whether splitting off a group adds value relative to the remaining points. The algorithm stops splitting when the information gained by a further split falls below a certain minimum value.
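As a side note, scikit-learn's DecisionTreeRegressor measures impurity with variance (the 'mse' criterion) rather than entropy, which is the classification analogue; the "stop when a split adds too little" idea is the same. A minimal sketch of scoring candidate splits by variance reduction (all numbers are made up):

```python
import numpy as np

def variance_reduction(y, left_mask):
    """Drop in impurity (variance) from splitting targets y into two groups."""
    left, right = y[left_mask], y[~left_mask]
    n = len(y)
    weighted = len(left) / n * left.var() + len(right) / n * right.var()
    return y.var() - weighted

# Two clusters of target values, indexed by a single feature
feature = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
target = np.array([1.0, 1.2, 0.9, 5.0, 5.1, 4.9])

good = variance_reduction(target, feature < 5)   # cuts between the clusters
bad = variance_reduction(target, feature < 2)    # cuts inside a cluster
```

The tree greedily keeps the split with the largest reduction, here the one between the clusters.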

Each region produced by a split is called a leaf; together they are called leaves.

We could go on splitting the data into more groups, but if a split doesn't add any information, it is useless to split further.

Here, information is the value of the features associated with the output value y.

The leaves that remain at the end of the algorithm are called terminal leaves.

Now, let's see how the splits happen and how the decision tree is drawn. Each internal node of the tree checks a feature against a threshold (for example, "is feature 1 less than some value X1?"), and each branch leads either to a further check or to a leaf.

For each leaf, the decision tree regression algorithm predicts the average of the target values of all the data points that fall in that leaf. The main goal of the decision tree is to increase the information value so that it becomes easy to predict the output, which is basically the target value.

Now, let's get into the code of each of these:

`In [ 1 ]:`

```
# Importing the required packages
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# Importing the data set
from sklearn.datasets import load_boston
%matplotlib inline
```

`In [ 2 ]:`

```
# Loading the data into a variable
boston=load_boston()
#Obtaining the feature and target value. Concatenating both of them to get a single dataframe
features=pd.DataFrame(boston.data,columns=boston.feature_names)
target=pd.DataFrame(boston.target,columns=['target'])
data=pd.concat([features,target],axis=1)
# Choosing the required Features
X=data['RM']
Y = data['target']
```

`In [ 3 ]:`

```
# Reshaping the data points from 1-D to 2-D (a Series has no reshape, so go through .values)
X = X.values.reshape(-1, 1)
y = Y.values.reshape(-1, 1)
```

`In [ 4 ]:`

```
# Splitting the values into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
```

`In [ 5 ]:`

```
# Scaling the data values to the required style
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)
sc_y = StandardScaler()
y_train = sc_y.fit_transform(y_train)
```

`In [ 7 ]:`

```
# Training the model
regressor = DecisionTreeRegressor(random_state = 0)
regressor.fit(X_train, y_train)
```

`Out [ 7 ]:`

```
DecisionTreeRegressor(criterion='mse', max_depth=None, max_features=None,
max_leaf_nodes=None, min_impurity_decrease=0.0,
min_impurity_split=None, min_samples_leaf=1,
min_samples_split=2, min_weight_fraction_leaf=0.0,
presort=False, random_state=0, splitter='best')
```

`In [ 8 ]:`

```
y_pred = regressor.predict(X_test)
```

`In [ 9 ]:`

```
X_grid = np.arange(min(X_train), max(X_train), 0.01)
X_grid = X_grid.reshape((len(X_grid), 1))
plt.scatter(X_train, y_train, color = 'red')
plt.plot(X_grid, regressor.predict(X_grid), color = 'blue')
plt.show()
```

`In [ 10 ]:`

```
X_grid = np.arange(min(X_test), max(X_test), 0.01)
X_grid = X_grid.reshape((len(X_grid), 1))
plt.scatter(X_test, y_pred, color = 'red')
plt.plot(X_grid, regressor.predict(X_grid), color = 'blue')
plt.show()
```

Milestone: Implement Regression Algorithms using Sklearn

Random forest regression is similar to decision tree regression, except that we build many trees, each on K randomly chosen data points. Here are the steps you follow to get an output using random forest regression:

**Step 1:** Pick K random data points from the training data set.

**Step 2:** Build a decision tree on these K points.

**Step 3:** Choose the number N of trees you want to build, and repeat steps 1 and 2 for each tree.

**Step 4:** For a new data point, have each of the N trees predict a value, and assign the new data point the average of the predicted y values across all N trees.

Random forest regression is an example of an ensemble model. Its advantage is that the output for a new data point depends on all the trees in the model: if a data point throws off one tree, it cannot affect all the trees with the same magnitude, so the model is quite stable.
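Step 4's averaging is easy to sketch. The numbers below are hypothetical per-tree predictions, not output from a real model:

```python
import numpy as np

# Hypothetical predictions for one new point from N = 5 trees
tree_preds = np.array([22.0, 24.5, 23.0, 25.0, 23.5])
forest_pred = tree_preds.mean()   # the ensemble's answer

# If one tree goes badly wrong (here it says 43.5 instead of 23.5),
# the ensemble shifts only by that tree's share of the average,
# not by the whole error
tree_preds_bad = np.array([22.0, 24.5, 23.0, 25.0, 43.5])
forest_pred_bad = tree_preds_bad.mean()
```

The 20-unit error of one tree moves the 5-tree average by only 20/5 = 4 units, which is the stability the text describes.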

Now that you have understood how the Random Forest works, let's get into the code part of it.

`In [ 1 ]:`

```
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_boston
%matplotlib inline
```

`In [ 2 ]:`

```
boston=load_boston()
features=pd.DataFrame(boston.data,columns=boston.feature_names)
target=pd.DataFrame(boston.target,columns=['target'])
data=pd.concat([features,target],axis=1)
X=data['RM']
Y = data['target']
```

`In [ 3 ]:`

```
X = X.values.reshape(-1, 1)
y = Y.values.reshape(-1, 1)
```

`In [ 4 ]:`

```
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
```

`In [ 5 ]:`

```
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)
sc_y = StandardScaler()
y_train = sc_y.fit_transform(y_train)
```

`In [ 6 ]:`

```
regressor = RandomForestRegressor(n_estimators = 10, random_state = 0)
regressor.fit(X_train, y_train)
```

`Out [ 6 ]:`

```
RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
max_features='auto', max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
oob_score=False, random_state=0, verbose=0, warm_start=False)
```

`In [ 7 ]:`

```
y_pred = regressor.predict(X_test)
```

`In [ 8 ]:`

```
X_grid = np.arange(min(X_train), max(X_train), 0.01)
X_grid = X_grid.reshape((len(X_grid), 1))
plt.scatter(X_train, y_train, color = 'red')
plt.plot(X_grid, regressor.predict(X_grid), color = 'blue')
```

`In [ 9 ]:`

```
X_grid = np.arange(min(X_test), max(X_test), 0.01)
X_grid = X_grid.reshape((len(X_grid), 1))
plt.scatter(X_test, y_test, color = 'red')
plt.plot(X_grid, regressor.predict(X_grid), color = 'blue')
```

Milestone: Building Models For Classification Problems In Four Steps

This section deals with understanding how to use scikit-learn's API for logistic regression.

`In [ 1 ]:`

```
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from matplotlib.colors import ListedColormap
import seaborn as sns
from sklearn.datasets import load_breast_cancer
%matplotlib inline
```

`In [ 2 ]:`

```
cancer = load_breast_cancer()
print(cancer.DESCR)
```

`In [ 3 ]:`

```
features = pd.DataFrame(cancer.data, columns=cancer.feature_names)
target = pd.DataFrame(cancer.target, columns=["TARGET"])
data = pd.concat([features, target], axis=1)
```

`In [ 4 ]:`

```
a = data.corr('pearson')
abs(a.loc['TARGET']).sort_values(ascending=False)
```

`In [ 5 ]:`

```
x = np.array(data['worst concave points'])
y = np.array(data['TARGET'])
```

`In [ 6 ]:`

```
x = x.reshape(-1, 1)
y = y.reshape(-1, 1)
```

`In [ 7 ]:`

```
plt.plot(x, y, 'r.')
```

`In [ 8 ]:`

```
scaler = MinMaxScaler()
x = scaler.fit_transform(x)
y = scaler.fit_transform(y)
```

`In [ 9 ]:`

```
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
```

`In [ 10 ]:`

```
classifier = LogisticRegression(random_state = 0)
classifier.fit(x_train, y_train)
```

`Out [ 10 ]:`

```
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
penalty='l2', random_state=0, solver='liblinear', tol=0.0001,
verbose=0, warm_start=False)
```

`In [ 11 ]:`

```
y_pred = classifier.predict(x_test)
```

`In [ 12 ]:`

```
cm = confusion_matrix(y_test, y_pred)
```
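With a hypothetical 2×2 confusion matrix (rows = actual class, columns = predicted class), accuracy is the diagonal divided by the total:

```python
import numpy as np

# Hypothetical counts, not output from the model above
cm_example = np.array([[40,  3],
                       [ 2, 69]])

correct = np.trace(cm_example)   # diagonal = correct predictions (40 + 69)
total = cm_example.sum()         # all predictions
accuracy = correct / total
```

The off-diagonal entries (3 and 2 here) are the two kinds of misclassification.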

`In [ 13 ]:`

```
plt.plot(x_train, y_train, 'r.', x_test, y_pred, 'b.')
```

`In [ 14 ]:`

```
plt.figure(figsize=(9,9))
sns.heatmap(cm, annot=True, fmt=".3f", linewidths=.5, square = True, cmap = 'Blues_r');
```

Milestone: Building Models For Classification Problems In Four Steps

You already know the theory behind K-NN, so now let's get into the code part of it.

`In [ 1 ]:`

```
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from matplotlib.colors import ListedColormap
from sklearn.metrics import confusion_matrix
from sklearn.datasets import load_breast_cancer
%matplotlib inline
```

`In [ 2 ]:`

```
cancer = load_breast_cancer()
```

`In [ 3 ]:`

```
features = pd.DataFrame(cancer.data, columns=cancer.feature_names)
target = pd.DataFrame(cancer.target, columns=["TARGET"])
data = pd.concat([features, target], axis=1)
```

`In [ 4 ]:`

```
a = data.corr('pearson')
abs(a.loc['TARGET']).sort_values(ascending=False)
```

`Out [ 4 ]:`

`In [ 5 ]:`

```
x1 = np.array(data['worst concave points'])
x2 = np.array(data['worst perimeter'])
y = np.array(data['TARGET'])
```

`In [ 6 ]:`

```
x = np.column_stack((x1,x2))
```

`In [ 7 ]:`

```
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size = 0.25, random_state = 0)
```

`In [ 8 ]:`

```
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
```

`In [ 9 ]:`

```
classifier = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)
classifier.fit(X_train, y_train)
```

`Out [ 9 ]:`

```
KNeighborsClassifier(algorithm='auto', leaf_size=30,
           metric='minkowski', metric_params=None, n_jobs=1,
           n_neighbors=5, p=2, weights='uniform')
```
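The metric = 'minkowski', p = 2 setting used above is just the Euclidean distance; a quick check on two made-up points:

```python
import numpy as np

a = np.array([0.0, 0.0])
b = np.array([3.0, 4.0])

# Minkowski distance: (sum |a_i - b_i|^p)^(1/p); with p = 2 this
# reduces to the ordinary Euclidean distance
p = 2
minkowski = np.sum(np.abs(a - b) ** p) ** (1.0 / p)
euclidean = np.linalg.norm(a - b)
```

With p = 1 the same formula gives the Manhattan distance instead.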

`In [ 10 ]:`

```
y_pred = classifier.predict(X_test)
```

`In [ 11 ]:`

```
cm = confusion_matrix(y_test, y_pred)
```

`In [ 12 ]:`

```
X_set, y_set = X_train, y_train
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
alpha = 0.5, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)
plt.title('K-NN (Training set)')
plt.legend()
```

`In [ 13 ]:`

```
X_set, y_set = X_test, y_test
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.5, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)
plt.title('K-NN (Test set)')
plt.legend()
```

Milestone: Building Models For Classification Problems In Four Steps

You have already come across support vectors in the Support Vector Regression section; you can go back and take a quick look to refresh your memory.

A support vector machine classifier works on categorical target values: a hyperplane is drawn to separate the classes. Since we use just two features with a linear kernel here, the hyperplane is a straight line.

You can take a look at the code here.

`In [ 1 ]:`

```
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from matplotlib.colors import ListedColormap
from sklearn.metrics import confusion_matrix
from sklearn.datasets import load_breast_cancer
%matplotlib inline
```

`In [ 2 ]:`

```
cancer = load_breast_cancer()
```

`In [ 3 ]:`

```
features = pd.DataFrame(cancer.data, columns=cancer.feature_names)
target = pd.DataFrame(cancer.target, columns=["TARGET"])
data = pd.concat([features, target], axis=1)
```

`In [ 4 ]:`

```
a = data.corr('pearson')
abs(a.loc['TARGET']).sort_values(ascending=False)
```

`Out [ 4 ]:`

`In [ 5 ]:`

```
x1 = np.array(data['worst concave points'])
x2 = np.array(data['worst perimeter'])
y = np.array(data['TARGET'])
x = np.column_stack((x1,x2))
```

`In [ 6 ]:`

```
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size = 0.25, random_state = 0)
```

`In [ 7 ]:`

```
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
```

`In [ 8 ]:`

```
classifier = SVC(kernel = 'linear', random_state = 0)
classifier.fit(X_train, y_train)
```

`Out [ 8 ]:`

```
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='auto', kernel='linear',
    max_iter=-1, probability=False, random_state=0, shrinking=True,
    tol=0.001, verbose=False)
```

`In [ 9 ]:`

```
y_pred = classifier.predict(X_test)
```

`In [ 10 ]:`

```
cm = confusion_matrix(y_test, y_pred)
```

`In [ 11 ]:`

```
X_set, y_set = X_train, y_train
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.5, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)
plt.title('SVM (Training set)')
plt.legend()
```

`In [ 12 ]:`

```
X_set, y_set = X_test, y_test
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.5, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)
plt.title('SVM (Test set)')
plt.legend()
```

Milestone: Building Models For Classification Problems In Four Steps

The only difference between a linear SVM and a kernel SVM is the kernel function used to separate the classes. You can check out the different types of kernels here:

https://data-flair.training/blogs/svm-kernel-functions/

In this example we are using the RBF kernel.
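Since only the `kernel` argument changes between models, you can compare kernels side by side. Here is a small sketch (not part of the original notebook) that fits the same two breast-cancer features used throughout this milestone with a linear, an RBF and a polynomial kernel, and prints each test accuracy:

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Same features and split as the rest of this milestone
cancer = load_breast_cancer()
data = pd.DataFrame(cancer.data, columns=cancer.feature_names)
x = np.column_stack((data['worst concave points'], data['worst perimeter']))
y = cancer.target

X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Only the kernel changes between the models
for kernel in ('linear', 'rbf', 'poly'):
    clf = SVC(kernel=kernel, random_state=0)
    clf.fit(X_train, y_train)
    print(kernel, round(clf.score(X_test, y_test), 3))
```

On this two-feature problem the linear and RBF kernels usually score similarly; the difference between kernels shows up when the classes are not linearly separable.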

`In [ 1 ]:`

```
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from matplotlib.colors import ListedColormap
from sklearn.metrics import confusion_matrix
from sklearn.datasets import load_breast_cancer
%matplotlib inline
```

`In [ 2 ]:`

```
cancer = load_breast_cancer()
```

`In [ 3 ]:`

```
features = pd.DataFrame(cancer.data, columns=cancer.feature_names)
target = pd.DataFrame(cancer.target, columns=["TARGET"])
data = pd.concat([features, target], axis=1)
```

`In [ 4 ]:`

```
a = data.corr('pearson')
abs(a.loc['TARGET']).sort_values(ascending=False)
```

`Out [ 4 ]:`

`In [ 5 ]:`

```
x1 = np.array(data['worst concave points'])
x2 = np.array(data['worst perimeter'])
y = np.array(data['TARGET'])
x = np.column_stack((x1,x2))
```

`In [ 6 ]:`

```
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size = 0.25, random_state = 0)
```

`In [ 7 ]:`

```
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
```

`In [ 8 ]:`

```
classifier = SVC(kernel = 'rbf', random_state = 0)
classifier.fit(X_train, y_train)
```

`Out [ 8 ]:`

```
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
    max_iter=-1, probability=False, random_state=0, shrinking=True,
    tol=0.001, verbose=False)
```

`In [ 9 ]:`

```
y_pred = classifier.predict(X_test)
```

`In [ 10 ]:`

```
cm = confusion_matrix(y_test, y_pred)
```

`In [ 11 ]:`

```
X_set, y_set = X_train, y_train
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.5, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Kernel SVM (Training set)')
plt.legend()
```

`In [ 12 ]:`

```
X_set, y_set = X_test, y_test
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.5, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Kernel SVM (Test set)')
plt.legend()
```

Milestone: Building Models For Classification Problems In Four Steps

Naive Bayes is another popular supervised classification algorithm, based on Bayes theorem. The algorithm is called "naive" because it makes the naive assumption that each feature is independent of the other features, which is rarely true in real life. The "Bayes" part refers to the statistician and philosopher Thomas Bayes, after whom the theorem is named. Before we get to Naive Bayes itself, let's take a quick look at Bayes theorem.

Bayes Theorem helps us find the probability of a hypothesis given our prior knowledge: the probability of an event occurring, given the probability of another event that has already occurred. The equation for Bayes Theorem is:

P(A|B) = P(B|A) * P(A) / P(B)

Where,

- **P(A|B)** is the probability of hypothesis A given the data B. This is called the **posterior probability**.
- **P(B|A)** is the probability of data B given that hypothesis A was true.
- **P(A)** is the probability of hypothesis A being true (regardless of the data). This is called the **prior probability** of A.
- **P(B)** is the probability of the data (regardless of the hypothesis).

If you are not sure what P(A|B) or P(B|A) means, it is a conditional probability, defined by the formula P(A|B) = P(A ∩ B) / P(B).

Let’s think about a simple example to make sure we clearly understand this concept.

We will stick to the breast cancer dataset. There are two possible outcomes here: malignant (the person has cancer) and benign (the person does not have cancer). For the sake of this example, let us assume that in the entire population only 2% of people have cancer, and that the lab test returns a correct positive 96% of the time and a correct negative 97% of the time. This can be written in terms of probabilities as:

- P(cancer) = 0.02
- P(not cancer) = 0.98
- P(Malignant|cancer) = 0.96, P(Malignant|not cancer) = 0.04
- P(Benign|not cancer) = 0.97, P(Benign|cancer) = 0.03

Now, given a new patient whose lab results are positive, should he be diagnosed as having cancer or not? There are two possibilities here:

- P(Malignant|cancer) * P(cancer) = 0.96 * 0.02 = 0.0192
- P(Malignant|not cancer) * P(not cancer) = 0.04 * 0.98 = 0.0392

which means the probability of the person not having cancer is higher than the probability of the person having cancer.
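The arithmetic above can be checked in a few lines of Python; normalizing the two joint probabilities also gives the actual posterior probability of cancer given a positive test:

```python
# Prior and test characteristics from the example above
p_cancer = 0.02
p_not_cancer = 0.98
p_malignant_given_cancer = 0.96      # correct positive rate
p_malignant_given_not = 0.04         # false positive rate

# Joint probabilities of a positive result with and without cancer
joint_cancer = p_malignant_given_cancer * p_cancer    # 0.0192
joint_not = p_malignant_given_not * p_not_cancer      # 0.0392

# Bayes theorem: posterior = joint probability / total evidence
posterior_cancer = joint_cancer / (joint_cancer + joint_not)
print(round(posterior_cancer, 3))    # about 0.329
```

So even with a positive test, the posterior probability of cancer is only about 33%, because the disease is rare in the population.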

Essentially, Naive Bayes calculates these probabilities for all input features (in our case, the cell features that indicate cancer) and then selects the outcome with the highest probability (malignant or benign).

Now that you have understood how it works, take a look at the code for implementing this algorithm:

`In [ 1 ]:`

```
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from matplotlib.colors import ListedColormap
from sklearn.metrics import confusion_matrix
from sklearn.preprocessing import StandardScaler
from sklearn.naive_bayes import GaussianNB
#Importing dataset
from sklearn.datasets import load_breast_cancer
%matplotlib inline
```

`In [ 2 ]:`

```
cancer = load_breast_cancer()
```

`In [ 3 ]:`

```
features = pd.DataFrame(cancer.data, columns=cancer.feature_names)
target = pd.DataFrame(cancer.target, columns=["TARGET"])
data = pd.concat([features, target], axis=1)
```

`In [ 4 ]:`

```
a = data.corr('pearson')
abs(a.loc['TARGET']).sort_values(ascending=False)
```

`In [ 5 ]:`

```
x1 = np.array(data['worst concave points'])
x2 = np.array(data['worst perimeter'])
y = np.array(data['TARGET'])
x = np.column_stack((x1,x2))
```

`In [ 6 ]:`

```
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size = 0.25, random_state = 0)
```

`In [ 7 ]:`

```
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
```

`In [ 8 ]:`

```
classifier = GaussianNB()
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
```

`In [ 9 ]:`

```
X_set, y_set = X_train, y_train
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.5, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Naive Bayes (Training set)')
plt.legend()
```

`In [ 10 ]:`

```
X_set, y_set = X_test, y_test
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.5, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Naive Bayes (Test set)')
plt.legend()
```

Milestone: Building Models For Classification Problems In Four Steps

We have already seen how decision tree regression works. It's now time to learn what decision tree classification is.

Suppose we have the following plot of two classes, represented by black circles and blue squares. Can you draw a single line that separates the two classes?

Probably not. We need more than one line to separate them.

We need two lines here: one splitting according to a threshold value of x and the other according to a threshold value of y. This is exactly what decision trees try to do.

Decision tree classifiers repeatedly divide the working area into subregions by drawing lines that separate the different classes.

So when does it terminate?

- Either it has divided the data into pure classes (containing members of a single class only), or
- some stopping criteria on the classifier attributes are met.

The working of decision tree classification is based on Information Gain: the decrease in entropy after a dataset is split on an attribute. Based on the information gain of each candidate split, we can construct the decision tree. The steps are:

**Step 1**: Calculate the entropy of the target.

**Step 2**: Split the dataset on each attribute and calculate the entropy of every branch, then add them proportionally to get the total entropy for the split. Subtract this from the entropy before the split; the result is the Information Gain, or decrease in entropy.

**Step 3**: Choose the attribute with the largest information gain as the decision node, divide the dataset by its branches, and repeat the same process on every branch.
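The entropy and information-gain calculations from the steps above can be sketched in a few lines (a minimal illustration, not part of the original notebook):

```python
import numpy as np

def entropy(labels):
    # Shannon entropy (in bits) of an array of class labels
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(parent, left, right):
    # Entropy before the split minus the weighted entropy of the two branches
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

# Toy target: five malignant (0) and five benign (1) samples
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

# A split that separates the classes perfectly: entropy drops from 1 bit to 0,
# so the information gain is 1
print(information_gain(y, y[:5], y[5:]))
```

A split that left both branches as 50/50 mixtures would have an information gain of 0; the tree always picks the split with the largest gain.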

Here is the code to implement the decision tree classifiers.

`In [ 1 ]:`

```
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from matplotlib.colors import ListedColormap
from sklearn.metrics import confusion_matrix
from sklearn.datasets import load_breast_cancer
%matplotlib inline
```

`In [ 2 ]:`

```
cancer = load_breast_cancer()
```

`In [ 3 ]:`

```
features = pd.DataFrame(cancer.data, columns=cancer.feature_names)
target = pd.DataFrame(cancer.target, columns=["TARGET"])
data = pd.concat([features, target], axis=1)
```

`In [ 4 ]:`

```
a = data.corr('pearson')
abs(a.loc['TARGET']).sort_values(ascending=False)
```

`In [ 5 ]:`

```
x1 = np.array(data['worst concave points'])
x2 = np.array(data['worst perimeter'])
y = np.array(data['TARGET'])
x = np.column_stack((x1,x2))
```

`In [ 6 ]:`

```
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size = 0.25, random_state = 0)
```

`In [ 7 ]:`

```
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
```

`In [ 8 ]:`

```
classifier = DecisionTreeClassifier(criterion = 'entropy', random_state = 0)
classifier.fit(X_train, y_train)
```

`Out [ 8 ]:`

```
DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=None,
            max_features=None, max_leaf_nodes=None, min_impurity_decrease=0.0,
            min_impurity_split=None, min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=0,
            splitter='best')
```

`In [ 9 ]:`

```
y_pred = classifier.predict(X_test)
```

`In [ 10 ]:`

```
cm = confusion_matrix(y_test, y_pred)
```

`In [ 11 ]:`

```
X_set, y_set = X_train, y_train
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.5, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Decision Tree Classification (Training set)')
plt.legend()
```

`In [ 12 ]:`

```
X_set, y_set = X_test, y_test
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.5, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Decision Tree Classification (Test set)')
plt.legend()
```

Milestone: Building Models For Classification Problems In Four Steps

We have already seen how the Random Forest algorithm works on regression problems. The only difference here is that the target values are categorical; the working of the algorithm remains the same.
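As a quick illustration of the idea, here is a sketch on synthetic data (not part of the original notebook) comparing a single decision tree with a forest of trees; on noisy data the forest usually generalises better:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class problem: the class depends on the sign of x0 + x1, plus noise
rng = np.random.RandomState(0)
X = rng.randn(300, 2)
y = (X[:, 0] + X[:, 1] + 0.3 * rng.randn(300) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(criterion='entropy', random_state=0).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=100, criterion='entropy',
                                random_state=0).fit(X_train, y_train)

print('single tree:', round(tree.score(X_test, y_test), 3))
print('forest     :', round(forest.score(X_test, y_test), 3))
```

Each tree in the forest sees a different bootstrap sample of the data, and the forest's prediction aggregates the trees' votes, which smooths out the overfitting of any single tree.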

Now, let's get into the code using sklearn.

`In [ 1 ]:`

```
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from matplotlib.colors import ListedColormap
from sklearn.metrics import confusion_matrix
from sklearn.datasets import load_breast_cancer
%matplotlib inline
```

`In [ 2 ]:`

```
cancer = load_breast_cancer()
```

`In [ 3 ]:`

```
features = pd.DataFrame(cancer.data, columns=cancer.feature_names)
target = pd.DataFrame(cancer.target, columns=["TARGET"])
data = pd.concat([features, target], axis=1)
```

`In [ 4 ]:`

```
a = data.corr('pearson')
abs(a.loc['TARGET']).sort_values(ascending=False)
```

`In [ 5 ]:`

```
x1 = np.array(data['worst concave points'])
x2 = np.array(data['worst perimeter'])
y = np.array(data['TARGET'])
x = np.column_stack((x1,x2))
```

`In [ 6 ]:`

```
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size = 0.25, random_state = 0)
```

`In [ 7 ]:`

```
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
```

`In [ 8 ]:`

```
classifier = RandomForestClassifier(n_estimators = 10, criterion = 'entropy', random_state = 0)
classifier.fit(X_train, y_train)
```

`Out [ 8 ]:`

```
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=0, verbose=0, warm_start=False)
```

`In [ 9 ]:`

```
y_pred = classifier.predict(X_test)
```

`In [ 10 ]:`

```
cm = confusion_matrix(y_test, y_pred)
```

`In [ 11 ]:`

```
X_set, y_set = X_train, y_train
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.5, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Random Forest Classification (Training set)')
plt.legend()
```

`In [ 12 ]:`

```
X_set, y_set = X_test, y_test
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.5, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Random Forest Classification (Test set)')
plt.legend()
```

Milestone: Using Sklearn For Clustering

You already know the theory behind K-Means Clustering, so let's get straight to the code.

`In [ 1 ]:`

```
# Importing packages
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from matplotlib.colors import ListedColormap
from sklearn.metrics import confusion_matrix
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
#Importing dataset
from sklearn.datasets import load_wine
%matplotlib inline
```

`In [ 2 ]:`

```
wine = load_wine()
```

`In [ 3 ]:`

```
features = pd.DataFrame(wine.data, columns=wine.feature_names)
target = pd.DataFrame(wine.target, columns=["TARGET"])
data = pd.concat([features, target], axis=1)
```

`In [ 4 ]:`

```
a = data.corr('pearson')
abs(a.loc['TARGET']).sort_values(ascending=False)
```

`Out [ 4 ]:`

`In [ 5 ]:`

```
x1 = np.array(data['flavanoids'])
x2 = np.array(data['od280/od315_of_diluted_wines'])
x = np.column_stack((x1,x2))
```

`In [ 6 ]:`

```
sc = StandardScaler()
X = sc.fit_transform(x)
```

`In [ 7 ]:`

```
wcss = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters = i, init = 'k-means++', random_state = 42)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)
plt.plot(range(1, 11), wcss)
plt.title('The Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
plt.show()
```

`In [ 8 ]:`

```
kmeans = KMeans(n_clusters = 5, init = 'k-means++', random_state = 42)
y_kmeans = kmeans.fit_predict(X)
```

`In [ 9 ]:`

```
plt.scatter(X[y_kmeans == 0, 0], X[y_kmeans == 0, 1], s = 100, c = 'red', label = 'Cluster 1')
plt.scatter(X[y_kmeans == 1, 0], X[y_kmeans == 1, 1], s = 100, c = 'blue', label = 'Cluster 2')
plt.scatter(X[y_kmeans == 2, 0], X[y_kmeans == 2, 1], s = 100, c = 'green', label = 'Cluster 3')
plt.scatter(X[y_kmeans == 3, 0], X[y_kmeans == 3, 1], s = 100, c = 'cyan', label = 'Cluster 4')
plt.scatter(X[y_kmeans == 4, 0], X[y_kmeans == 4, 1], s = 100, c = 'magenta', label = 'Cluster 5')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s = 300, c = 'yellow', label = 'Centroids')
plt.title('Wine clusters')
plt.xlabel('flavanoids (scaled)')
plt.ylabel('od280/od315_of_diluted_wines (scaled)')
plt.legend()
plt.show()
```

Milestone: Using Sklearn For Clustering

In the Hierarchical clustering algorithm, the dataset is not partitioned into clusters in a single step. Instead, the algorithm moves through multiple steps between one cluster containing all the data points and n clusters each containing a single data point.

This algorithm is further classified into Divisive and Agglomerative methods.

**Divisive Method**: Also known as top-down clustering. All the data points start in a single cluster, which is then divided into the two clusters with the least similarity. The same split is applied recursively to each cluster until every data point is its own cluster.

**Agglomerative Method**: Also called bottom-up clustering. Each of the n data points starts in its own cluster; the most similar clusters are then repeatedly merged until a single cluster (or the desired number of clusters) remains.
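Before the full wine example, here is a tiny illustration (not part of the original notebook) of the agglomerative method: four points forming two obvious pairs are merged bottom-up until two clusters remain.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Four points forming two well-separated pairs
X = np.array([[0.0, 0.0], [0.0, 0.2],
              [5.0, 5.0], [5.0, 5.2]])

# Each point starts in its own cluster; the closest clusters are merged
# repeatedly until only n_clusters remain
hc = AgglomerativeClustering(n_clusters=2, linkage='ward')
labels = hc.fit_predict(X)
print(labels)   # the two nearby points in each pair share a label
```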

Here is the code to implement the agglomerative clustering.

`In [ 1 ]:`

```
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
import scipy.cluster.hierarchy as sch
from sklearn.cluster import AgglomerativeClustering
from sklearn.preprocessing import StandardScaler
#Importing dataset
from sklearn.datasets import load_wine
%matplotlib inline
```

`In [ 2 ]:`

```
wine = load_wine()
```

`In [ 3 ]:`

```
features = pd.DataFrame(wine.data, columns=wine.feature_names)
target = pd.DataFrame(wine.target, columns=["TARGET"])
data = pd.concat([features, target], axis=1)
```

`In [ 4 ]:`

```
a = data.corr('pearson')
abs(a.loc['TARGET']).sort_values(ascending=False)
```

`In [ 5 ]:`

```
x1 = np.array(data['flavanoids'])
x2 = np.array(data['od280/od315_of_diluted_wines'])
x = np.column_stack((x1,x2))
```

`In [ 6 ]:`

```
sc = StandardScaler()
X = sc.fit_transform(x)
```

`In [ 7 ]:`

```
dendrogram = sch.dendrogram(sch.linkage(X, method = 'ward'))
plt.title('Dendrogram')
plt.xlabel('Wine samples')
plt.ylabel('Euclidean distances')
```

`In [ 8 ]:`

```
hc = AgglomerativeClustering(n_clusters = 5, affinity = 'euclidean', linkage = 'ward')
y_hc = hc.fit_predict(X)
```

`In [ 9 ]:`

```
plt.scatter(X[y_hc == 0, 0], X[y_hc == 0, 1], s = 100, c = 'red', label = 'Cluster 1')
plt.scatter(X[y_hc == 1, 0], X[y_hc == 1, 1], s = 100, c = 'blue', label = 'Cluster 2')
plt.scatter(X[y_hc == 2, 0], X[y_hc == 2, 1], s = 100, c = 'green', label = 'Cluster 3')
plt.scatter(X[y_hc == 3, 0], X[y_hc == 3, 1], s = 100, c = 'cyan', label = 'Cluster 4')
plt.scatter(X[y_hc == 4, 0], X[y_hc == 4, 1], s = 100, c = 'magenta', label = 'Cluster 5')
plt.title('Wine Clusters')
plt.legend()
```