**Introduction to Neural Networks and Deep Learning**

by Vidyadhar Sharma


**About this Roadmap**

Neural networks are one of the most active research fields today. Much of modern project implementation, research, and system building is done with their help. Follow this Roadmap to understand how a neural network works and build your own projects!

**Milestone 1 :** Getting Started With Neural Networks

**Milestone 2 :** Building a Neural Network From Scratch

**Milestone 3 :** Introduction to Keras

**Milestone 4 :** Build an Optical Character Recognition Project Using Keras

**Recommended beginner resources:**

- Essential Math for Data Science — ‘Why’ and ‘How’
- Introduction to Python, a resource for beginners who want to learn Python.
- A roadmap giving a brief overview of the tools used in ML.

After completing this roadmap you will:

- Understand what neural networks mean.
- Be able to implement Forward Propagation and Backward Propagation algorithms.
- Build projects using Neural Networks from scratch.
- Be introduced to Keras and its working.
- Be able to build projects using Keras library.

Milestone: Getting Started With Neural Networks

Neural networks are one of the most active research fields today; much of modern project implementation, research, and system building relies on them.

You might have heard the name "neural networks" many times, but can you explain what a neural network actually is?

I'll help you out there.

A neural network is a set of neurons that work together to produce the required output.

Eh! That definition isn't a very good one. So before I define what a neural network is, you first have to understand how neural networks came into existence.

In 1986, three researchers, David Rumelhart, Geoffrey Hinton, and Ronald Williams, published a paper on ”Learning representations by back-propagating errors,” in which they described “a new learning procedure, back-propagation, for networks of neuron-like units.”

The units it described built on the earlier perceptron model from the 1950s. Because of this research, many people got to know how to train neural networks.

The neural network was inspired by the human brain. If you remember the biology you studied in your 10th grade, you must remember that our brains contain billions of neurons.

Each neuron is connected to other neurons through synaptic terminals. Our brain communicates with the nervous system through the signals sent between neurons.

The neural network was designed keeping this neural system in mind. To give you an example of what a neural network looks like, here's an image.

Here each circle is called a neuron/node. A neural network contains different layers. The layer where the data is fed in is called the input layer, and the layer where you get the output is called the output layer. Any layer in between is called a hidden layer.

In a given neural network, there can be any number of hidden layers.

So how does a neural network work?

Each neuron holds a number based on the input that it is given. This number lies in the range 0 to 1 and is called the activation number.

Now, you might be thinking: where does this activation number come from? Don't worry, I'll explain that too; it is produced by something called an activation function. But first, let's cover a bit more of how the neural network works.

Most of the time, a neural network is used to produce classification-based outputs. The output obtained from the neural network depends on this activation number. One more important thing to remember is that each neuron has two states: an "on" state and an "off" state. The state of the neuron is decided by its activation number.

Well, you guessed it right: if the activation number is high, that particular neuron is in the on state; if the activation number is low, the neuron is in the off state.

Based on the input you give, each neuron switches on and switches off.

The human neural network works with the help of synapses. So, for the neurons in our artificial network to communicate with each other, they also need connections, right?

In the image, you see that each neuron in one layer is connected to every neuron in the immediately following layer, and this continues till the output layer. The neural network plays a game of pass-the-ball to get the output. 😂😂

These lines connecting the neurons represent the **weights**.
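To make these terms concrete, here's a minimal sketch (plain NumPy, with made-up input and weight values) of a single neuron turning its inputs into an activation number:

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the range (0, 1)
    return 1 / (1 + np.exp(-z))

# Three inputs and three made-up weights for one neuron
inputs = np.array([0.5, 0.9, 0.1])
weights = np.array([0.4, -0.6, 2.0])

# The neuron's activation number: the weighted sum passed through sigmoid
activation = sigmoid(np.dot(inputs, weights))
print(activation)  # a value between 0 and 1
```

A large positive weighted sum pushes the activation toward 1 (the "on" state); a large negative sum pushes it toward 0 (the "off" state).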

I know. A lot of information to digest. Don't worry, you'll catch up with the terminologies in no time.

So going ahead, here's a gist of how the neural network works.

You give the input data to the neural network at the input layer. This data is processed by the hidden layers, and then you get the output at the output layer.

Isn't it simple? It is, right? Now, let's get into the interesting part of neural networks: the math.

It's going to be even more fun now!

Milestone: Getting Started With Neural Networks

Neural networks have two types of data transfer. One is the forward movement of data, which is called forward propagation. The other is the feedback that is given to the hidden layers, which is called backward propagation.

Let's analyse both of them individually. Well, we can't just go ahead and analyse them abstractly, right? We need an example to understand them clearly.

So for now, let us consider a simple neural network with an input layer with 3 nodes, a hidden layer with 3 nodes and an output layer with 1 node.

The input nodes are denoted with x1, x2 and x3. The hidden nodes are denoted with a1, a2 and a3. The output node is denoted with h(z).

If you remember the steps we followed to train a linear regression model, we had to define a hypothesis equation, an error equation and gradient equation.

Here also, you have to define the hypothesis function, the error function and the gradient function.

The major difference between linear regression and a neural network is this: for linear regression to work, the data set has to be linear. If the data is non-linear, linear regression will not work. That's when a neural network comes to your rescue.

Now that we have an example neural network, let's see how the model gets trained.

To get the hypothesis function equation, first let us take a simpler neural network with just an input layer and an output layer, as shown in the image below.

The activation number of the output node is given by a function called the activation function. The activation function usually used is the sigmoid function, which is defined as:

σ(z) = 1 / (1 + e^(-z))

But the value of the activation number depends on the values of the weights, so the input to the activation function is the weighted sum of the inputs:

z = w1·x1 + w2·x2 + w3·x3

So, with these, you can define the hypothesis function as:

h(x) = σ(w1·x1 + w2·x2 + w3·x3)

If we consider the first example (I've put the same example below again),

So in this example, the activation numbers of the hidden layer nodes become:

a1 = σ(w11·x1 + w21·x2 + w31·x3)

a2 = σ(w12·x1 + w22·x2 + w32·x3)

a3 = σ(w13·x1 + w23·x2 + w33·x3)

So the output value becomes:

h(z) = σ(v1·a1 + v2·a2 + v3·a3)

where v1, v2 and v3 are the weights connecting the hidden layer to the output node.

This equation is the forward propagation equation. Forward propagation is also called feed forward.
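As a hedged sketch of the feed forward equations above, here's the 3-3-1 example network in NumPy (the weight values are random placeholders, not trained values):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

np.random.seed(0)
x = np.array([1.0, 0.5, 0.2])   # input layer: x1, x2, x3
W1 = np.random.random((3, 3))   # weights from input layer to hidden layer
W2 = np.random.random((3, 1))   # weights from hidden layer to output node

a = sigmoid(np.dot(x, W1))      # hidden activations: a1, a2, a3
h = sigmoid(np.dot(a, W2))      # output value h(z)
print(h)                        # a single value between 0 and 1
```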

There are different activation functions you can use in the feed forward step as well.

Sigmoid function - You worked with it just now.

Tanh function - Tanh function is defined as:

tanh(z) = (e^z - e^(-z)) / (e^z + e^(-z))

ReLU function - ReLU function is defined as:

ReLU(z) = max(0, z)

Leaky ReLU function - Leaky ReLU function is defined as:

LeakyReLU(z) = z if z > 0, otherwise 0.01·z

You can define the error function using the cross-entropy function, which is defined as:

E = -(1/m) · Σ [ y·log(h) + (1 - y)·log(1 - h) ]

where m is the number of samples, y is the actual label and h is the predicted output.
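Here's a minimal NumPy version of that cross-entropy error, with made-up labels and predictions:

```python
import numpy as np

def cross_entropy(y, h):
    # y: actual labels (0 or 1), h: predicted probabilities
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

y = np.array([1, 0, 1])
h = np.array([0.9, 0.1, 0.8])
print(cross_entropy(y, h))  # small, because the predictions are close to the labels
```

The worse the predictions, the larger the error; try `h = np.array([0.1, 0.9, 0.2])` and the value jumps.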

You use this function for the same reason you used it in logistic regression. If you don't remember the explanation, I'd suggest you go through it once before you go ahead.

Milestone: Getting Started With Neural Networks

Once the feed forward is completed, you get an output value. The neural network checks this value against the actual output values; that way, it gets to know the error value. It then sends a feedback signal to the hidden layers.

Based on this feedback, the weights change, which in turn changes the activation values of the hidden nodes. The output of a neural network depends on the values of the weights, so through repeated cycles of feed forward and feedback, the model gets trained.

The feedback process is called back propagation. Back propagation updates each weight by gradient descent:

w := w - α · ∂E/∂w

where α is the learning rate.

In back propagation, you have to calculate the partial derivative of the error function with respect to each weight. When you work out the derivative, first make sure the error function is simplified to the maximum extent; that means you have to substitute the values of z into the equation before differentiating.
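The resulting weight update can be sketched as a plain gradient descent step (the gradient and learning rate here are made-up numbers, just to show the rule):

```python
import numpy as np

learning_rate = 0.1
w = np.array([0.5, -0.3])     # current weights
grad = np.array([0.2, -0.1])  # partial derivative of the error w.r.t. each weight

# Step the weights a small amount against the gradient
w = w - learning_rate * grad
print(w)  # approximately [0.48, -0.29]
```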

This completes the explanation of the neural networks.

Now, let us get into implementing projects using neural networks.

Milestone: Building a Neural Network From Scratch

Since this is a small project, we will get started with the code for a logic gate: a neural network that learns the OR gate.

`In [ 1 ]:`

```
# Import all the needed packages
import numpy as np
import pandas as pd
```

`In [ 2 ]:`

```
# The input data is set by us, since there is no data set for such small inputs
input_data = np.array([[0, 0, 0],
                       [0, 0, 1],
                       [0, 1, 0],
                       [0, 1, 1],
                       [1, 1, 0],
                       [1, 0, 0]])
# The respective outputs for the OR gate
output_labels = np.array([[0],
                          [1],
                          [1],
                          [1],
                          [1],
                          [1]])
```

`In [ 3 ]:`

```
# This function defines the activation function (sigmoid)
def activate(x):
    return 1/(1 + np.exp(-x))
```

`In [ 4 ]:`

```
# Derivative of the activation function (takes the sigmoid *output* as input)
def transfer_derivative(output):
    return output*(1 - output)
```

`In [ 5 ]:`

```
# Initialise the initial weight values randomly
np.random.seed(4)
w0 = np.random.random((3, 4)) - 1
w1 = np.random.random((4, 1)) - 1
```

`In [ 6 ]:`

```
# This function defines the feed forward.
def feed_forward(input_data):
    # Layer 0 - Input Layer
    layer0 = input_data
    # Layer 1 - Hidden Layer
    layer1 = activate(np.dot(layer0, w0))
    # Layer 2 - Output Layer
    layer2 = activate(np.dot(layer1, w1))
    return layer0, layer1, layer2
```

`In [ 7 ]:`

```
# Function to calculate the back propagation updates
def backpropogate(j, layer0, layer1, layer2, w1, w0):
    # Error between the expected output and the obtained output
    l2_error = output_labels - layer2
    if (j % 1000) == 0:
        print("error:" + str(np.mean(np.abs(l2_error))))
    # Calculating the feedback value to correct the output value in layer 2
    l2_grad = l2_error * transfer_derivative(layer2)
    l1_error = l2_grad.dot(w1.T)
    l1_grad = l1_error * transfer_derivative(layer1)
    # Updating the weights in place
    w1 += layer1.T.dot(l2_grad)
    w0 += layer0.T.dot(l1_grad)
```

`In [ 8 ]:`

```
for i in range(10000):
    layer0, layer1, layer2 = feed_forward(input_data)
    backpropogate(i, layer0, layer1, layer2, w1, w0)
```

`In [ 9 ]:`

```
# Predicting the value for a new sample of data
layer0, layer1, layer2 = feed_forward(np.array([[0, 0, 0]]))
```

`In [ 10 ]:`

`print(layer0, layer1, layer2)`

Milestone: Building a Neural Network From Scratch

Now, we are going to apply a neural network to a data set. In this project, we are going to classify wine into three different categories. So let's start with importing the required packages.

`In [ 1 ]:`

```
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from tqdm import tqdm_notebook
import matplotlib.pyplot as plt
%matplotlib inline
```

`In [ 2 ]:`

```
# All activation functions and their transfer derivatives.
# Note: each derivative takes the *output* of its activation function,
# because the layers already store activated values.
# 1. Sigmoid / Logistic Function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def dsigmoid(x):
    # x is the sigmoid output
    return x * (1 - x)

# 2. Rectified Linear Unit Function
def relu(x):
    return abs(x) * (x > 0)

def drelu(x):
    return 1. * (x > 0.)

# 3. Leaky-Relu Functions
def lrelu(x):
    return np.where(x > 0., x, x * 0.01)

def dlrelu(x):
    dx = np.ones_like(x)
    dx[x < 0.] = 0.01
    return dx

# 4. Hyperbolic Tan Function
def tanh(x):
    return np.tanh(x)

def dtanh(x):
    # x is the tanh output, so the derivative is 1 - x^2
    return 1.0 - np.power(x, 2)
```
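One subtlety worth noting: `dsigmoid` above expects the sigmoid *output*, not the raw input, which is why the training loop later passes it the layer activations. A quick finite-difference sketch (self-contained for clarity) confirms the formula:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def dsigmoid_from_output(s):
    # Same formula as dsigmoid: expects s = sigmoid(x)
    return s * (1 - s)

eps = 1e-6
x = np.linspace(-2, 2, 9)
# Numerical derivative of sigmoid via central differences
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
analytic = dsigmoid_from_output(sigmoid(x))
print(np.max(np.abs(numeric - analytic)))  # tiny
```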

`In [ 3 ]:`

```
def feed_forward(data_in, w0, w1, w2, w3, b0, b1, b2, b3):
    '''
    The feed forward considers 5 layers including the input and output layers.
    The output layer/neuron is a classification node.
    returns: state of each layer
    '''
    layer0 = data_in
    layer1 = tanh(np.dot(layer0, w0) + b0)
    layer2 = tanh(np.dot(layer1, w1) + b1)
    layer3 = tanh(np.dot(layer2, w2) + b2)
    layer4 = sigmoid(np.dot(layer3, w3) + b3)
    return layer0, layer1, layer2, layer3, layer4
```

`In [ 4 ]:`

```
def backpropogate(i, layer0, layer1, layer2, layer3, layer4, actual_y, w0, w1, w2, w3, b0, b1, b2, b3, learning_rate):
    '''
    A standard back propagation pass. The output layer uses a sigmoid
    activation, so its delta is the error multiplied by the sigmoid
    transfer derivative; each hidden layer's delta is propagated back
    through the weights and multiplied by the tanh transfer derivative.
    returns: updated weight and bias matrices
    '''
    l4_error = layer4 - actual_y
    l4_delta = l4_error * dsigmoid(layer4)
    dh4 = np.dot(layer3.T, l4_delta)
    l3_error = l4_delta.dot(w3.T)
    l3_delta = l3_error * dtanh(layer3)
    dh3 = np.dot(layer2.T, l3_delta)
    l2_error = l3_delta.dot(w2.T)
    l2_delta = l2_error * dtanh(layer2)
    dh2 = np.dot(layer1.T, l2_delta)
    l1_error = l2_delta.dot(w1.T)
    l1_delta = l1_error * dtanh(layer1)
    dh1 = np.dot(layer0.T, l1_delta)
    w3 = w3 - (learning_rate * dh4)
    w2 = w2 - (learning_rate * dh3)
    w1 = w1 - (learning_rate * dh2)
    w0 = w0 - (learning_rate * dh1)
    b3 = b3 - (learning_rate * np.mean(l4_delta))
    b2 = b2 - (learning_rate * np.mean(l3_delta))
    b1 = b1 - (learning_rate * np.mean(l2_delta))
    b0 = b0 - (learning_rate * np.mean(l1_delta))
    if i % 10 == 0 and (i != 0):
        loss = np.mean(np.power(layer4 - actual_y, 2))
        loss_curve.append(loss)
        iters.append(int(i))
        if i % 100 == 0:
            print("\n", int(i), loss)
    return w0, w1, w2, w3, b0, b1, b2, b3
```

`In [ 5 ]:`

```
def accuracy(testx, testy):
    correct = 0
    layer0, layer1, layer2, layer3, layer4 = feed_forward(testx, w0, w1, w2, w3, b0, b1, b2, b3)
    for i in range(len(testx)):
        if np.argmax(layer4[i]) == np.argmax(testy[i]):
            correct += 1
    return f"Accuracy: {correct*100/len(testy)}%"
```

`In [ 6 ]:`

```
# Importing the data set.
from sklearn.datasets import load_wine
wine = load_wine()
features = wine.data
target = wine.target
# Converting the output class values from integers to one-hot vectors with three states
nt = []
for i in target:
    op = [0, 0, 0]
    op[i] = 1
    nt.append(op)
target = np.array(nt)
X = pd.DataFrame(wine.data, columns=wine.feature_names)
Y = target
# Normalizing the data
X = (X - X.min()) / (X.max() - X.min())
# Splitting the data into training set and test set
xtrain, xtest, ytrain, ytest = train_test_split(X.values, Y, test_size=0.8)
```

`In [ 7 ]:`

```
# Initializing the weight values
np.random.seed(3)
w0 = np.random.random((13,50))
w1 = np.random.random((50,30))
w2 = np.random.random((30, 5))
w3 = np.random.random((5,3))
# Here b stands for bias value.
b0 = np.random.random((1,1))-1
b1 = np.random.random((1,1))-1
b2 = np.random.random((1,1))-1
b3 = np.random.random((1,1))-1
epochs = 1000
```

`In [ 8 ]:`

```
# Initialising variables to track loss vs iterations so we can plot the changes
loss_curve = []
iters = []
```

`In [ 9 ]:`

```
for i in tqdm_notebook(range(epochs)):
    layer0, layer1, layer2, layer3, layer4 = feed_forward(xtrain, w0, w1, w2, w3, b0, b1, b2, b3)
    w0, w1, w2, w3, b0, b1, b2, b3 = backpropogate(i, layer0, layer1, layer2, layer3, layer4, ytrain, w0, w1, w2, w3, b0, b1, b2, b3, 0.005)
```

`In [ 10 ]:`

```
plt.plot(iters, loss_curve,'r')
```

`In [ 11 ]:`

```
print(accuracy(xtrain, ytrain))
print(accuracy(xtest, ytest))
```

Milestone: Introduction to Keras

Keras is a high-level neural networks API, written in Python, that can run on top of TensorFlow, Theano, and CNTK. The idea behind developing Keras was to be able to go from an idea to a result with the least possible delay. Keras is a user-friendly Python library that provides a wide range of building blocks for deep learning. We will dig in, understand some of the important data structures and methods that Keras uses, and build a neural network.

Since Keras runs on top of TensorFlow, to install Keras you need to install TensorFlow.

You can install TensorFlow using one simple command.

`pip install tensorflow`

If you are using Anaconda prompt then you can use

`conda install tensorflow`

You will see a lot of packages getting installed; Keras is included as part of the TensorFlow package.

Okay, let's get started with Keras models.

Milestone: Introduction to Keras

Keras organizes its layers in the form of a model. The model is the core data structure of Keras. There are two important ways to build models:

- The Sequential model
- The Functional API

Both kinds of models share some common methods and attributes:

- `model.layers`: a flattened list of the layers comprising the model.
- `model.inputs`: the list of input tensors of the model.
- `model.outputs`: the list of output tensors of the model.
- `model.summary()`: prints a summary representation of your model.
- `model.get_weights()`: returns a list of all weight tensors in the model, as Numpy arrays.
- `model.get_config()`: returns a dictionary containing the configuration of the model.

**What are sequential models?**

A Sequential model is just a linear stack of layers. It allows you to build your model layer by layer. Each layer has weights that correspond to the layer that follows it. A typical sequential model looks like this.

```
import tensorflow as tf
from tensorflow.python import keras

model = keras.Sequential([
    keras.layers.Dense(32, activation=tf.nn.tanh,
                       input_shape=(784,)),
    keras.layers.Dense(28, activation=tf.nn.tanh),
    keras.layers.Dense(10, activation=tf.nn.softmax)
])
```

You must be wondering what these arguments mean. Don't worry, read on.

**Specifying the layer**: 'Dense' is a layer type available in `keras.layers`. Dense is a standard layer type that works for most cases. In a dense layer, all nodes in the previous layer connect to all nodes in the current layer. A dense layer looks something like this.

The first value passed to **Dense** is the number of nodes in that layer; here our first layer has 32 nodes. This number can also be in the hundreds or thousands. Increasing the number of nodes in each layer increases model capacity.

The first layer needs an **input shape.** The input shape specifies the number of features in each input sample. The above example indicates that each sample passed into the network is a row of 784 values.

The second dense layer that is created is the hidden layer which contains 28 nodes. The last dense layer is the output layer which contains 10 output nodes.

Note that the input layer alone requires you to specify the shape of the input. You don't have to do it for any of the other layers.

**Activation function**: ‘activation’ is the activation function for the layer. An activation function allows the model to capture nonlinear relationships (relationships of degree greater than one). The activation functions are available in the neural network (nn) module of TensorFlow.

This is how you can create a neural network using the Sequential model. Now let's try to understand the **Functional API**:

The Sequential model is for simple stacks of layers; the Functional API lets you build models with more complex topologies, such as multiple inputs or outputs. Whichever way you build your model, you then work with it through the same methods: compiling, fitting, evaluating, and predicting.

**Compiling the model**

The syntax for compiling the model is:

```
compile(optimizer, loss=None, metrics=None, loss_weights=None, sample_weight_mode=None, weighted_metrics=None, target_tensors=None)
```

**Arguments:**

**Optimizer**: String value specifying the name of the optimizer (or an optimizer instance). The optimizer is what adjusts the weights to make the model as accurate as possible; gradient descent is the most common optimizer used in neural networks.

**Loss:** String (name of objective function) or objective function. A loss function is a method of evaluating how well your algorithm models your dataset. If your predictions are totally off, your loss function will output a higher number. If they’re pretty good, it’ll output a lower number.

**Metrics**: List of metrics to be evaluated by the model during training and testing. Typically you will use metrics=['accuracy']. Metrics is a way of measuring the accuracy or loss of the model.

**loss_weights:** It is an optional list or dictionary that assigns co-efficients to weight the loss contributions of different model outputs.

**sample_weight_mode**: If you need to pass a 2D weight matrix, set this value to "temporal". None indicates that weights are 1D.

**weighted_metrics**: List of metrics to be evaluated and weighted by sample_weight.

**target_tensors**: By default, Keras will create placeholders for the model's target, which will be fed with the target data during training. If instead you would like to use your own target tensors, you can specify them via the target_tensors argument.

**Fitting the inputs to the model:** We need to plug the input values to our model and specify how many iterations are needed.

The syntax for fit is

```
fit(x=None, y=None, batch_size=None, epochs=1, verbose=1, callbacks=None, validation_split=0.0, validation_data=None, shuffle=True, class_weight=None, sample_weight=None, initial_epoch=0, steps_per_epoch=None, validation_steps=None)
```

**Arguments:**

**x**: Numpy array of input features

**y:** Numpy array of target values

**batch_size**: Integer or None. Number of samples per gradient update.

**epochs**: Integer. Number of epochs to train the model. An epoch is an iteration over the entire x and y data provided.

**verbose**: Integer. 0, 1, or 2. Verbosity mode. 0 = silent, 1 = progress bar, 2 = one line per epoch.

**callbacks**: List of callbacks to apply during training.

**validation_split**: Float between 0 and 1. Fraction of the training data to be used as validation data.

**validation_data**: tuple (x_val, y_val) or tuple (x_val, y_val, val_sample_weights) on which to evaluate the loss and any model metrics at the end of each epoch.

**shuffle:** Boolean (whether to shuffle the training data before each epoch) or the string 'batch'.

**class_weight**: Optional dictionary mapping class indices (integers) to a weight (float) value, used for weighting the loss function (during training only).

**sample_weight:** Optional Numpy array of weights for the training samples, used for weighting the loss function.

**initial_epoch**: Integer. Epoch at which to start training.

**steps_per_epoch:** Integer or None. Total number of steps (batches of samples) before declaring one epoch finished and starting the next epoch.

**validation_steps**: Only relevant if steps_per_epoch is specified. Total number of steps (batches of samples) to validate before stopping.

**Evaluating the model:**

After the model is compiled, we need to evaluate the correctness of the model by checking if the model is making the right predictions on test data. The syntax of evaluate function is as follows:

```
evaluate(x=None, y=None, batch_size=None, verbose=1, sample_weight=None, steps=None)
```

**Arguments:**

**x:** Numpy array of the test data, containing only the features.

**y:** Numpy array of the test data, containing only the targets.

**batch_size**: Number of samples per evaluation step.

**verbose**: 0 or 1. Verbosity mode. 0 = silent, 1 = progress bar.

**sample_weight**: Optional Numpy array of weights for the test samples, used for weighting the loss function.

**Predicting the output using trained model:**

Once the model is trained and evaluated, we use the predict function to make predictions about the output using the input features. The syntax is :

```
predict(x, batch_size=None, verbose=0, steps=None)
```

Arguments:

**x**: The input data, as a numpy array.

**batch_size**: Integer. If unspecified, it will default to 32.

**verbose**: Verbosity mode, 0 or 1.

**steps**: Total number of steps before declaring the prediction round finished.
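Putting compile, fit, evaluate, and predict together, here is a hedged end-to-end sketch on synthetic data (the layer sizes, data, and epoch count are made up purely for illustration):

```python
import numpy as np
from tensorflow import keras

# Synthetic binary-classification data: 100 samples, 4 features each
np.random.seed(0)
x = np.random.random((100, 4))
y = (x.sum(axis=1) > 2).astype(int)

# A tiny Sequential model: one hidden layer, one sigmoid output node
model = keras.Sequential([
    keras.layers.Dense(8, activation='tanh', input_shape=(4,)),
    keras.layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(x, y, epochs=5, batch_size=16, verbose=0)

loss, acc = model.evaluate(x, y, verbose=0)
preds = model.predict(x[:3], verbose=0)  # one probability per sample
```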

This covers the types of models and APIs present in Keras. Let us now put these to use and create our own neural network.

Milestone: Build an Optical Character Recognition Project Using Keras

Optical Character Recognition (OCR) is the recognition of written characters by the computer. We will use the OCR dataset from MNIST. The Modified National Institute of Standards and Technology database (MNIST) is a large database of handwritten digits that is commonly used for training various image processing systems.

Let us spend some time in understanding the dataset.

**Data loading and preprocessing:**

We will first load the dataset from MNIST. The training set consists of 60,000 datapoints, each a 28*28 matrix of pixel values representing a digit between 0 and 9. Every pixel contributes to the formation of the image, so we cannot reduce the features by dropping columns. The goal is to predict the digit from the pixel values. The dataset is already split into train and test, and will be loaded into (X, Y) for training and (X1, Y1) for test data. The dataset is shown below.

```
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import tensorflow as tf
from tensorflow.python import keras
from keras.datasets import mnist
(X, Y), (X1, Y1) = mnist.load_data()
X
```

X contains the features that are as shown below. It is a numpy array of multiple arrays.

Y contains the corresponding number for each of the rows present in X.

Each of the features is a 28*28 matrix. But we cannot pass the matrix as input in this form, because the model expects each sample as a flat row of values.

We will have to reshape the data such that the input matrix is passed as a single value instead of passing it as (28,28). We will reshape the data as follows.

```
X=X.reshape(len(X),784)
X1=X1.reshape(len(X1),784)
```

784 is the value obtained by multiplying 28*28. Now the data will be of the form (60000, 784), which can be passed into the model easily.
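The reshape can be sanity-checked on a tiny made-up array:

```python
import numpy as np

imgs = np.zeros((2, 28, 28))          # two fake 28*28 "images"
flat = imgs.reshape(len(imgs), 784)   # each image becomes one row of 784 pixels
print(flat.shape)  # (2, 784)
```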

We will reshape the values of Y as well, because Y is a single flat array containing all the labels. We convert it into an array of one-element rows so that each row of Y corresponds to one row of X.

```
Y=Y.reshape(len(Y),1)
Y1=Y1.reshape(len(Y1),1)
```

The output now looks like this

Next step, data normalization. We will use the max-min normalizer and normalize our features.

```
X=(X-X.min())/(X.max()-X.min())
X1=(X1-X1.min())/(X1.max()-X1.min())
```

**Data visualization:** We have been talking about how the rows of data represent pixel values corresponding to a number. Let us visualize the data and see what it actually represents. Matplotlib provides a method called imshow, which takes in a matrix and a colormap indicating whether the data should be displayed as greyscale or RGB. The output is shown below.

```
mat=X[1].reshape(28,28)
mat
plt.imshow(mat,cmap='Greys')
```

This means, the second row of data in our dataset corresponds to the number 0. You can try for different rows of X and check out what value it prints.

**Building the model:** Now that we have seen how our dataset looks like and visualized the data, let's start building our neural network.

We will construct a sequential model with one hidden layer. For the hidden layer we will use tanh as the activation function, and because we are predicting the probability of each digit, we will use the softmax function for the activation of the output layer.

```
model = keras.Sequential([
    keras.layers.Dense(82, activation=tf.nn.tanh,
                       input_shape=(X.shape[1],)),
    keras.layers.Dense(10, activation=tf.nn.softmax)
])
```

We will now compile the model. The optimizer used here is the Adam optimizer, an improved, adaptive variant of the gradient descent optimizer. The loss function we use here is sparse_categorical_crossentropy, because the output values are integers. If your output values are one-hot encoded, make use of categorical_crossentropy instead.

```
adam = keras.optimizers.Adam(lr=0.001)
model.compile(loss='sparse_categorical_crossentropy',
              optimizer=adam,
              metrics=['accuracy'])
```

Next, we will fit the model by feeding in the training dataset, number of epochs and we will use the test dataset for validation.

```
history= model.fit(X, Y, epochs=10,validation_data=(X1,Y1), verbose=2)
```

The output is as follows:

The accuracy of the model is 99.35% and the validation accuracy is 97.43%. Cool isn't it?

To make sure we haven't overfit our model, let's plot a graph of loss vs val_loss.

```
plt.plot(history.epoch, history.history['loss'], 'g')
plt.plot(history.epoch, history.history['val_loss'],'r')
```

We haven't overfit our model and have managed to get great accuracy. Keras did make things easy, didn't it? You can check out the Fashion-MNIST dataset and build a neural network for it.