**Tools for Machine Learning w/ Python**

by Vidyadhar Sharma


**About this Roadmap**

Python is increasingly used as a scientific language, and matrix and vector manipulations are central to scientific computation. This roadmap introduces the basic tools used to implement machine learning algorithms in Python and helps you get comfortable with them.

**Milestone 1 :** Jupyter Notebook

**Milestone 2 :** A Quick Glance at Numpy

**Milestone 3 :** Pandas 101

**Milestone 4 :** Plotting Graphs with Matplotlib

After completing this roadmap you will:

- Be able to install and start working with Jupyter Notebook.
- Understand the basics of NumPy and be able to work with NumPy arrays and functions.
- Be able to work with CSV files and DataFrames using pandas.
- Learn how to plot different graphs using Matplotlib.

Milestone: Jupyter Notebook

Jupyter Notebook is an application that you start from the command prompt. It opens a web application in which you can write code and run it in small blocks called cells. Jupyter Notebook combines two components:

- A Web Application
- Notebook documents

**Main features of the web application:**

- You can edit code in the browser. Automatic syntax highlighting and tab completion help you code faster.
- You can run the code from the browser itself. The result of each code block is printed right below it.
- Results can be displayed using rich media representations such as HTML, LaTeX and PNG. For example, graphs generated by Matplotlib render inline. You will understand this better once you start working with Jupyter Notebook.
- You can include mathematical notations within markdown cells using LaTeX.

**NOTE:** LaTeX is a document preparation system for high-quality typesetting. It is used for technical or scientific documents, but it can also be used for almost any form of publishing.

**Notebook Documents:**

- Notebook documents contain the inputs and outputs of a session. They also store the text you write alongside the code, that is, the documentation of the code.
- These documents are JSON files saved with the .ipynb extension. Here, ipynb stands for "IPython notebook", where IPython is short for interactive Python.
- In each notebook, you write blocks of code in cells, which you can run one by one.
- The output of each cell is printed right below that cell.
- Another type of cell, the Markdown cell, lets you document your code.
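Because a notebook is just a JSON file, you can build a tiny one by hand to see its structure. The sketch below is a minimal example that assumes the nbformat 4 layout (version 4.4). The file name minimal.ipynb is hypothetical, and notebooks saved by Jupyter carry extra metadata such as the kernel name.

```python
import json

# A minimal notebook in the nbformat 4 layout (assumed version 4.4):
# one Markdown cell for documentation and one code cell.
notebook = {
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# My first notebook\n", "This cell documents the code below."],
        },
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print('Hello, Jupyter!')"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

# Save it with the .ipynb extension (hypothetical file name).
with open("minimal.ipynb", "w", encoding="utf-8") as f:
    json.dump(notebook, f, indent=1)

# Reading it back shows that the notebook is plain JSON.
with open("minimal.ipynb", encoding="utf-8") as f:
    loaded = json.load(f)
print([cell["cell_type"] for cell in loaded["cells"]])  # ['markdown', 'code']
```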

Milestone: Jupyter Notebook

You start a notebook server from the command line. Before starting a new notebook, navigate to the folder where you want to keep all the files related to Machine Learning. Create a new folder on your Desktop called HDS, and navigate to this folder in the command prompt as shown below:

```
C:\Users\Username> cd Desktop
C:\Users\Username\Desktop> md HDS
C:\Users\Username\Desktop> cd HDS
C:\Users\Username\Desktop\HDS>
```

Once you are in the folder location, execute the command below to start the notebook server:

```
C:\Users\Username\Desktop\HDS> jupyter notebook
```

After you run this command, you will see output similar to the following in the command prompt, and a new tab opens in your browser:

```
[I 11:39:12.734 NotebookApp] JupyterLab alpha preview extension loaded from D:\Applications\Anaconda3\lib\site-packages\jupyterlab
JupyterLab v0.27.0
Known labextensions:
[I 11:39:12.767 NotebookApp] Running the core application with no additional extensions or settings
[I 11:39:12.898 NotebookApp] Serving notebooks from local directory: C:\Users\Kaustubh\Desktop\MachineLearningFoundations
[I 11:39:12.898 NotebookApp] 0 active kernels
[I 11:39:12.898 NotebookApp] The Jupyter Notebook is running at: http://localhost:8888/?token=0f08ae981ce1e9b12634130c46ef8f1178a2a0f0d3a5ca52
[I 11:39:12.898 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 11:39:12.900 NotebookApp]
Copy/paste this URL into your browser when you connect for the first time,
to login with a token:
http://localhost:8888/?token=0f08ae981ce1e9b12634130c46ef8f1178a2a0f0d3a5ca52
[I 11:39:12.991 NotebookApp] Accepting one-time-token-authenticated connection from ::1
```

From this page you can create folders and files for running code. To create a new folder or file, click **New**.

Selecting Python 3 creates a new notebook that runs Python code. Selecting Text File creates a new text file, and selecting Folder creates a new folder. All of these are created inside the HDS folder you created earlier. You should now have a basic understanding of what Jupyter Notebook is and how it works.

To create and name a notebook, select Python 3 from the **New** menu. The notebook opens in a new browser tab.

Click the **Untitled** text at the top of the notebook page to change its name.

Enter the name you want and click **Rename**.

To start learning Machine Learning tools such as NumPy, Pandas and Matplotlib, create a new Python 3 notebook and start writing code in it.

Milestone: A Quick Glance at Numpy

Before we start learning let us first understand what NumPy is and why we need it.

NumPy is the fundamental package for scientific computing with Python. Among other things, it contains:

- a powerful N-dimensional array object
- sophisticated functions
- tools for integrating C/C++ and Fortran code
- useful linear algebra, Fourier transforms and random number capabilities

Besides its obvious scientific uses, NumPy can also be used as an efficient multidimensional container of generic data.

At first glance, a NumPy array looks like a Python list. So why is it better than a list?

There are three major benefits of using NumPy arrays over lists:

- Less Memory
- Faster Execution of code
- Convenient to use
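The following sketch illustrates two of these benefits, memory use and convenience, with 1,000 integers. It assumes 64-bit integers (dtype=np.int64). The exact byte counts depend on your platform and Python version, and speed is not measured here because timings vary from machine to machine.

```python
import sys

import numpy as np

n = 1000
numbers_list = list(range(n))
numbers_array = np.arange(n, dtype=np.int64)

# A list stores pointers to separate Python int objects, so count both.
list_bytes = sys.getsizeof(numbers_list) + sum(sys.getsizeof(x) for x in numbers_list)
# An array stores the raw 8-byte integers in one block of memory.
array_bytes = numbers_array.nbytes

print(f"list:  about {list_bytes} bytes")
print(f"array: {array_bytes} bytes")

# Convenience: one expression doubles every element, no explicit loop needed.
doubled_list = [x * 2 for x in numbers_list]
doubled_array = numbers_array * 2
print(doubled_list[:5], doubled_array[:5])
```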

To start working with NumPy, open a notebook and save it as **numpy-basics**.

The next action goes through the different ways of using NumPy.

Milestone: A Quick Glance at Numpy

NumPy's main object is the homogeneous multidimensional array: a table of elements, all of the same data type. In NumPy, dimensions are called axes.

NumPy's array class is called ndarray. The attributes of an ndarray object describe the array, such as its shape, number of axes, size and data type. Let us work with these attributes through an example:

NumPy Basics - Here we will learn the different attributes of a NumPy array.

`In [1]:`

```
import numpy as np
```

To create an array of known values, use the np.array() function:

`In [2]:`

```
# Creating a 1-D array with 15 elements
a = np.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15])
print(a)
```

[ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]

In the np.array() call, the nesting of the square brackets [] sets the number of dimensions. A 1-D array needs only one pair of square brackets, and each additional level of nesting adds a dimension. For two and three dimensions, refer to the examples below.

`In [3]:`

```
#Creating a 2-D array of dimension 3X3
b = np.array([[1,2,3],[4,5,6],[7,8,9]])
print(b)
```

```
[[1 2 3]
 [4 5 6]
 [7 8 9]]
```

`In [4]:`

```
#Creating a 3-D array
c = np.array([[[1,2],[3,4]],[[5,6],[7,8]]])
print(c)
```

```
[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]
```

You have seen how to create 1-D, 2-D and 3-D arrays. You can create arrays with as many dimensions as you need, but it is easy to lose track of the square brackets. To avoid this, you can use the .reshape() method, which returns the same elements arranged in a new shape, as long as the new shape holds the same number of elements. Let us try this on the array 'a':

`In [5]:`

```
a.reshape(3,5)
```

`Out[5]:`

```
array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15]])
```

`In [6]:`

```
a.reshape(2,4)
```

```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-22f6e6d0bd97> in <module>()
----> 1 a.reshape(2,4)

ValueError: cannot reshape array of size 15 into shape (2,4)
```

Reshaping 'a' into a 3X5 matrix worked, but reshaping it into a 2X4 matrix failed. The reason is the number of elements in 'a': it has 15 elements, and 3 × 5 = 15, while 2 × 4 = 8. When you reshape an array, the product of the new dimensions must equal the number of elements in the array.
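A quick way to check this rule before reshaping is to compare the product of the new dimensions with the array's size attribute. can_reshape below is a hypothetical helper written for this illustration, not a NumPy function, and it does not handle -1 dimensions.

```python
import math

import numpy as np


def can_reshape(arr, new_shape):
    """Hypothetical helper: True if arr can be reshaped to new_shape."""
    return math.prod(new_shape) == arr.size


a = np.arange(1, 16)  # the same 15 values as the array 'a' above
print(can_reshape(a, (3, 5)))  # True: 3 * 5 = 15
print(can_reshape(a, (2, 4)))  # False: 2 * 4 = 8
```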

To create an array of consecutive integers, use the np.arange() function. With a single argument n, it returns the values 0 to n - 1:

`In [7]:`

```
new_array = np.arange(15)
print(new_array)
```

[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14]

You can also apply .reshape() directly when creating an array:

`In [8]:`

```
new_array2 = np.arange(15).reshape(3,5)
print(new_array2)
```

```
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]
```

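To get the shape of an array, use the ndarray.shape attribute, where ndarray stands for the name of your array. Note that shape is an attribute, not a method, so it is written without parentheses. Refer to the example below:

`In [9]:`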

```
print(f"Shape of array: {new_array2.shape}")
```

Shape of array: (3, 5)

To check the total number of axes for a given array, execute the following code:

`In [10]:`

```
print(f"The number of axes in matrix 'new_array2': {new_array2.ndim}")
print(f"The number of axes in array 'a': {a.ndim}")
print(f"The number of axes in matrix 'c': {c.ndim}")
```

The number of axes in matrix 'new_array2': 2

The number of axes in array 'a': 1

The number of axes in matrix 'c': 3

To find the total number of elements in an array, use the ndarray.size attribute. Refer to the example below:

`In [11]:`

```
print(f"Size of an array 'new_array2' (Number of elements): {new_array2.size}")
print(f"Size of an array 'c' (Number of elements): {c.size}")
```

Size of an array 'new_array2' (Number of elements): 15

Size of an array 'c' (Number of elements): 8

Some operations require compatible data types. For example, you will see later that adding a float array to an integer array in place raises an error. To find out the type of the elements in an array, use the ndarray.dtype attribute:

`In [12]:`

```
print(f"datatype of elements: {new_array2.dtype}")
```

datatype of elements: int32

The default integer type depends on your platform and NumPy version, so you may see int64 instead.

To check the type of the array object itself, call type() on it. It returns numpy.ndarray:

```
print(f"The type of array: {type(new_array2)}")
```

The type of array: <class 'numpy.ndarray'>

If you get stuck at any point in the following actions, you can use these attributes to inspect your arrays.
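The attributes covered above can be collected in one place. describe below is a hypothetical helper for illustration only. The dtype it reports depends on your platform, so it is returned as a string.

```python
import numpy as np


def describe(arr):
    """Hypothetical helper that gathers the ndarray attributes covered above."""
    return {
        "shape": arr.shape,           # length of each axis
        "ndim": arr.ndim,             # number of axes
        "size": arr.size,             # total number of elements
        "dtype": str(arr.dtype),      # element type, platform dependent
        "type": type(arr).__name__,   # class of the array object
    }


new_array2 = np.arange(15).reshape(3, 5)
info = describe(new_array2)
print(info)
```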

Milestone: A Quick Glance at Numpy

In this action, you will learn how to create arrays and print them in different ways using NumPy. There are several ways to create arrays. For example, you can create an array from a regular Python list or tuple using the array function. NumPy deduces the data type of the resulting array from the types of the elements in the sequence.

In [1]:

```
import numpy as np
```

In [2]:

```
a = np.array([1,2,3,4])
print(a)
print(f"Datatype of a: {a.dtype}")
```

[1 2 3 4]

Datatype of a: int32

In [3]:

```
b = np.array([1.2, 3.5, 8.21])
print(f"Datatype of b: {b.dtype}")
```

Datatype of b: float64

You have already created arrays like these. Here, we will also look at how to create arrays of different data types.

A common mistake beginners make is passing the values as separate arguments instead of as a single list. You can see the resulting error below.

In [4]:

```
a = np.array(1,2,3,4) # Wrong way of initializing a numpy array
```

```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-910da684b7cb> in <module>()
----> 1 a = np.array(1,2,3,4) # Wrong way of initializing a numpy array
```

Depending on your NumPy version, this mistake may raise a TypeError instead of a ValueError.

In [5]:

```
a = np.array([1,2,3,4]) # Right way of initializing a numpy array
```

To create an array of a different data type, you can specify the datatype while creating it. You can see an example of this below:

In [6]:

```
b = np.array([[1,2,3,4],[5,6,7,8]], dtype=complex)
print(b)
```

```
[[1.+0.j 2.+0.j 3.+0.j 4.+0.j]
 [5.+0.j 6.+0.j 7.+0.j 8.+0.j]]
```

When creating the array, we gave integer values, but we specified the data type as complex. Hence the integers were converted into complex numbers and stored in the variable b.

The function zeros creates an array full of zeros, the function ones creates an array full of ones, and the function empty creates an array whose initial content is arbitrary and depends on the state of the memory. By default, the data type of the created array is float64. Below, you will see how to create an array of zeros, an array of ones and an empty array.

In [7]:

```
np.zeros((3,4))
```

Out[7]:

```
array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])
```

In [8]:

```
np.ones((3,3,4), dtype=np.int64)
```

Out[8]:

```
array([[[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]],

       [[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]],

       [[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]]])
```

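In [9]:

```
np.empty((3,4), dtype=np.float32)
```

Out[9]:

```
array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]], dtype=float32)
```

Because np.empty does not initialize the memory, the values you see may be different.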

To generate random values in an array, use the np.random.rand() function. It fills an array of the given shape with random numbers drawn uniformly from the interval [0, 1):

In [10]:

```
np.random.rand(2,3)
```

Out[10]:

```
array([[0.12568316, 0.28296798, 0.2019476 ],
       [0.12036288, 0.00708648, 0.63168991]])
```

To create an array of evenly spaced elements, use the function linspace. The example below shows how linspace works:

In [11]:

```
# Creating an array of 5 elements between 2 and 3
np.linspace(2,3,num=5)
```

Out[11]:

array([2. , 2.25, 2.5 , 2.75, 3. ])

In [12]:

```
# Creating an array of 5 elements between 2 and 3, excluding the endpoint 3
np.linspace(2,3,num=5,endpoint=False)
```

Out[12]:

array([2. , 2.2, 2.4, 2.6, 2.8])

In [13]:

```
# When retstep is True, linspace also returns the step size used between elements
np.linspace(2,3,num=5,retstep=True)
```

Out[13]:

(array([2. , 2.25, 2.5 , 2.75, 3. ]), 0.25)
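The step that retstep reports follows from how linspace spaces the points. With the endpoint included, num points leave num - 1 equal gaps between start and stop. With endpoint=False, there are num gaps. A small sketch that checks this for the values used above:

```python
import numpy as np

start, stop, num = 2, 3, 5

# endpoint=True (the default): num points, num - 1 gaps.
values, step = np.linspace(start, stop, num=num, retstep=True)
print(step, (stop - start) / (num - 1))  # 0.25 0.25

# endpoint=False: the stop value is excluded, so there are num gaps.
values2, step2 = np.linspace(start, stop, num=num, endpoint=False, retstep=True)
print(step2, (stop - start) / num)  # 0.2 0.2
print(values2)
```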

None of the arrays created above were assigned to a variable. To use one of them later, assign it to a variable first. The techniques from the previous card for printing and reshaping arrays also work on the arrays created here.

Milestone: A Quick Glance at Numpy

Arithmetic operations on arrays apply element-wise: a new array is created and filled with the result.

Let us start with 1-D arrays and then move on to 2-D arrays.

`In [1]:`

```
import numpy as np
```

`In [2]:`

```
a = np.array([10,20,30,40,50])
a
```

`Out[2]:`

```
array([10, 20, 30, 40, 50])
```

`In [3]:`

```
b = np.arange(5)
b
```

`Out[3]:`

```
array([0, 1, 2, 3, 4])
```

Here, you will perform basic arithmetic operations on arrays, storing the result of each operation in its own variable.

`In [4]:`

```
add = a + b
print(add)
[10 21 32 43 54]
```

`In [5]:`

```
sub = a - b
print(sub)
[10 19 28 37 46]
```

`In [6]:`

```
mul = a * b
print(mul)
[ 0 20 60 120 200]
```

`In [7]:`

```
c = np.array([1,3,5,7,9])
```

`In [8]:`

```
div = a / c
print(div)
[10. 6.66666667 6. 5.71428571 5.55555556]
```

`In [9]:`

```
# Floor division (//) rounds each quotient down to the nearest whole number
quotient = a // c
print(quotient)
[10 6 6 5 5]
```

`In [10]:`

```
# The modulus operator (%) returns the remainder of each division
rem = a % c
print(rem)
[0 2 0 5 5]
```

You can also apply trigonometric functions such as sine and cosine to arrays, as shown below.

`In [11]:`

```
np.sin(b)
```

`Out[11]:`

```
array([ 0. , 0.84147098, 0.90929743, 0.14112001, -0.7568025 ])
```

`In [12]:`

```
np.cos(b)
```

`Out[12]:`

```
array([ 1. , 0.54030231, -0.41614684, -0.9899925 , -0.65364362])
```

For 2-D arrays, the dot() method performs matrix multiplication, as shown below. The number of columns of the first matrix must equal the number of rows of the second.

`In [13]:`

```
A = np.arange(1,21,2).reshape(2,5)
A
```

`Out[13]:`

```
array([[ 1,  3,  5,  7,  9],
       [11, 13, 15, 17, 19]])
```

`In [14]:`

```
B = np.arange(1,11).reshape(5,2)
B
```

`Out[14]:`

```
array([[ 1,  2],
       [ 3,  4],
       [ 5,  6],
       [ 7,  8],
       [ 9, 10]])
```

`In [15]:`

```
A.dot(B)
```

`Out[15]:`

```
array([[165, 190],
       [415, 490]])
```
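To see what dot() computes: each entry of the result is the sum of the products of a row of A with a column of B. The sketch below recomputes Out[15] with explicit loops for illustration only. In practice you would use dot(), which is much faster.

```python
import numpy as np

A = np.arange(1, 21, 2).reshape(2, 5)
B = np.arange(1, 11).reshape(5, 2)

# Entry (0, 0) is row 0 of A times column 0 of B, summed.
row0_col0 = sum(A[0, k] * B[k, 0] for k in range(A.shape[1]))
print(row0_col0)  # 165

# Rebuild the whole product with explicit loops and compare with dot().
manual = np.zeros((A.shape[0], B.shape[1]), dtype=A.dtype)
for i in range(A.shape[0]):
    for j in range(B.shape[1]):
        manual[i, j] = np.sum(A[i, :] * B[:, j])
print(np.array_equal(manual, A.dot(B)))  # True
```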

`In [16]:`

```
# To create a matrix with 1's, you have to execute the code below. The numbers in the bracket represent the dimension of the matrix.
X = np.ones((2,3), dtype=int)
X
```

`Out[16]:`

```
array([[1, 1, 1],
       [1, 1, 1]])
```

`In [17]:`

```
# To generate a matrix of random numbers in [0, 1), use np.random.random(). The shape of the matrix is passed as a tuple.
Y = np.random.random((2,3))
Y
```

`Out[17]:`

```
array([[0.80864314, 0.47348881, 0.96235334],
       [0.58267784, 0.55490071, 0.9128418 ]])
```

`In [18]:`

```
X + Y
```

`Out[18]:`

```
array([[1.80864314, 1.47348881, 1.96235334],
       [1.58267784, 1.55490071, 1.9128418 ]])
```

`In [19]:`

```
# An in-place operation stores its result in the existing array. X holds integers,
# so it cannot store the float results of X + Y, and you get the following error
X += Y
X
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-19-5811f09afea9> in <module>()
----> 1 X += Y
      2 X

TypeError: Cannot cast ufunc add output from dtype('float64') to dtype('int32') with casting rule 'same_kind'
```

`In [20]:`

```
# Adding an integer array to a float array in place works, because integers can be stored as floats
Y += X
Y
```

`Out[20]:`

```
array([[1.80864314, 1.47348881, 1.96235334],
       [1.58267784, 1.55490071, 1.9128418 ]])
```
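If you need to combine an integer array with floats, there are two straightforward ways to avoid the error from In [19]. This sketch uses a fixed Y of 0.5 values instead of random numbers so that the result is predictable:

```python
import numpy as np

X = np.ones((2, 3), dtype=int)
Y = np.full((2, 3), 0.5)

# X += Y would have to store float results in an integer array,
# so NumPy refuses with a TypeError, as in In [19].
failed = False
try:
    X += Y
except TypeError:
    failed = True
print("In-place add failed:", failed)  # In-place add failed: True

# Option 1: build a new array; NumPy upcasts the result to float64.
Z = X + Y
print(Z.dtype)  # float64

# Option 2: convert X to float first, then add in place.
X_float = X.astype(np.float64)
X_float += Y
print(X_float)
```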

`In [22]:`

```
# To create an array of equally spaced elements, use linspace.
# linspace takes 3 arguments: the starting value, the ending value and the total number of elements, including both endpoints.
# In the example below, the starting value is 0, the ending value is 5 and the array contains 10 elements.
# After you execute this code, you get the output you see below.
z = np.linspace(0,5,10)
z
```

`Out[22]:`

```
array([0. , 0.55555556, 1.11111111, 1.66666667, 2.22222222, 2.77777778, 3.33333333, 3.88888889, 4.44444444, 5. ])
```

`In [23]:`

```
d = np.random.random((2,5))
d
```

`Out[23]:`

```
array([[0.7172701 , 0.23319532, 0.98240961, 0.81561452, 0.22143595],
       [0.21806764, 0.82034081, 0.28922111, 0.5824367 , 0.85783084]])
```

`In [24]:`

```
# sum method returns the sum of all the elements that are present in the array.
d.sum()
```

`Out[24]:`

```
5.73782258798251
```

`In [25]:`

```
# min method gives the minimum value that is present in an array
d.min()
```

`Out[25]:`

```
0.2180676356151654
```

`In [26]:`

```
# max method gives the maximum value that is present in an array
d.max()
```

`Out[26]:`

```
0.9824096078674056
```

`In [27]:`

```
# You can use the sum method to do the addition of numbers according to axis values
# When the axis value is 0, the elements get added column-wise.
d.sum(axis=0)
```

`Out[27]:`

```
array([0.93533774, 1.05353612, 1.27163072, 1.39805122, 1.07926679])
```

`In [28]:`

```
# When the axis value is 1, the elements get added row-wise.
d.sum(axis=1)
```

`Out[28]:`

`array([2.96992549, 2.7678971 ])`
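Because random values make axis sums hard to check by eye, here is the same idea on a small fixed array:

```python
import numpy as np

d = np.array([[1, 2, 3],
              [4, 5, 6]])

print(d.sum())        # 21: every element
print(d.sum(axis=0))  # [5 7 9]: collapses the rows, one total per column
print(d.sum(axis=1))  # [ 6 15]: collapses the columns, one total per row
```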

Milestone: A Quick Glance at Numpy

In this section, you will learn the different functions which you can use with the numpy arrays.

These functions are useful when you apply mathematical operations to individual array elements. Using these functions avoids the need to iterate over the array elements.

NumPy provides many mathematical functions such as exp, sqrt, sin and cos. These are called universal functions (ufuncs), and you call them through the np namespace, for example np.exp(arr).

In [1]:

```
import numpy as np
```

In [3]:

```
arr1 = np.arange(4)
arr2 = np.array([10,20,30,40])
print(f"Exponent: {np.exp(arr1)}")
print(f"square root: {np.sqrt(arr1)}")
print(f"addition: {np.add(arr1,arr2)}")
```

Exponent: [ 1. 2.71828183 7.3890561 20.08553692]

square root: [ 0. 1. 1.41421356 1.73205081]

addition: [10 21 32 43]

There are many more such functions which you can use.

**For matrices**: cross, dot, transpose, inner. **For manipulation of values**: max, min, mean, median, nonzero, sort, sum, var and so on

In [7]:

```
dim_arr1 = np.arange(4).reshape((2,2))
dim_arr2 = np.array([[10,10],[10,10]])
print(f"Transpose:\n {np.transpose(dim_arr1)}\n")
print(f"Inner product:\n {np.inner(dim_arr1,dim_arr2)}\n")
# np.cross treats each row as a 2-element vector; NumPy 2.0 and later warn that this use is deprecated
print(f"Cross product:\n {np.cross(dim_arr1,dim_arr2)}\n")
print(f"Dot product:\n {np.dot(dim_arr1,dim_arr2)}\n")
```

Output:

```
Transpose:
 [[0 2]
 [1 3]]

Inner product:
 [[10 10]
 [50 50]]

Cross product:
 [-10 -10]

Dot product:
 [[10 10]
 [50 50]]
```

It's best to remember these functions. You will use them a lot in your ML algorithms.

Milestone: A Quick Glance at Numpy

Like lists and strings in Python, 1-D NumPy arrays can be indexed, sliced and iterated over. Note that NumPy's speed comes from operations that work on whole arrays at once; a plain Python for loop over an array is not faster than the same loop over a list, and is often slower.

Indexing is done as follows:

```
arr = np.arange(10)**2
# indexing
print(arr[3])
```

Output: 9

This works like indexing a list. The difference is that a NumPy array stores the elements themselves in consecutive memory locations, whereas a list stores pointers to the data in consecutive memory locations.
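You can observe this consecutive layout through an array's itemsize and strides attributes. The sketch assumes 64-bit integers (dtype=np.int64), so each element takes 8 bytes:

```python
import numpy as np

arr = np.arange(10, dtype=np.int64) ** 2

# Each element takes itemsize bytes, and consecutive elements are
# exactly that many bytes apart in memory.
print(arr.itemsize)               # 8
print(arr.strides)                # (8,)
print(arr.flags["C_CONTIGUOUS"])  # True
```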

Slicing is done as follows:

```
arr = np.arange(10)**2
# slicing
print(arr[3:8])
```

Output: [ 9 16 25 36 49]

Slicing with a step: arr[:8:2] selects every second element from index 0 up to, but not including, index 8. Here, those elements are set to 100:

```
arr = np.arange(10)**2
arr[:8:2] = 100
print(arr)
```

Output: [100 1 100 9 100 25 100 49 64 81]

Both of these work like slicing a list in Python.

Iterating over the array is done using for loop:

```
arr = np.arange(10)
for i in arr:
    print(i*2)
```

Output:

```
0
2
4
6
8
10
12
14
16
18
```

Multi-dimensional NumPy arrays have one index per axis. These indices are given as a tuple, separated by commas.

**Indexing:**

```
arr = np.array([[ 0, 1, 2, 3],
                [10, 11, 12, 13],
                [20, 21, 22, 23],
                [30, 31, 32, 33],
                [40, 41, 42, 43]])
print(arr[3,2]) # element in the fourth row and third column
print(arr[0:5,2]) # third column
```

Output:

32

[ 2 12 22 32 42]

When fewer indices are given than the number of axes, the missing indices are treated as complete slices (:).

```
arr[-1]
```

Output: array([40,41,42,43])

The dots (...) represent as many colons as needed to produce a complete indexing tuple.

If **arr** has 5 axes, then we can write:

```
arr[1,2,...] # this is the same as below
arr[1,2,:,:,:]
```
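You can confirm this equivalence on a small 5-axis array. The shape (2, 3, 4, 5, 6) is arbitrary and chosen only for illustration:

```python
import numpy as np

# A 5-axis array with shape (2, 3, 4, 5, 6).
arr = np.arange(2 * 3 * 4 * 5 * 6).reshape(2, 3, 4, 5, 6)

with_dots = arr[1, 2, ...]       # ... fills in the remaining three axes
with_colons = arr[1, 2, :, :, :]

print(with_dots.shape)                         # (4, 5, 6)
print(np.array_equal(with_dots, with_colons))  # True
```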

You can write a 3-D array as:

```
arr = np.array( [[[ 0, 1, 2], # a 3D array (two stacked 2D arrays)
[ 10, 11, 12]], [[100,101,102],
[1000,1001,1002]]])
print(arr.shape)
```

Output: (2,2,3)

**Iterating:**

Iterating over a multidimensional array goes over the first axis, so each step gives you one row. To loop over every individual element, use the arr.flat attribute, which iterates over all elements as if the array were flattened.

```
arr = np.array([[ 0, 1, 2, 3],
                [10, 11, 12, 13],
                [20, 21, 22, 23],
                [30, 31, 32, 33],
                [40, 41, 42, 43]])
for row in arr:
    print(row)
for element in arr.flat:
    print(element)
```

Output:

```
[0 1 2 3]
[10 11 12 13]
[20 21 22 23]
[30 31 32 33]
[40 41 42 43]
0
1
2
3
10
11
12
13
20
21
22
23
30
31
32
33
40
41
42
43
```

Milestone: A Quick Glance at Numpy

An array shape is given by the number of elements in each axis.

```
arr = np.floor(10*np.random.random((3,4)))
print(arr.shape)
```

Output: (3,4)

You can change the shape of an array with various commands. Note that the following three commands all return a modified array, but do not change the original array:

```
arr = np.floor(10*np.random.random((3,4)))
print(arr.ravel())
print(arr.reshape(6,2))
print(arr.T)
print(arr.shape)
print(arr.T.shape)
```

Output (the array is random, so your values will differ):

```
[2. 8. 0. 6. 4. 5. 1. 1. 8. 9. 3. 6.]
[[2. 8.]
 [0. 6.]
 [4. 5.]
 [1. 1.]
 [8. 9.]
 [3. 6.]]
[[2. 4. 8.]
 [8. 5. 9.]
 [0. 1. 3.]
 [6. 1. 6.]]
(3, 4)
(4, 3)
```

ravel() returns a 1-D array containing all the elements of an array of any dimension. reshape() returns the array with the dimensions you specify; here, a 3X4 array becomes a 6X2 array. Both shapes hold the same number of elements (3 × 4 = 6 × 2 = 12), which is the rule to keep in mind:

The product of the new dimension values must be the same as the product of the original dimension values.

The ndarray.resize method modifies the array itself and returns None:

```
arr.resize((2,6))
arr
```

Output:

```
array([[2., 8., 0., 6., 4., 5.],
       [1., 1., 8., 9., 3., 6.]])
```

If you give a dimension value as -1 while reshaping, that dimension is calculated automatically:

```
arr.reshape(3,-1)
```

Output:

```
array([[2., 8., 0., 6.],
       [4., 5., 1., 1.],
       [8., 9., 3., 6.]])
```
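The difference between reshape, which returns a new array, and resize, which changes the array in place, can be seen with a fixed array instead of random values. This is a minimal sketch. NumPy refuses to resize an array that another array still shares data with, which is why the view is deleted first. In a notebook, resize may fail for the same reason if the array is still referenced elsewhere, for example by an earlier cell's output.

```python
import numpy as np

arr = np.arange(12.0)  # 12 float values, 0.0 to 11.0

# reshape returns a new array; -1 is computed as 12 / 3 = 4.
reshaped = arr.reshape(3, -1)
print(reshaped.shape)  # (3, 4)
print(arr.shape)       # (12,): the original is unchanged

# resize changes arr itself and returns None. The reshaped view shares
# arr's data, so drop it before resizing.
del reshaped
result = arr.resize((2, 6))
print(result, arr.shape)  # None (2, 6)
```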

Milestone: A Quick Glance at Numpy

Stacking combines several arrays into one, much like stacking books. You can stack arrays vertically (row-wise) or horizontally (column-wise).

Keep in mind that to stack two arrays, their shapes must match along every axis except the one you stack along.

column_stack stacks 1-D arrays as the columns of a 2-D array; for 2-D arrays it behaves like hstack.

Likewise, row_stack stacks arrays row-wise and is an alias of vstack. It is deprecated in NumPy 2.0, so prefer vstack in new code.

```
arr1 = np.arange(6).reshape(2,3)
arr2 = np.arange(6).reshape(2,3)
print("arr1:\n",arr1)
print("arr2:\n",arr2)
print("\n Using vstack and hstack\n")
print("vertical stacking:\n",np.vstack((arr1,arr2)))
print("horizontal stacking:\n",np.hstack((arr1,arr2)))
print("\n Using column_stack and row_stack\n")
print("Horizontal stacking:\n",np.column_stack((arr1,arr2)))
print("Vertical stacking:\n",np.row_stack((arr1,arr2)))
```

Output:

```
arr1:
[[0 1 2]
[3 4 5]]
arr2:
[[0 1 2]
[3 4 5]]
Using vstack and hstack
vertical stacking:
[[0 1 2]
[3 4 5]
[0 1 2]
[3 4 5]]
horizontal stacking:
[[0 1 2 0 1 2]
[3 4 5 3 4 5]]
Using column_stack and row_stack
Horizontal stacking:
[[0 1 2 0 1 2]
[3 4 5 3 4 5]]
Vertical stacking:
[[0 1 2]
[3 4 5]
[0 1 2]
[3 4 5]]
```

In general, for arrays with more than two dimensions, hstack stacks along their second axis and vstack stacks along their first axis. concatenate takes an optional axis argument that sets the axis along which the arrays are joined.

Where would you use this?

You can use these methods when you want to combine data sets or add a column to existing data.
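The concatenate function mentioned above can reproduce both vstack and hstack by choosing the axis. The sketch below also shows the "add a column" use case with a hypothetical new_column, which must have one value per row:

```python
import numpy as np

arr1 = np.arange(6).reshape(2, 3)
arr2 = np.arange(6).reshape(2, 3)

rows = np.concatenate((arr1, arr2), axis=0)  # same result as np.vstack
cols = np.concatenate((arr1, arr2), axis=1)  # same result as np.hstack

print(rows.shape, cols.shape)  # (4, 3) (2, 6)
print(np.array_equal(rows, np.vstack((arr1, arr2))))  # True
print(np.array_equal(cols, np.hstack((arr1, arr2))))  # True

# Adding a column to existing data: a hypothetical column with one value per row.
new_column = np.array([[100], [200]])
with_column = np.hstack((arr1, new_column))
print(with_column)
```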

Milestone: A Quick Glance at Numpy

Using **hsplit**, you can split an array along its horizontal axis, either by specifying the number of equally shaped arrays to return or by specifying the columns after which the division should occur.

```
arr = np.arange(20).reshape((2,10))
arr2 = arr.reshape(4,5)
print("\n Along horizontal axis:\n")
print(np.hsplit(arr,5))
print("arr is now divided into 5 different arrays.\n")
print(np.hsplit(arr,(2,5)))
print(" arr is split after the second and the fifth column\n")
```

Output:

```
Along horizontal axis:
[array([[ 0, 1],
[10, 11]]), array([[ 2, 3],
[12, 13]]), array([[ 4, 5],
[14, 15]]), array([[ 6, 7],
[16, 17]]), array([[ 8, 9],
[18, 19]])]
arr is now divided into 5 different arrays.
[array([[ 0, 1],
[10, 11]]), array([[ 2, 3, 4],
[12, 13, 14]]), array([[ 5, 6, 7, 8, 9],
[15, 16, 17, 18, 19]])]
arr is split after the second and the fifth column
```

**vsplit** is similar to hsplit but it splits along the vertical axis.

```
arr = np.arange(20).reshape((2,10))
arr2 = arr.reshape(4,5)
print("\nAlong vertical axis:\n")
print(np.vsplit(arr2,2))
print("\n arr2 is split after the second row\n")
```

Output:

```
Along vertical axis:
[array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]]), array([[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]])]
arr2 is split after the second row
```

**array_split** is another function for splitting a NumPy array. It lets you choose the axis, and unlike hsplit and vsplit it also accepts a number of sections that does not divide the axis evenly.

Syntax: `np.array_split(ary, indices_or_sections, axis=0)`, where axis is 0 (rows, the default) or 1 (columns).

```
arr = np.arange(20).reshape((2,10))
arr2 = arr.reshape(4,5)
print("\nUsing array_split:")
print("\n",np.array_split(arr2,3,0))
print("\n",np.array_split(arr2,3,1))
```

Output:

```
Using array_split:
[array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]]), array([[10, 11, 12, 13, 14]]), array([[15, 16, 17, 18, 19]])]
[array([[ 0, 1],
[ 5, 6],
[10, 11],
[15, 16]]), array([[ 2, 3],
[ 7, 8],
[12, 13],
[17, 18]]), array([[ 4],
[ 9],
[14],
[19]])]
```

Notice that hsplit divides an array into groups of columns, while vsplit divides it into groups of rows.
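hsplit is closely related to array_split with axis=1. This sketch checks that both give the same pieces for an even split, and shows array_split handling an uneven one:

```python
import numpy as np

arr = np.arange(20).reshape((2, 10))

by_hsplit = np.hsplit(arr, 5)
by_array_split = np.array_split(arr, 5, axis=1)

# Both produce five 2x2 pieces with the same values.
print(all(np.array_equal(x, y) for x, y in zip(by_hsplit, by_array_split)))  # True

# array_split also accepts a section count that does not divide the axis evenly.
uneven = np.array_split(arr, 3, axis=1)
print([piece.shape for piece in uneven])  # [(2, 4), (2, 3), (2, 3)]
```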

Milestone: A Quick Glance at Numpy

Sometimes you need to manipulate a copy of a NumPy array so that the original data stays available for further operations.

There are three different cases:

No copy has been made

Consider the following example:

```
arr1 = np.arange(10)
arr2 = arr1
print(arr2 is arr1) # True
arr2.shape = (2,5)
print(arr1.shape) # (2, 5)
```

In this case, arr1 and arr2 are not two different arrays: they are two names for the same array object. So any change made through either variable, including a change of shape, shows up in both. This is not recommended, as it can confuse someone reading your code.

Shallow copy or View

The view method creates a new array object that looks at the same data as the original. The array object is different, but the data is shared: changing values through the view changes the original, while changing the view's shape does not.

Observe the example below:

```
arr1 = np.arange(20)
arr2 = arr1.view()
print(arr2 is arr1)
```

This prints False, because arr2 is a different array object from arr1, even though both share the same data.

```
arr2.shape = (5,4)
print(arr1.shape)
arr2[0] = 100
print(arr1[0])
```

Output:

(20,)

100

The shape of arr1 is still (20,) even after we changed the shape of arr2, but changing the data of arr2 does change the data of arr1. Note that arr2[0] = 100 sets the whole first row of the 5X4 view, which corresponds to the first four elements of arr1.

Slicing an array also returns a view. A slice with only a colon inside the [ ] gives a view of the whole array:

```
arr3 = arr1[:] # equivalent to creating arr3 with arr1.view()
```

Deep copy

This makes a copy of an array and its data, and the new array created will be independent of the original array. This means that any manipulation of data, shape etc., in the new array will not affect the original array.

Consider the following example:

```
arr1 = np.array([1,2,3,4])
arr2 = arr1.copy()
print(arr2 is arr1)
arr2[0] = 2
arr2[3] = 2
arr2.shape = 2,2
print("arr1=", arr1)
print("arr2=", arr2)
```

Output:

```
False
arr1= [1 2 3 4]
arr2= [[2 2]
 [3 2]]
```

As we can see, arr2 is completely independent of arr1.
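The three cases can be told apart with the is operator and np.shares_memory, which reports whether two arrays use overlapping memory. A small sketch:

```python
import numpy as np

arr1 = np.arange(20)

same = arr1         # no copy: another name for the same object
view = arr1.view()  # new object, shared data
deep = arr1.copy()  # new object, new data

print(same is arr1, view is arr1, deep is arr1)                    # True False False
print(np.shares_memory(view, arr1), np.shares_memory(deep, arr1))  # True False

view[0] = 100
deep[1] = -1
print(arr1[0], arr1[1])  # 100 1: the view changed arr1, the deep copy did not
```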

Milestone: A Quick Glance at Numpy

Here is a list of some of the functions and methods based on their usage.

- Array creation: arange, array, copy, ones, zeros and empty are some you should remember.
- Conversions: mat, atleast_1d, atleast_2d and atleast_3d convert inputs to matrices or to arrays with at least one, two or three dimensions. In NumPy 2.0 and later, use asmatrix instead of mat.
- Manipulations: vstack, hstack, column_stack, row_stack, ravel, reshape, resize and transpose are some functions you will use on NumPy arrays, both one-dimensional and multi-dimensional.
- Questions: all, any, nonzero and where let you test conditions on array elements.
- Ordering: max, min, sort and searchsorted are key ordering functions.
- Operations: sum, prod, real and compress are a few operations you can perform.
- Basic Statistics: mean, std, var and cov.
- Basic Linear Algebra: cross, dot, outer, inner, vdot and linalg.svd are a few common linear algebra functions.
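Here are a few of the Questions, Ordering and Basic Statistics functions from the list above, applied to a small example array chosen for illustration:

```python
import numpy as np

arr = np.array([3, 0, 7, 0, 5])

# Questions: test conditions on the elements
print(np.all(arr >= 0), np.any(arr > 6))  # True True
print(np.nonzero(arr))                    # indices of the non-zero elements: 0, 2 and 4
print(np.where(arr > 4, arr, -1))         # [-1 -1  7 -1  5]

# Ordering: searchsorted finds where a value would be inserted to keep the order
sorted_arr = np.sort(arr)                 # [0 0 3 5 7]
print(np.searchsorted(sorted_arr, 4))     # 3

# Basic statistics
print(arr.mean(), arr.var(), arr.std())
```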

Milestone: Pandas 101

Python is very good for data munging and preparation, but less so for data analysis and modelling. Pandas helps fill this gap, enabling you to carry out an entire data analysis workflow in Python.

Pandas is a very powerful python toolkit which provides fast, flexible and expressive data structures. It is designed to make working with different types of data both easy and intuitive. It enables us to do practical, real-world data analysis in Python.

What's cool about Pandas is that it takes data (like a CSV or TSV file, or a SQL database) and creates a Python object with rows and columns, called a DataFrame. It looks like a table in statistical software (think Excel), and it is much easier to work with than lists or dictionaries.

Milestone: Pandas 101

You can install pandas by running **pip install pandas** in your command prompt. This installs the latest version of pandas on your device.

After installing, you can import it in your code with the **import** statement, as follows.

```
import pandas as pd
```

You will usually import other libraries, such as NumPy, alongside Pandas.

Numpy and Pandas together form an awesome machine learning base for coding in python.

Pandas has built-in classes and functions such as DataFrame, Series, read_csv and date_range, which help you read and organize data. You can import them directly:

```
from pandas import DataFrame, read_csv
```
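As a small sketch of two of these, Series and date_range, here is a labelled column of values indexed by dates. The temperature values and dates are invented for illustration.

```python
import pandas as pd

# A Series is a single labelled column of values
temps = pd.Series([21.5, 23.0, 19.8], name="temperature")

# date_range builds a sequence of evenly spaced dates ("D" = daily)
days = pd.date_range("2024-01-01", periods=3, freq="D")

# Use the dates as the index (row labels) of the Series
temps.index = days
print(temps)
```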

Other libraries, such as scikit-learn (sklearn), provide data sets for training your algorithms. sklearn includes several kinds of data sets; you'll see them once you start working with machine learning algorithms.

Milestone: Pandas 101

**What is a dataset?**

A dataset is a collection of data that you usually see in the form of a table; the whole contents of the table make up the dataset. Every column of the table represents a particular feature, and each row is an instance, one record in the dataset.

You will learn to use different kinds of data sets.

**DataFrame**

A dataset has many rows and columns, like a table. You can represent this data as a Python dictionary in which each key is a column name and each value is the list of entries in that column. The data sets that come with sklearn are also returned in a dictionary-like object. To view such data as a table, use the DataFrame constructor from pandas: it turns the keys of the dictionary into column names and puts each list of values under its key.

```
import pandas as pd

exponents = pd.DataFrame({"Numbers":[1,2,3,4],
                          "Squares":[1,4,9,16],
                          "Cubes":[1,8,27,64]})
print(exponents)
```

Output:

```
   Numbers  Squares  Cubes
0        1        1      1
1        2        4      8
2        3        9     27
3        4       16     64
```

So the data you wrote as a dictionary is stored as a table by DataFrame. In machine learning problems, data sets are usually very large, and you'll see later that loading them into a DataFrame is easy. For now, focus on understanding how to use Pandas methods.

Pandas has a function, concat(), that combines two or more tables. You can concatenate tables row-wise (axis=0, stacking one table below the other) or column-wise (axis=1, placing them side by side). The code block below shows both.

```
import pandas as pd

exponents = pd.DataFrame({"Numbers":[1,2,3,4],
                          "Squares":[1,4,9,16],
                          "Cubes":[1,8,27,64]})
exponents2 = pd.DataFrame({"Numbers":[5,6,7,8],
                           "Squares":[25,36,49,64],
                           "Cubes":[125,216,343,512]})
new_data0 = pd.concat([exponents,exponents2],axis=0)   # stack rows
new_data1 = pd.concat([exponents,exponents2],axis=1)   # place side by side
print("new_data0=")
print(new_data0)
print("new_data1=")
print(new_data1)
```

Output:

```
new_data0=
   Numbers  Squares  Cubes
0        1        1      1
1        2        4      8
2        3        9     27
3        4       16     64
0        5       25    125
1        6       36    216
2        7       49    343
3        8       64    512
new_data1=
   Numbers  Squares  Cubes  Numbers  Squares  Cubes
0        1        1      1        5       25    125
1        2        4      8        6       36    216
2        3        9     27        7       49    343
3        4       16     64        8       64    512
```
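In new_data0 the original index labels 0 to 3 appear twice, because concat keeps each table's own labels. If you want a fresh index instead, concat accepts ignore_index=True; a minimal sketch:

```python
import pandas as pd

exponents = pd.DataFrame({"Numbers": [1, 2, 3, 4],
                          "Squares": [1, 4, 9, 16],
                          "Cubes": [1, 8, 27, 64]})
exponents2 = pd.DataFrame({"Numbers": [5, 6, 7, 8],
                           "Squares": [25, 36, 49, 64],
                           "Cubes": [125, 216, 343, 512]})

# ignore_index=True discards the original labels and numbers the rows 0..7
combined = pd.concat([exponents, exponents2], axis=0, ignore_index=True)
print(combined.index.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7]
```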

Milestone: Pandas 101

Comma-separated values (CSV) is a widely used data format, and pandas offers functions to read and write CSV data.

**Parsing data**

read_csv is the function you use to read data from CSV files. It converts tabular data into a DataFrame object.

Let's read the data in a CSV file, finds.csv, into an object.

```
import pandas as pd
import numpy as np
data = pd.read_csv("finds.csv")
print(data)
```

Output:

```
      Sky  Airtemp  Humidity    Wind  Water  Forecast  WaterSport
0   Sunny     Warm    Normal  Strong   Warm      Same         Yes
1   Sunny     Warm      High  Strong   Warm      Same         Yes
2  Cloudy     Cold      High  Strong   Warm    Change          No
3   Sunny     Warm      High  Strong   Cool    Change         Yes
```

**Manipulations of csv data:**

You can also change how the data is loaded by passing more arguments to the read_csv function.

```
data = pd.read_csv("finds.csv",index_col=0)
```

When you pass index_col=0, pandas uses the first column (Sky) as the row index instead of the default row numbers 0 to 3:

```
        Airtemp  Humidity    Wind  Water  Forecast  WaterSport
Sky
Sunny      Warm    Normal  Strong   Warm      Same         Yes
Sunny      Warm      High  Strong   Warm      Same         Yes
Cloudy     Cold      High  Strong   Warm    Change          No
Sunny      Warm      High  Strong   Cool    Change         Yes
```

To get the data type of each column, use the .dtypes attribute. It returns the type of data in each column. Because Sky is now the index, it is not listed:

```
print(data.dtypes)
```

Output:

```
Airtemp object
Humidity object
Wind object
Water object
Forecast object
WaterSport object
dtype: object
```

You can also read only the columns you need by passing their names to **usecols**, just the way you used the index_col argument.

```
data = pd.read_csv("finds.csv",usecols=["Sky","Airtemp"])
print(data)
```

Output:

```
      Sky  Airtemp
0   Sunny     Warm
1   Sunny     Warm
2  Cloudy     Cold
3   Sunny     Warm
```

You need to know the column headers to use this, and you must pass them as a list of strings spelled exactly as they appear in the file.

To get an individual column, index the DataFrame with the column name in square brackets:

```
import pandas as pd
import numpy as np
data = pd.read_csv("finds.csv")
print(data["Sky"])
```

Output:

```
0 Sunny
1 Sunny
2 Cloudy
3 Sunny
Name: Sky, dtype: object
```
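If you do not have finds.csv, you can try the same calls on an in-memory copy. This sketch assumes the file contains exactly the rows shown above; io.StringIO (from Python's standard library) wraps the text so that read_csv can read it like a file.

```python
import io
import pandas as pd

# Stand-in for finds.csv, using the same rows shown above
csv_text = """Sky,Airtemp,Humidity,Wind,Water,Forecast,WaterSport
Sunny,Warm,Normal,Strong,Warm,Same,Yes
Sunny,Warm,High,Strong,Warm,Same,Yes
Cloudy,Cold,High,Strong,Warm,Change,No
Sunny,Warm,High,Strong,Cool,Change,Yes
"""

data = pd.read_csv(io.StringIO(csv_text))                 # default 0..3 index
by_sky = pd.read_csv(io.StringIO(csv_text), index_col=0)  # Sky becomes the index
two = pd.read_csv(io.StringIO(csv_text), usecols=["Sky", "Airtemp"])

print(data.shape)            # (4, 7)
print(by_sky.index.name)     # Sky
print(list(two.columns))     # ['Sky', 'Airtemp']
print(data["Sky"].tolist())  # ['Sunny', 'Sunny', 'Cloudy', 'Sunny']
```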

**Writing to a CSV**

To save existing data to a CSV file, use the to_csv method. It writes the data in the DataFrame to the file you name. You can see an example of this below:

```
data.to_csv("newfile.csv")
new_data = pd.read_csv("newfile.csv")
print(new_data)
```

Output:

```
   Unnamed: 0     Sky  Airtemp  Humidity    Wind  Water  Forecast  WaterSport
0           0   Sunny     Warm    Normal  Strong   Warm      Same         Yes
1           1   Sunny     Warm      High  Strong   Warm      Same         Yes
2           2  Cloudy     Cold      High  Strong   Warm    Change          No
3           3   Sunny     Warm      High  Strong   Cool    Change         Yes
```

The new CSV file is created in the current working directory, usually the folder the program is running in. The main difference from the original file is that the DataFrame's index is saved as an extra first column, which is why it shows up as "Unnamed: 0" when you read the file back. You can avoid this by passing index=False to to_csv. You can also leave out the column names by passing header=False.

To write only some specific columns to the new CSV file, pass their names as a list to the columns argument, as follows:

```
import pandas as pd
import numpy as np
data = pd.read_csv("finds.csv")
print(data.columns)#to get to know the name of columns
#the columns names have to be passed as a list
data.to_csv("twocolumns.csv",columns=["Sky",'Water'],index=False)
new_filedata = pd.read_csv("twocolumns.csv")
print(new_filedata)
```

Output:

```
Index(['Sky', 'Airtemp', 'Humidity', 'Wind', 'Water', 'Forecast', 'WaterSport'], dtype='object')
      Sky  Water
0   Sunny   Warm
1   Sunny   Warm
2  Cloudy   Warm
3   Sunny   Cool
```

So the new CSV file twocolumns.csv contains only the columns Sky and Water. This file is also created in the directory the program is running in.
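To see the effect of index=False for yourself, here is a sketch that writes a small made-up DataFrame twice to a temporary folder (tempfile is part of Python's standard library) and reads it back each time.

```python
import os
import tempfile
import pandas as pd

data = pd.DataFrame({"Sky": ["Sunny", "Cloudy"], "Water": ["Warm", "Cool"]})

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "newfile.csv")

    data.to_csv(path)                # index written as an extra first column
    with_index = pd.read_csv(path)

    data.to_csv(path, index=False)   # index left out
    without_index = pd.read_csv(path)

print(list(with_index.columns))      # ['Unnamed: 0', 'Sky', 'Water']
print(list(without_index.columns))   # ['Sky', 'Water']
```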

Milestone: Pandas 101

You can add a new column, like how you'd add a key-value pair to a dictionary. You can see an example of this below.

```
import pandas as pd

exponents = pd.DataFrame({"Numbers":[1,2,3,4],
                          "Squares":[1,4,9,16],
                          "Cubes":[1,8,27,64]})
exponents["4th power"]=[1,16,81,256]   # new column, like adding a dict key
print(exponents)
```

Output:

```
   Numbers  Squares  Cubes  4th power
0        1        1      1          1
1        2        4      8         16
2        3        9     27         81
3        4       16     64        256
```

The new column is added at the end (on the right) of the table, and exponents itself is updated.

You can even use data already in the table to create the values of a new column.

```
exponents["total"] = exponents["Squares"]+exponents["Cubes"]+exponents["4th power"]
print(exponents)
```

Output:

```
   Numbers  Squares  Cubes  4th power  total
0        1        1      1          1      3
1        2        4      8         16     28
2        3        9     27         81    117
3        4       16     64        256    336
```

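So just by indicating which columns to add, you have created the values of the column "total". These values are computed once, when the line runs; "total" is not a formula, so it will not update by itself when other values change or when you add a new row.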
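A column computed from other columns stores the values as they were at that moment; it does not behave like a spreadsheet formula. A small sketch with made-up numbers:

```python
import pandas as pd

exponents = pd.DataFrame({"Numbers": [1, 2], "Squares": [1, 4]})
exponents["total"] = exponents["Numbers"] + exponents["Squares"]

exponents.loc[0, "Squares"] = 100   # change a value after computing "total"
print(exponents.loc[0, "total"])    # 2: the column is not recomputed

# Recompute explicitly whenever the inputs change
exponents["total"] = exponents["Numbers"] + exponents["Squares"]
print(exponents.loc[0, "total"])    # 101
```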

To delete a column, use the drop() method. Pass the column name(s) in a list and set axis=1; drop() returns a new DataFrame without those columns.

```
new_exponents = exponents.drop(["total"],axis=1)
print(new_exponents)
```

Output:

```
   Numbers  Squares  Cubes  4th power
0        1        1      1          1
1        2        4      8         16
2        3        9     27         81
3        4       16     64        256
```

The column "total" is dropped, so new_exponents has all the columns except "total". The original DataFrame, exponents, is unaltered; to change it as well, pass inplace=True to drop().

To delete rows, you can also use drop(): pass the index label of the row and set axis=0.

```
new_exponents = exponents.drop(0,axis=0)
print(new_exponents)
```

Output:

```
   Numbers  Squares  Cubes  4th power  total
1        2        4      8         16     28
2        3        9     27         81    117
3        4       16     64        256    336
```

Because only the index label 0 is passed, only the first row (label 0) is removed.

To add a new row, use loc[] on the DataFrame with the index label of the new row inside the square brackets, and assign the row's values as a list.

```
new_exponents.loc[0]=[1,1,1,1,3]
print(new_exponents)
```

Output:

```
   Numbers  Squares  Cubes  4th power  total
1        2        4      8         16     28
2        3        9     27         81    117
3        4       16     64        256    336
0        1        1      1          1      3
```

The new row is added at the end of the table, and the values in the list are assigned to the columns in order.
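drop() also accepts the keyword arguments columns= and index=, which some readers find clearer than axis=. To add rows from another table, concatenating with pd.concat is a common alternative to loc. A sketch with made-up data:

```python
import pandas as pd

exponents = pd.DataFrame({"Numbers": [1, 2, 3],
                          "Squares": [1, 4, 9]})

# Keyword forms of drop(): no axis argument needed
no_squares = exponents.drop(columns=["Squares"])
no_first_row = exponents.drop(index=[0])

# Append a row by concatenating a one-row DataFrame
new_row = pd.DataFrame({"Numbers": [4], "Squares": [16]})
extended = pd.concat([exponents, new_row], ignore_index=True)

print(list(no_squares.columns))     # ['Numbers']
print(no_first_row.index.tolist())  # [1, 2]
print(extended.shape)               # (4, 2)
```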

Milestone: Pandas 101

Pickle is a built-in Python module for serializing and deserializing Python objects.

"Pickling" is the process whereby a Python object hierarchy is converted into a byte stream, and "unpickling" is the inverse operation, whereby a byte stream is converted back into an object hierarchy.

Pickle files usually have the extension .pkl (or .pk) and store data in binary form.

```
import pickle
import pandas as pd

exponents = pd.DataFrame({"Numbers":[1,2,3,4],
                          "Squares":[1,4,9,16],
                          "Cubes":[1,8,27,64]})
with open("exponents.pkl","wb") as file:
    pickle.dump(exponents, file)   # write the DataFrame as bytes
del exponents                      # remove the variable
print(exponents)                   # raises NameError
```

Output:

```
NameError: name 'exponents' is not defined
```

You have created a new pickle file called exponents.pkl and dumped the DataFrame exponents into it as binary data ("wb" opens the file in "write binary" mode). You then deleted the variable exponents, which is why print() raises a NameError.

To restore the data from the pickle file, run the code below.

```
import pickle
with open("exponents.pkl","rb") as file:
    exponents = pickle.load(file)   # read the bytes back into a DataFrame
print(exponents)
```

Output:

```
   Numbers  Squares  Cubes
0        1        1      1
1        2        4      8
2        3        9     27
3        4       16     64
```

So by opening the file with "rb" (read binary), you can restore the data saved in exponents.pkl back into table format.
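pandas also has its own shortcuts, DataFrame.to_pickle and pd.read_pickle, which open and close the file for you. A minimal sketch, writing to a temporary folder rather than the working directory. As with pickle itself, only load pickle files you trust, because unpickling can run arbitrary code.

```python
import os
import tempfile
import pandas as pd

exponents = pd.DataFrame({"Numbers": [1, 2, 3, 4],
                          "Squares": [1, 4, 9, 16],
                          "Cubes": [1, 8, 27, 64]})

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "exponents.pkl")
    exponents.to_pickle(path)          # serialize the DataFrame to the file
    restored = pd.read_pickle(path)    # load it back

print(restored.equals(exponents))      # True
```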

Milestone: Plotting Graphs with Matplotlib

Matplotlib is a python library which you can use to get 2D plots or represent data on plots. It produces good quality images in a variety of formats and interactive environments.

You can generate plots, histograms, power spectra, bar charts, error charts, scatter plots, etc., with a few lines of code.

Showing data in such plots helps users understand it. In machine learning, plots like these help you evaluate the results of algorithms.

**Installation:**

Matplotlib and most of its dependencies are all available as wheel packages for macOS, Windows and Linux distributions.

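You can install it with pip (**pip install matplotlib**) on Windows, macOS and Linux. On macOS, xcode-select --install does not install matplotlib itself; it installs Apple's command-line developer tools, which you may need if pip has to build a package from source.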

Milestone: Plotting Graphs with Matplotlib

Now that you know what matplotlib is used for, you need to get the data ready to plot.

Data can be of many types: data you create yourself, or a data set that is already defined. In general, data can be 1D, 2D, or something such as an image, and matplotlib can plot all of these.

Let's look at an example of one-dimensional data.

```
import numpy as np
x = np.linspace(0,1,100)
sinx = np.sin(x)
cosx = np.cos(x)
print(sinx)
```

Output:

```
[ 0. 0.01010084 0.02020065 0.03029839 0.04039305 0.05048358
0.06056897 0.07064817 0.08072016 0.09078392 0.10083842 0.11088263
0.12091552 0.13093608 0.14094328 0.1509361 0.16091352 0.17087452
0.18081808 0.1907432 0.20064886 0.21053404 0.22039774 0.23023896
0.24005668 0.24984992 0.25961766 0.26935891 0.27907268 0.28875797
0.2984138 0.30803919 0.31763315 0.3271947 0.33672286 0.34621667
0.35567516 0.36509735 0.3744823 0.38382904 0.39313661 0.40240408
0.41163048 0.42081489 0.42995636 0.43905397 0.44810678 0.45711386
0.46607431 0.47498721 0.48385164 0.49266671 0.5014315 0.51014514
0.51880673 0.52741539 0.53597023 0.54447039 0.55291499 0.56130318
0.56963411 0.57790691 0.58612075 0.59427479 0.60236819 0.61040014
0.6183698 0.62627638 0.63411905 0.64189703 0.64960951 0.65725572
0.66483486 0.67234618 0.67978889 0.68716224 0.69446549 0.70169788
0.70885867 0.71594714 0.72296256 0.72990422 0.73677141 0.74356342
0.75027957 0.75691917 0.76348154 0.76996601 0.77637192 0.78269862
0.78894546 0.79511181 0.80119703 0.8072005 0.81312162 0.81895978
0.82471437 0.83038482 0.83597055 0.84147098]
```

The data above is 1D, stored in a 1D NumPy array. Its 100 values are hard to interpret as numbers, and plotting them makes the variation much easier to analyse. Because we applied the sine function to values from 0 to 1, the plot shows the first part of a sine curve.

Looking at 2D data

```
import numpy as np
data1 = np.linspace(0,1,50)
data2 = data1**2   # element-wise square, same shape as data1
```

You can represent this data on matplotlib in relation to each other. data2 contains the square of elements of data1. You can represent this on a 2D plot where elements of data2 get plotted with respect to the elements of data1.
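As a preview of the plotting covered next, here is a sketch of data2 against data1. The call matplotlib.use("Agg") is an assumption added so the code also runs as a plain script without opening a window; leave it out in Jupyter Notebook.

```python
import matplotlib
matplotlib.use("Agg")   # draw without a window; omit in Jupyter
import matplotlib.pyplot as plt
import numpy as np

data1 = np.linspace(0, 1, 50)
data2 = data1 ** 2             # one square per element of data1

fig, ax = plt.subplots()
line, = ax.plot(data1, data2)  # data1 on the x-axis, data2 on the y-axis
ax.set_xlabel("data1")
ax.set_ylabel("data2 = data1**2")
```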

When data is an image

```
from matplotlib.cbook import get_sample_data
img = np.load(get_sample_data('axes_grid/bivariate_normal.npy'))
print(img)
```

Output:

```
array([[ 5.93115274e-06, 2.34581641e-05, 7.22562324e-05,
1.73333691e-04, 3.23829967e-04, 4.71169822e-04,
5.33905355e-04, 4.71169822e-04, 3.23829967e-04,
1.73333691e-04, 7.22562324e-05, 2.34581641e-05,
5.93115274e-06, 1.16791322e-06, 1.79105293e-07],
[ 3.86759742e-05, 1.52966445e-04, 4.71169822e-04,
1.13027764e-03, 2.11163664e-03, 3.07241318e-03,
3.48150024e-03, 3.07241318e-03, 2.11163664e-03,
1.13027764e-03, 4.71169822e-04, 1.52966445e-04,
3.86759742e-05, 7.61575086e-06, 1.16791322e-06],
[ 1.96412803e-04, 7.76827707e-04, 2.39279779e-03,
5.74002351e-03, 1.07237757e-02, 1.56030016e-02,
1.76805171e-02, 1.56030016e-02, 1.07237757e-02,
5.74002351e-03, 2.39279779e-03, 7.76827707e-04,
1.96412803e-04, 3.86759742e-05, 5.93115273e-06],
[ 7.76827706e-04, 3.07241318e-03, 9.46369884e-03,
2.27022333e-02, 4.24133557e-02, 6.17110684e-02,
6.99278017e-02, 6.17110684e-02, 4.24133557e-02,
2.27022333e-02, 9.46369881e-03, 3.07241315e-03,
7.76827687e-04, 1.52966433e-04, 2.34581575e-05],
[ 2.39279687e-03, 9.46369673e-03, 2.91502403e-02,
6.99277936e-02, 1.30642320e-01, 1.90083453e-01,
2.15392767e-01, 1.90083442e-01, 1.30642301e-01,
6.99277711e-02, 2.91502188e-02, 9.46367925e-03,
2.39278451e-03, 4.71161763e-04, 7.22518585e-05],
[ 5.73979760e-03, 2.27017136e-02, 6.99267314e-02,
1.67746104e-01, 3.13391413e-01, 4.55981750e-01,
5.16694117e-01, 4.55979066e-01, 3.13386756e-01,
1.67740595e-01, 6.99214693e-02, 2.26974368e-02,
5.73677235e-03, 1.12830572e-03, 1.72263432e-04],
[ 1.07034407e-02, 4.23665653e-02, 1.30545991e-01,
3.13217157e-01, 5.85205655e-01, 8.51463249e-01,
9.64753502e-01, 8.51221618e-01, 5.84786441e-01,
3.12721261e-01, 1.30072309e-01, 4.19815824e-02,
1.04311156e-02, 1.93412933e-03, 2.27488324e-04],
[ 1.49295978e-02, 6.01615826e-02, 1.86893076e-01,
4.50108312e-01, 8.42203451e-01, 1.22520158e+00,
1.38566084e+00, 1.21719987e+00, 8.28320999e-01,
4.33686476e-01, 1.71206871e-01, 4.74127008e-02,
5.91143107e-03, -2.80582148e-03, -2.71922723e-03],
[ 9.47677979e-03, 5.10511999e-02, 1.76525800e-01,
4.45088891e-01, 8.47256027e-01, 1.23034767e+00,
1.36158534e+00, 1.13286697e+00, 6.78133136e-01,
2.45029980e-01, -1.45712946e-02, -1.04261975e-01,
-1.00386982e-01, -6.81300580e-02, -3.83330875e-02],
[ -2.11635983e-02, -2.28879916e-02, 1.58936955e-02,
1.35045808e-01, 3.22753201e-01, 4.58835523e-01,
3.73909905e-01, 2.19573694e-02, -4.35203013e-01,
-7.61556027e-01, -8.40544065e-01, -7.18953352e-01,
-5.13538820e-01, -3.17868325e-01, -1.73718607e-01],
[ -4.98940996e-02, -9.70669141e-02, -1.56548058e-01,
-2.15747157e-01, -2.86909061e-01, -4.35203013e-01,
-7.33894018e-01, -1.15549332e+00, -1.53656759e+00,
-1.69399367e+00, -1.56857521e+00, -1.24468468e+00,
-8.61683600e-01, -5.27030185e-01, -2.86866561e-01],
[ -3.10265764e-02, -6.18968267e-02, -1.04261975e-01,
-1.53192662e-01, -2.15747157e-01, -3.24677874e-01,
-5.13927088e-01, -7.61556027e-01, -9.73703370e-01,
-1.04979450e+00, -9.60699736e-01, -7.57962187e-01,
-5.23401798e-01, -3.19810460e-01, -1.74016443e-01],
[ -5.81093954e-03, -9.41290296e-03, -9.71674822e-03,
-1.68375655e-03, 1.25748335e-02, 1.58936955e-02,
-1.45712946e-02, -8.15869968e-02, -1.56548058e-01,
-2.01742668e-01, -2.00813843e-01, -1.64726078e-01,
-1.15674702e-01, -7.11403884e-02, -3.87947366e-02],
[ 1.03423940e-04, 1.52292735e-03, 6.27330179e-03,
1.68239987e-02, 3.27217852e-02, 4.74127008e-02,
5.10511999e-02, 3.94109983e-02, 1.88393329e-02,
4.02163249e-04, -9.41290296e-03, -1.12259544e-02,
-8.91474281e-03, -5.72526822e-03, -3.16693889e-03],
[ 1.76077772e-04, 7.30037289e-04, 2.29645615e-03,
5.56251620e-03, 1.04311156e-02, 1.51712283e-02,
1.71104931e-02, 1.49295978e-02, 1.00119019e-02,
5.06661974e-03, 1.82277381e-03, 3.45054418e-04,
-9.62472675e-05, -1.38831332e-04, -9.04104904e-05]])
```

Here axes_grid/bivariate_normal.npy is a sample file shipped with matplotlib that stores a 2D NumPy array of 15x15 values. np.load reads it into the variable img, and you can display this array as an image.
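If the sample file is not available in your matplotlib installation, you can build a similar array yourself. This sketch uses a made-up Gaussian bump rather than the real bivariate_normal.npy data, and the "viridis" colormap is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")   # draw without a window; omit in Jupyter
import matplotlib.pyplot as plt
import numpy as np

# A made-up 15x15 grid shaped like a 2D bell curve, standing in for img
x = np.linspace(-3, 3, 15)
X, Y = np.meshgrid(x, x)
img = np.exp(-(X**2 + Y**2) / 2)

fig, ax = plt.subplots()
image = ax.imshow(img, cmap="viridis")  # each array value becomes one coloured pixel
fig.colorbar(image)                     # scale mapping colours back to values
print(img.shape)                        # (15, 15)
```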

Milestone: Plotting Graphs with Matplotlib

Now, you have the data ready to plot, but we must first import the matplotlib library to get going.

```
import matplotlib.pyplot as plt
```

You can then access matplotlib's functions with dot notation on **plt**.

You can draw a simple plot using **plt.plot()**. When you pass it a single list, it plots the values on the y-axis against their positions (0, 1, 2, ...) on the x-axis.

```
plt.plot([1,2,3,4])
```

Output: a line plot of the values 1, 2, 3 and 4 (y-axis) against their positions 0, 1, 2 and 3 (x-axis).

plt.figure() creates a Figure, the top-level container for all the plot elements.

**Subplots:**

All plotting is done with respect to Axes. In most cases, a subplot will fit your needs.

A legend is a box on the graph that shows which colour, marker or line style belongs to each plotted line.

To get 4 subplots, you have to define the location of each plot on the grid. In a 2x2 grid, the locations of the subplots are:

- subplot 1 = (2, 2, 1)
- subplot 2 = (2, 2, 2)
- subplot 3 = (2, 2, 3)
- subplot 4 = (2, 2, 4)

Each tuple (a, b, c) means rows, columns, index: the grid has a rows and b columns, and c is the position of the subplot, counted from 1 starting at the top left and moving across each row.

So when using multiple plots, putting them all in a grid like this is good practice. This also makes analyzing related data easy.
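The grid positions above can be sketched with Figure.add_subplot(rows, columns, index), adding a legend to each subplot. The curves x, x**2, x**3 and x**4 are made-up data for illustration.

```python
import matplotlib
matplotlib.use("Agg")   # draw without a window; omit in Jupyter
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 1, 20)
fig = plt.figure()

# add_subplot(nrows, ncols, index): index runs 1..4, left to right, top to bottom
for index in range(1, 5):
    ax = fig.add_subplot(2, 2, index)
    ax.plot(x, x ** index, label=f"x^{index}")
    ax.set_title(f"subplot (2, 2, {index})")
    ax.legend()        # shows which line is which, using the label above

print(len(fig.axes))   # 4
```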

Milestone: Plotting Graphs with Matplotlib

Every time you want to plot a set of data, you should follow a systematic approach.

These routines will provide clarity when assessing the plots. Some important routines are listed below.

For 1D and 2D data these are some of the methods that are used.

```
x = np.linspace(0,1,10)
y = np.array(x**2)
lines = plt.plot(x,y)
```

This is a simple line plot showing the relation between x and y; each point is joined to the next one with a straight line.

```
x = np.linspace(0,1,10)
y = np.array(x**2)
scatterplot = plt.scatter(x,y)
```

A scatter plot draws a marker at each (x, y) point without joining the points.

```
import matplotlib.pyplot as plt

fig,axes = plt.subplots(nrows=2,ncols=2)   # 2x2 grid of Axes
axes[0,0].bar([1,2,3],[100,200,300])       # vertical bars
axes[1,0].barh([1,2,3],[100,200,300])      # horizontal bars
```

Bar charts are either vertical or horizontal as shown. bar() is for vertical plot and barh() is for horizontal plots.

```
import matplotlib.pyplot as plt

fig,axes = plt.subplots(nrows=2,ncols=2)
axes[1,1].axvline(0.5)   # vertical line at x = 0.5
axes[0,0].axhline(0.5)   # horizontal line at y = 0.5
```

axvline() is "axis vertical line" and is parallel to the y-axis and similarly axhline() is "axis horizontal line" and parallel to x-axis.

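```
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0,1,10)
y = x**2
fig,axes = plt.subplots(nrows=2,ncols=2)
axes[0,0].fill(x,y,color="red")   # filled polygon through the points (x, y)
```

fill() draws a filled polygon through the points (x, y), here coloured red. Colour is useful when several datasets are plotted on the same axes, because it helps tell them apart.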

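```
fig,ax = plt.subplots()
img = np.load(get_sample_data('axes_grid/bivariate_normal.npy'))
image = ax.imshow(img,cmap="gist_earth",interpolation="nearest",vmin=-2,vmax=2)
```

imshow() draws a 2D array as an image: each value becomes a pixel whose colour comes from the colormap (cmap). vmin and vmax set the range of values the colormap spans, and interpolation="nearest" shows each value as a sharp square.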

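```
import numpy as np
import matplotlib.pyplot as plt

fig,axes = plt.subplots(nrows=2,ncols=2)
y = np.sin(np.linspace(0,1,100))
axes[0,0].hist(y)         # histogram: how many values fall in each bin
axes[0,1].boxplot(y)      # box plot: median, quartiles and whiskers
axes[1,0].violinplot(y)   # violin plot: the shape of the distribution
```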

Milestone: Plotting Graphs with Matplotlib

You can customize a plot using matplotlib library, this includes changing colour, markers, line styles, texts, legends and layouts. All of these can be indicated by the programmer for easy differentiation between multiple data.

**Color, color bars and color maps**

Adding colour to plots is very important when two or more datasets are plotted on the same axes; it helps you identify things like intersections.

These are some of the methods used:

```
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.cbook import get_sample_data

fig,axes = plt.subplots(nrows=2,ncols=2)
img = np.load(get_sample_data('axes_grid/bivariate_normal.npy'))
x = np.cos(np.linspace(0,1,100))
y = np.sin(np.linspace(0,1,100))
axes[0,0].plot(x, x, x, x**2, x, x**3)   # three lines, coloured automatically
axes[0,1].plot(x, y, alpha = 0.9)        # alpha sets transparency (0 to 1)
axes[1,0].plot(x, y, c='k')              # c='k' draws the line in black
im = axes[1,1].imshow(img,cmap='seismic')
fig.colorbar(im, orientation='horizontal')   # colour scale for the image
```

**Markers and linestyles with text annotations:**

```
fig, ax = plt.subplots(nrows=3,ncols=2)
x = np.arange(20)
y = np.arange(20)
ax[0,0].scatter(x,y,marker=".")
ax[0,0].text(1,22,"example graphs")
ax[0,1].plot(x,y,marker="^")
ax[1,0].plot(x,y,linewidth=4.0)
ax[1,1].plot(x,y,ls='solid')
ax[2,0].plot(x,y,ls='--')
ax[2,0].text(1,-10,"dashed line graphs")
ax[2,1].plot(x,y,'--',x**2,y**2,'-.')
#ax[2,1].plot(a,color='r',linewidth=4.0)
```

If you are not using colour, use different markers or line styles to tell datasets apart.

These are only some of the options available; there are many more markers, most of them written as single special characters.
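Besides keyword arguments, plot() accepts a short format string that combines colour, marker and line style. A sketch comparing the two forms, with made-up data:

```python
import matplotlib
matplotlib.use("Agg")   # draw without a window; omit in Jupyter
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(10)
fig, ax = plt.subplots()

# A format string packs colour, marker and line style together: "ro--" is
# red ("r"), circle markers ("o") and a dashed line ("--")
dashed, = ax.plot(x, x, "ro--", label="ro--")
# A similar style written with keyword arguments
dotted, = ax.plot(x, 2 * x, color="blue", marker="x", linestyle=":", label="bx:")
ax.legend()

print(dashed.get_marker(), dashed.get_linestyle())  # o --
```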

Milestone: Plotting Graphs with Matplotlib

You can save a plot as an image file, for example a .png file, with matplotlib.

```
plt.savefig("filename.png")
plt.savefig("filename.png",transparent=False)
```

We have to use the function savefig() and specify the name of the file we want to save it as. This will save the plot in the same directory where the program is running.

If you pass transparent=True, the figure background is saved as transparent. This is useful, for example, for displaying a plot on top of a coloured background on a web page.

To view a plot, use the function show(). When you run a script, this displays the plot in a separate window. In Jupyter Notebook, plots are displayed inline, so you usually do not need it.

`plt.show()`
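A sketch that saves a figure with fig.savefig, which behaves like plt.savefig for that particular figure. The temporary folder and dpi=150 are assumptions for illustration. When you use both, call savefig() before show(): in a script, show() may close the figure, and a later plt.savefig() can end up saving a new, empty figure.

```python
import os
import tempfile
import matplotlib
matplotlib.use("Agg")   # draw without a window; omit in Jupyter
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4])

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "filename.png")
    # The .png extension selects the file format
    fig.savefig(path, transparent=True, dpi=150)
    size = os.path.getsize(path)

print(size > 0)   # True
```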