Deep Learning Simple Project: ANN

Santhosh
Dec 29, 2020 · 7 min read


In this tutorial, we are going to learn about Deep Learning in depth, along with a small project using an Artificial Neural Network (ANN).

Introduction

Before learning about Deep Learning, we need to know a few terms. Artificial Intelligence (AI) refers to the simulation of human intelligence in machines that are programmed to think like humans and mimic their actions. The term may also be applied to any machine that exhibits traits associated with a human mind, such as learning and problem-solving.

To achieve Artificial Intelligence, we can use two methodologies:

  1. Machine Learning
  2. Deep Learning, which is itself a subset of Machine Learning

We will cover Machine Learning in another tutorial; for now, we discuss Deep Learning.

Fig 1 Overview of AI/ML/DL

Learning About Deep Learning...

When we use Neural Networks to process data before making predictions, it is called Deep Learning. To learn more about deep learning, visit: https://www.investopedia.com/terms/d/deep-learning.asp#:~:text=Deep%20learning%20is%20an%20AI,is%20both%20unstructured%20and%20unlabeled.

A Neural Network is simply a layered structure or layered architecture, and each layer consists of several nodes.

A Neural Network can have many layers, but each layer falls into one of three types:

  1. Input Layer - the first layer in the Neural Network
  2. Hidden Layers - all the layers except the Input and Output Layers
  3. Output Layer - the last layer in the Neural Network

Fig 2 Layers of Neural Network

Nodes:

Nodes are also called neurons or perceptrons. Each layer consists of several nodes. Each node has inputs, outputs, and an activation function, and inside the node there are two parameters called Weight and Bias.

The general equation that relates these is:

Output(Y) = Activation Function(Input * Weight + Bias)

Fig 3 Example of Node
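
To make this concrete, here is a minimal sketch of a single node in NumPy. The input, weight, and bias values are made up, and ReLU (introduced later in this article) is used as the activation:

    import numpy as np

    # Made-up example values for one node with three inputs
    inputs = np.array([0.5, -1.2, 2.0])
    weights = np.array([0.4, 0.1, -0.3])
    bias = 0.2

    # Output(Y) = Activation(Input * Weight + Bias), using ReLU here
    z = np.dot(inputs, weights) + bias
    output = max(0.0, z)
    print(output)  # prints 0.0, since the weighted sum (-0.32) is negative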

Now the question arises: how do we decide the number of nodes in each layer?

In the Input Layer, the number of nodes must equal the number of features in your dataset:

Number of Nodes in Input Layer = Number of Features in Dataset

In the Output Layer:

  1. For regression problems (the output to predict is numeric), only one node is enough.
  2. For binary classification (the output has only 2 classes, e.g., Dog or Cat), only one node is enough.
  3. For multiclass classification (more than 2 classes), the number of nodes equals the number of classes. For example, if there are three classes, we need 3 nodes in the output layer.

In the Hidden Layers, there is no fixed rule for the number of nodes, but there are two common heuristics (a quick worked example follows the list):

  1. The number of nodes in a hidden layer is approximately equal to

2/3 * {Number of Nodes in Input Layer} + {Number of Nodes in Output Layer}

  2. The number of nodes in a hidden layer is less than or equal to twice the number of nodes in the input layer:

Number of Nodes in Hidden Layer <= 2 * Number of Nodes in Input Layer
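
For example, with the dataset used later in this article (11 input features and 1 output node), the first heuristic suggests roughly 2/3 * 11 + 1 ≈ 8 nodes per hidden layer, and the second caps it at 2 * 11 = 22 nodes.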

Simple Deep Learning Project

Some definitions will be explained along the way; let's jump into the project.

Here I have taken a dataset from the banking sector. Using some of its columns, we need to predict whether a customer will keep their account or close it.

First, we import the dataset using pandas, as in Fig 4.

Fig 4
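
Since the code is only shown as an image, here is a minimal sketch of this step; the file name 'Churn_Modelling.csv' is an assumption, so use whatever your copy of the dataset is called:

    import pandas as pd

    # Load the bank-churn dataset (file name assumed)
    dataset = pd.read_csv('Churn_Modelling.csv')
    print(dataset.shape)  # expect (10000, 14) per the description below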

Next, we need to separate the features and labels. Features are what we give as input, and labels are what we have to predict. That is shown in Fig 5.

Fig 5

Our dataset has 14 columns and 10,000 records. Three of the 14 columns carry no useful information for prediction, so we slice the dataset from the 3rd column (0-indexed) up to the 13th for the features. The last column is taken as the label.
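
Continuing the sketch, the slicing described above looks like this:

    # Skip the first 3 identifier-like columns; columns 3..12 are features
    features = dataset.iloc[:, 3:13].values
    # The last (14th) column is the label we want to predict
    labels = dataset.iloc[:, 13].values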

Fig 6

In Fig 6, we learn a new concept called One-Hot Encoding. In our dataset, for example, there are two categorical columns, Geography and Gender. In Deep Learning we need to convert them to numerical values, and that work is done by one-hot encoding.

It takes the unique values in the column to be converted, turns each unique value into a separate column, and fills those columns with numerical (0/1) data.

Fig 7

In Fig 7, we drop one of the columns produced from Geography, because one-hot encoding its 3 unique values creates 3 columns, and just 2 of them are enough to carry the information. In general, if one-hot encoding produces "n" columns, "n-1" columns are enough.

NOTE: After one-hot encoding, the encoded values appear first in the array. In cell 31, the first 3 values are from the Geography column and the next 2 are from the Gender column.
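
The encoding cells are only visible in the figures, so here is one way to reproduce this step with scikit-learn; OneHotEncoder's drop='first' keeps n-1 columns per categorical feature, which yields the 11 features used later:

    from sklearn.compose import ColumnTransformer
    from sklearn.preprocessing import OneHotEncoder

    # Within `features`, column 1 is Geography and column 2 is Gender
    ct = ColumnTransformer(
        [('encoder', OneHotEncoder(drop='first'), [1, 2])],
        remainder='passthrough',
    )
    features = ct.fit_transform(features)  # encoded columns come first
    print(features.shape)  # expect (10000, 11)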

Fig 8

In Fig 8, we simply split the data into training and testing sets: 80% of the data for training and 20% for testing.
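
A sketch of the split; test_size = 0.2 matches the 80/20 split above, and random_state is an arbitrary value added for reproducibility:

    from sklearn.model_selection import train_test_split

    # 80% training, 20% testing
    features_train, features_test, labels_train, labels_test = train_test_split(
        features, labels, test_size=0.2, random_state=0)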

Fig 9

Normalizing our data is a very important step in deep learning. It is also called Feature Scaling. For example, if one column has large values and another has small values, the large-valued column can dominate training; for better results, we need to bring all columns to a similar scale. This can be done in many ways, and here we use StandardScaler.
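
A sketch of the scaling step; note that the scaler is fitted on the training data only and then reused to transform the test data:

    from sklearn.preprocessing import StandardScaler

    scaler = StandardScaler()
    features_train = scaler.fit_transform(features_train)  # fit + transform on train
    features_test = scaler.transform(features_test)        # transform only on test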

Fig 10

Everything up to this point was data pre-processing. Now we move into the Deep Learning part.

In Fig 10, we import the library called Keras, a high-level deep learning library. It works with TensorFlow (Google's deep learning framework) as its backend.

From Keras, we import the model called Sequential, which stacks layers in order. After that, we import the layer called Dense: a dense (fully connected) layer is one where the output of each node is connected to the input of every node in the next layer. An example of Dense is given in Fig 11.

Fig 11 Example of Dense

Next, we add the layers to the model (Sequential), but before that we need to learn about Activation Functions.

Activation Function

Neural network activation functions are a crucial component of deep learning. Activation functions determine the output of a deep learning model, its accuracy, and the computational efficiency of training, which can make or break a large-scale neural network. They also have a major effect on the network's ability to converge and on its convergence speed; in some cases, activation functions may prevent a network from converging at all. There are many activation functions, but we mostly use these three:

ReLU - If the input to the node is X, the output is X when X >= 0 and 0 when X < 0. ReLU is mostly used in Hidden Layers.

Sigmoid - A probability-based function that squashes its input into the range (0, 1); it is used for binary classification.

Softmax - It produces a probability distribution over the classes; it is used in multiclass classification.

To Learn More about the activation function visit https://missinglink.ai/guides/neural-network-concepts/7-types-neural-network-activation-functions-right/
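
As a minimal NumPy sketch of how these three functions behave:

    import numpy as np

    def relu(x):
        return np.maximum(0, x)      # passes positives through, zeroes out negatives

    def sigmoid(x):
        return 1 / (1 + np.exp(-x))  # squashes any input into (0, 1)

    def softmax(x):
        e = np.exp(x - np.max(x))    # subtract the max for numerical stability
        return e / e.sum()           # outputs are positive and sum to 1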

Which activation Function is used where?

In the Input Layer, there is no need for an activation function, because the input layer only receives the inputs.

In Hidden Layers, ReLU is mostly used as the activation function.

In the Output Layer:

  1. For regression problems, no activation function is used (the output stays linear).
  2. For binary classification, the Sigmoid activation function is used.
  3. For multiclass classification, the Softmax activation function is used.

In Fig 10, we use 2 hidden layers with 7 nodes each. There are 11 features in the dataset, so the input layer needs 11 nodes; these are generated automatically in cell 77 by passing "input_dim = 11" to the first hidden layer. We also add one output layer. A sketch of this model follows.
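
A sketch of the model from Fig 10, assuming both hidden layers use 7 nodes (the exact cells are only visible in the figure):

    from keras.models import Sequential
    from keras.layers import Dense

    model = Sequential()
    model.add(Dense(7, activation='relu', input_dim=11))  # hidden layer 1; input layer auto-generated
    model.add(Dense(7, activation='relu'))                # hidden layer 2
    model.add(Dense(1, activation='sigmoid'))             # output layer for binary classification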

Fig 12

In Fig 12, we compile our model, which needs an optimizer (an algorithm that updates the weights) and a loss function. Here we use the Adam optimizer, and for binary classification we use "binary_crossentropy" as the loss. The image below illustrates how these fit into training.

Fig 13 Back Propagation

In Fig 13, the input X is multiplied by the weights and a prediction is made. The prediction is compared with the true target using the loss function, and the optimizer (an algorithm) uses the loss score to update the weights so that the loss decreases; then the process repeats. This loop is called Back Propagation.
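
Returning to the compile step from Fig 12, a minimal sketch; the metrics list is an assumed addition, included so training reports the accuracy figures quoted below:

    model.compile(optimizer='adam', loss='binary_crossentropy',
                  metrics=['accuracy'])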

After this, we fit our model, specifying the batch_size and the number of Epochs.
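
A sketch of the fitting step; batch_size = 10 and epochs = 100 are assumed values, since the actual ones appear only in the figure. If a validation_split argument is also passed (an assumption here), Keras will report a validation accuracy like the one quoted below:

    # Train the network; batch_size and epochs are the knobs tuned later
    model.fit(features_train, labels_train, batch_size=10, epochs=100)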

Fig 14

Finally, the prediction is done. labels_pred contains 2,000 values because we used test_size = 0.2 in train_test_split (20% of 10,000 records).

In Fig 14, we compare the predicted labels with the actual labels using a confusion matrix to measure accuracy. We get an accuracy of nearly 84%, and the validation accuracy is 0.8395. A sketch of this step follows.
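
A sketch of this evaluation; the sigmoid outputs are thresholded at 0.5 before comparison:

    from sklearn.metrics import confusion_matrix, accuracy_score

    # Convert predicted probabilities into 0/1 class labels,
    # flattening from shape (2000, 1) to (2000,)
    labels_pred = (model.predict(features_test) > 0.5).astype(int).ravel()

    print(confusion_matrix(labels_test, labels_pred))
    print(accuracy_score(labels_test, labels_pred))  # ~0.84, as reported above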

We can try to improve the model's accuracy by increasing or decreasing the batch_size and the number of Epochs.

Thank You.
