In this article, we will cover the key theoretical topics surrounding the Neural Network model, such as:
Understanding a high-level overview of these key elements will make it much easier to follow what's happening when we begin to use TensorFlow.
Before we launch straight into Neural Networks, we need to understand the individual components first, such as a single "neuron". Artificial Neural Networks (ANNs) have a basis in Biology! Let's see how we can attempt to mimic a biological neuron with an artificial one, known as a perceptron. Once we've walked through how a simple perceptron works, we'll see how to represent it mathematically.
Let's start off with a biological neuron, such as a brain cell. In simplified terms, a biological neuron works in the following manner:
In the above diagram, we can see dendrites that feed into the body of the cell. Electrical signals get passed in through these dendrites, and a single output signal is then passed out through an axon to connect to some other neuron. That's the basic idea.
So, artificial neurons likewise have inputs and outputs.
This simple model is known as a Perceptron. In this case, we have two inputs. These inputs carry the values of features.
So, when you have your dataset, you will have various features, and these features can be anything from how many rooms a house has to how dark an image is, represented by some sort of pixel or darkness value.
The next step is to have these inputs multiplied by some sort of weight.
So we have Weight 0 for Input 0 and Weight 1 for Input 1. Typically, the weights are initialized through some sort of random generation, so we just choose a random number for each weight. In this case, we'll pretend that the randomly chosen values are 0.5 and -1.
So now these inputs are going to be multiplied by the weights. And that ends up looking like this:
The next step is to take these results and pass them into an Activation Function. The neuron calculates a "weighted sum" of its inputs and adds a bias, and the activation function then decides whether the neuron should "fire" or not.
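The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a full implementation: the weights 0.5 and -1 come from the example above, while the input values and the zero bias are made-up numbers chosen for demonstration.

```python
def step(x):
    """Binary step activation: 'fire' (output 1) if x is non-negative, else 0."""
    return 1 if x >= 0 else 0

def perceptron(inputs, weights, bias):
    # Multiply each input by its weight, sum the results, add the bias,
    # then let the activation function decide whether to fire.
    weighted_sum = sum(i * w for i, w in zip(inputs, weights)) + bias
    return step(weighted_sum)

# Weights 0.5 and -1 as in the example; inputs/bias are illustrative.
output = perceptron(inputs=[1.0, 2.0], weights=[0.5, -1.0], bias=0.0)
print(output)  # 0.5*1.0 + (-1.0)*2.0 + 0.0 = -1.5 -> step -> 0
```

Changing the inputs so the weighted sum becomes non-negative would make the perceptron fire and output 1 instead.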
We've seen how a single perceptron behaves; now let's expand this concept to the idea of a neural network. Let's see how to connect many perceptrons together and then how to represent this mathematically. A network of multiple perceptrons looks like this:
Here we can see various layers of single perceptrons connected to each other through their inputs and outputs. In this case, we have an input layer on the left (purple) and an output layer all the way on the right. In between sit two hidden layers: layers that don't directly "see" the outside world, i.e., neither the inputs on the left nor the output on the right. When there are three or more hidden layers, the network is called a "Deep Network".
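Layer-by-layer, the forward pass through such a network can be sketched as below. The layer sizes (2 inputs, two hidden layers of 3 neurons, 1 output) and the input values are arbitrary assumptions for illustration, and a sigmoid is used as the activation so each neuron's output is a smooth value between 0 and 1.

```python
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    # Each neuron: weighted sum of all inputs + its bias, then activation.
    return [sigmoid(sum(i * w for i, w in zip(inputs, ws)) + b)
            for ws, b in zip(weights, biases)]

def random_layer(n_in, n_out):
    # Random weight initialization, as described earlier; biases start at 0.
    weights = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

# Arbitrary sizes: 2 inputs -> hidden layer (3) -> hidden layer (3) -> 1 output.
W1, b1 = random_layer(2, 3)
W2, b2 = random_layer(3, 3)
W3, b3 = random_layer(3, 1)

x = [0.5, -1.0]            # input layer (on the left)
h1 = layer(x, W1, b1)      # first hidden layer
h2 = layer(h1, W2, b2)     # second hidden layer
y = layer(h2, W3, b3)      # output layer (on the right)
print(len(y))
```

Notice that each layer's outputs become the next layer's inputs, exactly as the diagram shows perceptrons connected output-to-input.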
The activation function is a mathematical “gate” in between the input feeding the current neuron and its output going to the next layer. It can be as simple as a step function that turns the neuron output on and off, depending on a rule or threshold. Or it can be a transformation that maps the input signals into output signals that are needed for the neural network to function.
A binary step function is a threshold-based activation function. If the input value is above a certain threshold, the neuron is activated and sends exactly the same signal to the next layer; below the threshold, it stays inactive.
The problem with a step function is that it does not allow multi-value outputs—for example, it cannot support classifying the inputs into one of several categories.
A linear activation function takes the form:
A = cx
It takes the inputs, multiplies them by the weights for each neuron, and creates an output signal proportional to the input. In one sense, a linear function is better than a step function because it allows multiple output values, not just yes and no.
Modern neural network models use non-linear activation functions. They allow the model to create complex mappings between the network’s inputs and outputs, which are essential for learning and modeling complex data, such as images, video, audio, and data sets which are non-linear or have high dimensionality.
Almost any process imaginable can be represented as a functional computation in a neural network, provided that the activation function is non-linear.
Non-linear functions address the problems of a linear activation function: they have a meaningful, input-dependent derivative, so back-propagation can be used to update the weights, and stacking multiple layers no longer collapses into a single equivalent linear layer.
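For comparison, here is a sketch of the activation functions discussed so far: the binary step, the linear function A = cx, and two common non-linear choices (sigmoid and ReLU). The sample input values are arbitrary.

```python
import math

def step(x):
    # Binary step: on/off only, same signal whenever activated.
    return 1.0 if x >= 0 else 0.0

def linear(x, c=1.0):
    # Linear: output proportional to the input (A = cx).
    return c * x

def sigmoid(x):
    # Non-linear: squashes any input into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    # Non-linear: zero for negative inputs, identity for positive ones.
    return max(0.0, x)

for x in (-2.0, 0.0, 2.0):
    print(x, step(x), linear(x), round(sigmoid(x), 3), relu(x))
```

The step function jumps abruptly between 0 and 1, the linear function just rescales its input, while sigmoid and ReLU bend the input in a way that lets stacked layers model complex, non-linear mappings.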
It is a function that measures the performance of a Machine Learning model for given data. The Cost Function quantifies the error between predicted values and expected values and presents it in the form of a single real number. Depending on the problem, the Cost Function can be formed in many different ways. The purpose of the Cost Function is to be either minimized (the returned value is then usually called cost, loss, or error) or maximized (the returned value is then called a reward).
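One very common choice of cost function is the mean squared error, sketched below; the predicted and expected values are made-up numbers for illustration.

```python
def mse(predicted, expected):
    """Mean squared error: the average of the squared differences between
    predicted and expected values, collapsed into a single real number."""
    errors = [(p - e) ** 2 for p, e in zip(predicted, expected)]
    return sum(errors) / len(errors)

# Illustrative predictions vs. targets: errors are -0.5, 0.5, and 0.
print(mse([2.5, 0.0, 2.0], [3.0, -0.5, 2.0]))  # ≈ 0.167
```

A perfect model would score 0; the worse the predictions, the larger this single number grows, which is exactly what gradient descent (below) tries to drive down.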
To explain Gradient Descent I’ll use the classic mountaineering example.
Suppose you are at the top of a mountain, and you have to reach a lake which is at the lowest point of the mountain (a.k.a. the valley). The twist is that you are blindfolded, with zero visibility of where you are headed. So, what approach will you take to reach the lake?
The best way is to check the ground near you and observe where the land tends to descend. This tells you in which direction to take your first step. If you keep following the descending path, it is very likely you will reach the lake.
To represent this graphically, see the graph below.
Let us now map this scenario in mathematical terms.
Suppose we want to find the best parameters (θ1) and (θ2) for our learning algorithm. Similar to the analogy above, we find mountains and valleys when we plot our "cost space". The cost space is simply how our algorithm would perform when we choose particular values for the parameters.
So on the y-axis, we have the cost J(θ), plotted against our parameters θ1 and θ2 on the x-axis and z-axis respectively. Here, hills are represented by the red regions, which have a high cost, and valleys by the blue regions, which have a low cost.
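The mountaineering analogy can be sketched in code with a single parameter θ and a toy bowl-shaped cost J(θ) = θ²; the starting point and learning rate are arbitrary choices for illustration.

```python
def cost(theta):
    # A toy bowl-shaped "valley": the minimum (the lake) is at theta = 0.
    return theta ** 2

def gradient(theta):
    # Slope of the ground at the current position: dJ/dtheta = 2*theta.
    return 2 * theta

theta = 4.0          # start high up on the "mountain"
learning_rate = 0.1  # size of each blindfolded step
for _ in range(100):
    # Step in the direction where the land descends (opposite the slope).
    theta -= learning_rate * gradient(theta)

print(theta, cost(theta))
```

Each iteration moves θ a little further downhill, and after enough steps it settles at the bottom of the valley where the cost is (near) zero. With two parameters θ1 and θ2, the same update is simply applied to each one.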
Back-propagation is the essence of neural net training. It is the practice of fine-tuning the weights of a neural net based on the error rate (i.e. loss) obtained in the previous epoch (i.e. iteration). Proper tuning of the weights ensures lower error rates, making the model reliable by increasing its generalization.
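For a single artificial neuron with one weight, back-propagation reduces to the chain rule, which can be sketched as below. The data point, starting weight, and learning rate are made-up values for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# One neuron, one weight: prediction = sigmoid(w * x).
# Illustrative data point: input 1.5 should map to target 0.0.
x, target = 1.5, 0.0
w = 0.8   # arbitrary starting weight
lr = 0.5  # learning rate

for _ in range(200):  # each pass over the data is one epoch
    # Forward pass: compute the prediction and its squared error (the loss).
    pred = sigmoid(w * x)
    loss = (pred - target) ** 2
    # Backward pass: chain rule from the loss back to the weight.
    dloss_dpred = 2 * (pred - target)
    dpred_dz = pred * (1 - pred)   # derivative of sigmoid
    dz_dw = x
    grad_w = dloss_dpred * dpred_dz * dz_dw
    # Fine-tune the weight against the error, as in gradient descent.
    w -= lr * grad_w

print(loss)  # the error shrinks toward 0 as w is tuned
```

In a real network the same chain rule is applied layer by layer, from the output back to the input, which is where the name "back-propagation" comes from.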
In this article, we got a very brief introduction to Neural Networks and how they work. We discussed:
We’ll deal with the coding section in the next article. Till then, stay tuned and Happy Reading!