Neural Networks Theory: Artificial Neurons and Architecture

What is a Neural Network?

Imagine you have a very hard mathematical problem that requires you to create a function that predicts outcomes from a set of input data, and all you had to do was give some black box the data and have your answer come out the other side. Neural networks do exactly this.

Neural networks were originally created as a way to help us learn how the human brain works. Unfortunately, due to the autonomous nature of neural networks, they have probably brought people further away from understanding how the human brain works rather than closer.

Neural Network Architecture

A neural network architecture is made up of five main elements: an input layer, an output layer, hidden layer(s), the weights on the edges connecting the neurons in one layer to the next, and a bias for each artificial neuron in the network.

The neural network algorithm will determine the weight and bias parameters, so other than initialising these parameters at the very beginning there is not much you need to do with them. When running a neural network model you will need to make a decision on the input layer, the output layer, and the hidden layers (for some flavours of neural networks you can also choose whether or not the network is fully connected); together these choices are known as the network architecture.

The input layer is the layer where you first pass in input from your training/test data. This layer is problem specific; for example, if your input is the pixels of an image, each neuron in the input layer could represent one pixel of the image in grey scale, RGB, etc. The quality of your data is very important to your model: rubbish in, rubbish out, so it is worth spending time making sure the quality of your data is of a high standard.

The output layer is the final output that results from passing your input data through the network. You can choose to have multiple neurons in the output layer (e.g. multi-class classification, one neuron per class) or a single neuron in the output layer (e.g. solving a regression problem).

The hidden layers are the layers in between the input and output layers. Your choice of how many hidden layers to use, and how many neurons in each, will affect the performance of your neural network model.
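
To make these architecture choices concrete, here is a minimal Python sketch (assuming NumPy is available) of how you might specify the layer sizes and randomly initialise the weight and bias parameters; the particular sizes are purely illustrative.

    import numpy as np

    # Illustrative architecture: 784 input neurons (e.g. 28x28 grey-scale pixels),
    # one hidden layer of 30 neurons, and 10 output neurons (one per class).
    layer_sizes = [784, 30, 10]

    # One bias per neuron in every layer after the input layer, and one weight
    # matrix per pair of adjacent layers; the learning algorithm determines
    # these parameters later, we only initialise them here.
    rng = np.random.default_rng(0)
    biases = [rng.standard_normal((n, 1)) for n in layer_sizes[1:]]
    weights = [rng.standard_normal((n, m))
               for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

    print([w.shape for w in weights])  # [(30, 784), (10, 30)]
    print([b.shape for b in biases])   # [(30, 1), (10, 1)]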

As with any machine learning algorithm there will be a bias/variance trade-off when selecting these parameters for your neural network. By adding more hidden layers you might reduce the bias but increase the variance, as your model fits the training data almost perfectly but fails to generalise to unseen data (overfitting). Alternatively, if you reduce the number of hidden layers you might reduce the variance but increase the bias; an extreme case is a model that always predicts the same output for every observation in the training and test data, significantly under-fitting the data.

Perceptrons

The fundamental elements of neural networks are artificial neurons. A perceptron is one kind of artificial neuron: it takes in binary inputs and produces a binary output (1 or 0). Neural networks can use other kinds of artificial neurons, but to begin, let's illustrate some ideas with a perceptron. One example is below:

     w1
x1 -------\
     w2
x2 -------- (Perceptron(bias)) -----> Output
     w3
x3 -------/

Above is an illustration of a perceptron that takes in three input variables (x1, x2, x3). The perceptron contains a bias variable (we will get into this further below) that is independent for each perceptron. Each branch connecting an input variable to the perceptron carries a weight (w1, w2, w3) indicating how significant that input is. Finally, the perceptron has an output.

The output of the perceptron can be defined by the following piecewise function:

\(\begin{eqnarray} \mbox{output} & = & \left\{ \begin{array}{ll} 0 & \mbox{if } \sum_j w_j x_j \leq \mbox{ threshold} \\ 1 & \mbox{if } \sum_j w_j x_j > \mbox{ threshold} \end{array} \right. \tag{1}\end{eqnarray}\)

Let's break this down a bit more. The \(\sum_j w_j x_j\) term is a weighted sum of the input variables. The function says that if the weighted sum is less than or equal to the perceptron's threshold, the output is 0; otherwise, if the weighted sum of the inputs is greater than the threshold, the output is 1. It is conventional to define the bias as the negative of the threshold, \(\mbox{bias} \equiv -\mbox{threshold}\), and re-write the function as

\(\begin{eqnarray} \mbox{output} & = & \left\{ \begin{array}{ll} 0 & \mbox{if } \sum_j w_j x_j +\mbox{ bias} \leq  0\\ 1 & \mbox{if } \sum_j w_j x_j +\mbox{ bias} >  0\end{array} \right. \tag{2}\end{eqnarray}\)

Take a minute to take that in. Shouldn't the bias be negative? Moving the threshold to the other side of the inequality flips its sign, and that sign is simply absorbed into the bias parameter: since the bias is an unknown that the network learns, it can end up negative or positive, and either way the formula works.

Recall from earlier that neural networks have a bias parameter for each neuron. The piecewise function above is taking shape into something we can use as the activation function for each neuron (the output of the neuron that is passed to the next layer).
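
To make the piecewise definition concrete, here is a minimal Python sketch of equation (2) for a single perceptron; the input, weight, and bias values are made up purely for illustration.

    def perceptron(x, w, b):
        """Equation (2): output 1 if the weighted sum plus the bias is positive, else 0."""
        weighted_sum = sum(wj * xj for wj, xj in zip(w, x))
        return 1 if weighted_sum + b > 0 else 0

    # Three inputs, as in the diagram above: 0.5 + 0.0 + 2.0 - 1.0 = 1.5 > 0, so output 1.
    print(perceptron(x=[1, 0, 1], w=[0.5, -1.0, 2.0], b=-1.0))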

Logic using Perceptrons

Using the idea above, we can choose combinations of weights and biases so that a perceptron implements some simple logic. Let us use the architecture below:

     w1
x1 -------\
            (Perceptron(bias)) -----> Output
     w2
x2 -------/

Let's set w1 = 1, w2 = 2 and bias = -2, and calculate the output for the various combinations of values of x1 and x2:

x1   x2   Output
 0    0      0
 0    1      0
 1    0      0
 1    1      1

This should look quite familiar as it is the truth table for logical AND.

You can also create truth tables for other logical operators; try some other values and see if you can replicate OR, NAND, etc. (Note that XOR cannot be produced by a single perceptron, since it is not linearly separable; it requires more than one layer.)
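
As a quick check, here is a short, self-contained sketch that reproduces the AND table above and also shows one possible choice of weights and bias for NAND; the NAND values are just an illustrative assumption, and many other choices work equally well.

    def perceptron(x, w, b):
        """Equation (2): output 1 if the weighted sum plus the bias is positive, else 0."""
        return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0

    print("x1 x2 AND NAND")
    for x1 in (0, 1):
        for x2 in (0, 1):
            and_out = perceptron([x1, x2], w=[1, 2], b=-2)    # values from the text
            nand_out = perceptron([x1, x2], w=[-2, -2], b=3)  # illustrative NAND parameters
            print(x1, " ", x2, " ", and_out, " ", nand_out)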

Sigmoid Neurons

Perceptrons are useful for solving certain kinds of problems, but often we want to solve problems where we can adjust the weight and bias parameters slightly without getting a very different result in the output layer.

The reason perceptrons are not ideal for these kinds of problems is that a slight change in the weight/bias parameters will either leave the output unchanged or flip it between 0 and 1, with nothing in between. This can lead to very different results in the output layer for similar neural network models whose weight and bias parameters differ only by small perturbations. Additionally, sometimes we want the output to be a real number between 0 and 1, not just binary.

To achieve this we can use sigmoid neurons. A sigmoid neuron works in exactly the same way as a perceptron, except the activation function becomes:

\(\sigma(w \cdot x+b)\) where \(\sigma(z)\) is the sigmoid function and is defined by:

\(\begin{eqnarray} \sigma(z) \equiv \frac{1}{1+e^{-z}}. \tag{4}\end{eqnarray}\)

Combining all of this together, you get

\(\begin{eqnarray} \frac{1}{1+\exp(-\sum_j w_j x_j-b)}. \tag{5}\end{eqnarray}\)

where j indexes the neurons in the input (previous) layer. Don't worry, we will define a standard set of notation later. Remember that in this case we are still working at the level of a single neuron.
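
Putting equations (4) and (5) together, here is a minimal Python sketch of a single sigmoid neuron; the parameter values are illustrative only, and the two calls show that a small change in one weight now gives only a small change in the output.

    import math

    def sigmoid(z):
        """The sigmoid function, equation (4)."""
        return 1.0 / (1.0 + math.exp(-z))

    def sigmoid_neuron(x, w, b):
        """Equation (5): the output is now a real number between 0 and 1."""
        weighted_sum = sum(wj * xj for wj, xj in zip(w, x))
        return sigmoid(weighted_sum + b)

    print(sigmoid_neuron(x=[1, 0, 1], w=[0.5, -1.0, 2.0], b=-1.0))   # ~0.818
    print(sigmoid_neuron(x=[1, 0, 1], w=[0.5, -1.0, 2.01], b=-1.0))  # ~0.819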

Conclusion

This concludes part 1 of neural networks. It outlines the basic fundamental building blocks of the neural network model. Please go on to my next post, where I will talk about using an appropriate cost function that can be optimised via an algorithm called gradient descent in order to come up with an optimal set of weights and biases for your trained neural network model. Please find the link here: Neural Networks Theory Part 2
