A Dive into Supervised Learning for Neural Networks: How do AI Models Learn?
At this point in our lives, almost everyone knows how powerful AI has become. Just 2 or 3 years ago, we couldn't fathom a piece of technology creatively and originally generating even a few lines of text, let alone a full essay. Now, such power is commonplace among AI models and has all but ceased to impress us.
However, few of us stop to think about how these AI models work in the first place. How are they so powerful? What logical machinery brings Artificial Intelligence to life and lets it produce accurate responses to never-before-seen inputs?
In this post, I will discuss the logical intricacies of neural networks, the engine behind AI's seemingly endless power, and how they are being used today to drive the AI revolution.
Note that this post discusses supervised learning, a process where the AI is trained on labeled datasets. In other words, the AI learns from the answers we provide in order to generate its own answers, rather than discovering them entirely on its own.
ChatGPT and other large language models are trained largely with self-supervised learning (a close cousin of unsupervised learning), which I will discuss in a later blog.
How are AI Models Trained?
AI models are trained by making repeated passes over large sets of training data.
For example, let's say that we want to build an AI model where, given a greyscale image of a person, the AI will tell us whether the person's expression in the image seems happy, sad, or neutral.
We would start by first gathering hundreds of images of people, and manually categorizing them as happy, sad, or neutral.
We would then feed this labeled dataset to the neural network, which would train itself on the inputs and outputs to notice patterns within the images.
Once our neural network is sufficiently trained, we can give it a completely new image of a person, one that was not in its training data, and, assuming the process went well, we should get an accurate estimate of that person's expression.
This is a process known as supervised learning, where the AI compares its own outputs against the expected outputs (the ones we manually categorized for it) in order to improve its responses.
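To make the idea of labeled data concrete, here is a minimal sketch of what such a training set might look like in code (the pixel values and labels below are made up purely for illustration):

```python
# A tiny labeled dataset: each example pairs an "image" (here just a flat
# list of greyscale pixel brightnesses) with the label we chose by hand.
training_data = [
    # (pixel values,              label)
    ([0, 34, 255, 17, 96, 180],   "happy"),
    ([12, 200, 43, 88, 5, 240],   "sad"),
    ([77, 77, 90, 130, 60, 10],   "neutral"),
]

# Supervised learning means the model sees both halves of every pair:
# the input AND the answer we expect for that input.
for pixels, label in training_data:
    print(f"{len(pixels)} pixels -> expected label: {label}")
```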
What are Neural Networks and How do they Learn?
As complicated as neural networks may seem at first glance, the logic behind them is simple and effective. A neural network is essentially a stack of layers of numbers, connected to one another, that attempts to build a numerical pattern linking the input to the expected output.
The Node
Each node in the neural network has 3 parts: an activation value, weights, and a bias. The activation value is the number the node currently holds; each weight is a multiplier on a connection between two nodes that determines how strongly a value gets passed along; and the bias is an extra value added on before the result becomes the next node's activation.
It's completely fine if these terms make no sense right now; I will go into depth later on. For now, just know that nodes pass their values on to other nodes, altering them with weights and biases along the way.
Neural networks themselves can be broken down into 3 parts: the input layer, the hidden layers, and the output layer.
The Input Layer
The input layer takes in the input, that is, the data we need to train on.
In our case of categorizing people's emotions, the input would be the individual pixels of each image, meaning there would be as many input nodes as there are pixels in the image.
Referring back to our node structure, the activation value for each input node would be the brightness value of each pixel, from 0 to 255 (remember, the image is greyscale, so each pixel ranges from dark to bright).
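As a rough sketch (assuming NumPy, with a made-up 4x4 image standing in for a real photo), turning an image into input-layer activations might look like this:

```python
import numpy as np

# A made-up 4x4 greyscale "image": each entry is a brightness from 0 to 255.
image = np.array([
    [  0,  40,  80, 120],
    [160, 200, 240, 255],
    [ 30,  60,  90, 120],
    [ 15,  45,  75, 105],
])

# One input node per pixel: flatten the 2-D grid into a 1-D list of
# activation values (16 nodes for a 4x4 image).
input_activations = image.flatten()

# In practice the 0-255 range is usually scaled down to 0-1 so the numbers
# stay manageable as they travel through the network.
input_activations = input_activations / 255.0
print(input_activations.shape)  # (16,)
```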
The Hidden Layers
The hidden layers are where the bulk of the processing happens. Initially, all of the weights and biases (the number changers) are random numbers, meaning that the output will also be completely random.
The way the hidden layers work is by passing the input values down the network.
For instance, if an input node had a brightness value of 80, then the next node in the hidden layer would take that 80, multiply it by a certain weight (initially random), and add a bias to it (also random).
So if the weight was 0.7 and the bias was 5, then the next node would have a value of 80 * 0.7 + 5, or 61. That 61 would then be passed further down the chain through more weights and biases. The same principle applies to every node in the hidden layers, so many values flow forward at once.
This passing of values is known as forward propagation.
By passing the values through all of these weights and biases, we are eventually left with an output.
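Here is a minimal sketch of that forward pass, assuming NumPy. It first checks the single-node arithmetic from the example above, then shows how a whole layer's worth of values (with sizes chosen arbitrarily) hops forward in one step:

```python
import numpy as np

# The single-node example: 80 * 0.7 + 5 = 61
print(80 * 0.7 + 5)  # 61.0

# In a full layer, each node actually receives a weighted sum of ALL the
# values from the previous layer, plus its own bias. With NumPy, that whole
# layer-to-layer hop is one matrix multiplication.
inputs  = np.array([80.0, 20.0, 50.0])  # activations from the previous layer
weights = np.random.randn(4, 3)         # initially random, one row per next-layer node
biases  = np.random.randn(4)            # initially random, one bias per next-layer node

next_layer = weights @ inputs + biases  # forward propagation for one layer
print(next_layer)                       # four new values, passed further down the chain
```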
The Output Layer
After going through the many weights and biases and traversing the whole neural network, we will be left with our outputs.
We will have 3 possible outputs: the person can be happy, sad, or neutral. The output layer takes the final values of the hidden layers, condenses them onto a scale of 0 to 1 (0 meaning "definitely not this expression" and 1 meaning "definitely this expression"), and returns those values.
Of course, with initial random values, the outputs will also be random.
For example, for the outputs happy, sad, and neutral, we could have an output of (0.1, 0.9, 0.5).
This means the model thinks the person is not happy (a score of only 0.1 / 1), is quite sure the person is sad (a score of 0.9 / 1), and is unsure whether the person is neutral (0.5 / 1).
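The post's examples don't pin down a specific condensing function, but the sigmoid is one common choice for squashing a value onto the 0-to-1 scale. A minimal sketch, with raw scores invented so the squashed outputs land near the example above:

```python
import numpy as np

def sigmoid(x):
    # Squashes any real number onto the range 0 to 1.
    return 1 / (1 + np.exp(-x))

# Made-up raw values coming out of the last hidden layer, in the order
# (happy, sad, neutral), chosen to reproduce the (0.1, 0.9, 0.5) example.
raw_scores = np.array([-2.2, 2.2, 0.0])

outputs = sigmoid(raw_scores)
print(outputs.round(1))  # [0.1 0.9 0.5]
```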
The Learning Process
Now that we have our training data (the categorized images) and the neural network, we can begin the iterative learning process.
When we start the process, as I mentioned before, all of the weights and biases will be random. Then, through forward propagation, we will reach an output in the format described above.
However, since our outputs are based on random weights and biases, they will most likely be far off the mark. Once we have our guesswork numbers, we compare them to what is actually expected. For instance, we may get an output of (0.2, 0.7, 0.3) when the actual output should be (1, 0, 0), meaning the person is labeled as definitely happy.
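That comparison is usually boiled down to a single number called the loss (or cost): the smaller it is, the better the guess. A minimal sketch using mean squared error, one common choice, on the numbers above:

```python
import numpy as np

prediction = np.array([0.2, 0.7, 0.3])  # what the network guessed
expected   = np.array([1.0, 0.0, 0.0])  # what we labeled (the person is happy)

# Mean squared error: the average of the squared differences.
loss = np.mean((prediction - expected) ** 2)
print(loss)  # about 0.407; lower is better, and 0 would be a perfect match
```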
This is where the calculus and linear algebra come in: the neural network takes these expected values and works out which weights and biases to change so that the next time it forward propagates, it produces a closer approximation to the intended values. This process is known as backpropagation.
The complex math behind knowing exactly which weights and biases to change, and by how much, exceeds the scope of this post, so I won't delve into it here.
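To give a flavor of the idea without the full derivation: backpropagation works out, for every weight and bias, which direction to nudge it so the loss shrinks. A toy sketch with a single weight (the starting value, learning rate, and step count are all arbitrary choices):

```python
# Toy setup: one input value, one weight, and a target output of 1.0.
x, target = 0.5, 1.0
weight = 0.2         # random starting value
learning_rate = 0.1  # how big each nudge is

for step in range(200):
    prediction = weight * x                   # forward propagation
    loss = (prediction - target) ** 2         # squared error
    gradient = 2 * (prediction - target) * x  # d(loss)/d(weight), via the chain rule
    weight -= learning_rate * gradient        # nudge the weight downhill

print(weight * x)  # very close to the target of 1.0 after repeated nudges
```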
After many iterations of forward propagation and backpropagation, that is, of repeatedly calculating outputs and adjusting the weights and biases accordingly, we are left with a neural network that can accurately predict the expression of a person in an image it has never seen before.
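Putting every piece together, here is a deliberately tiny end-to-end sketch: made-up data (four pixels per "image", three expression classes), one hidden layer, and sigmoid squashing with a mean-squared-error-style loss. None of these sizes or choices come from the post; the point is only that forward propagation, the loss, backpropagation, and the weight updates all appear in one loop:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Made-up training data: 6 four-pixel "images", with one-hot labels
# in the order (happy, sad, neutral).
X = rng.random((6, 4))             # pixel values already scaled to 0-1
Y = np.eye(3)[[0, 1, 2, 0, 1, 2]]  # the expected outputs

# One hidden layer of 5 nodes; every weight starts out random.
W1, b1 = rng.standard_normal((4, 5)), np.zeros(5)
W2, b2 = rng.standard_normal((5, 3)), np.zeros(3)

for step in range(2000):
    # --- forward propagation ---
    hidden = sigmoid(X @ W1 + b1)       # input layer -> hidden layer
    output = sigmoid(hidden @ W2 + b2)  # hidden layer -> output layer

    # --- loss: how far the guesses are from the labels ---
    loss = np.mean((output - Y) ** 2)

    # --- backpropagation: chain-rule gradients for each layer ---
    d_out = (output - Y) * output * (1 - output)
    d_hid = (d_out @ W2.T) * hidden * (1 - hidden)

    # --- nudge every weight and bias downhill ---
    W2 -= 0.5 * hidden.T @ d_out
    b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_hid
    b1 -= 0.5 * d_hid.sum(axis=0)

print(loss)             # far smaller than at the start of training
print(output.round(2))  # each row has drifted toward its one-hot label
```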