Bias Variance Trade-off
Is all you need to care about as a Machine Learning engineer.
A good handshake between Bias and Variance. — Tapan
Hi there, I am Tapan. In this post I will share what I know about the bias-variance trade-off. You have probably already come across this term while learning machine learning.
It is arguably a very important topic and a sure-shot interview question :). So let's get started.
What is this and when does it come into the picture?
Bias and variance come into the picture once the model has been trained. Both are components of the prediction error. Understanding them can help you make your model more accurate and keep it from overfitting or underfitting.
What is Bias?
Bias is the difference between the actual value and the average of the values predicted by the trained model. High bias means the model makes strong simplifying assumptions, which leads to larger errors between actual and predicted values and tends to create an underfit model. Low bias is just the opposite: the model makes fewer assumptions and fits the data more closely, which is good until it starts to overfit.
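For readers who like formulas, the definition above can be written down as a small sketch. The symbols are my own assumptions, not something fixed by this post: f(x) is the actual value, \hat{f}_D(x) is the prediction of a model trained on dataset D, and the expectation averages over many possible training sets. The sign convention varies between sources, and it usually does not matter because bias is squared when it is used.

```latex
\mathrm{Bias}\big[\hat{f}(x)\big] \;=\; \mathbb{E}_{D}\big[\hat{f}_{D}(x)\big] \;-\; f(x)
```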
What is Variance?
Variance, in general, is the spread of data. In this context it describes how much your model's predictions change when the model is trained on different samples of the data. A high-variance model is very sensitive to the particular training set it happened to see.
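Here is a minimal sketch that estimates both quantities by simulation, just to make the two definitions concrete. Everything in it is an assumption made for illustration: the "true" function is a sine curve, the noise level is 0.3, and the two toy models (always predict the mean, and copy the nearest neighbor) are hand-written stand-ins for a very simple and a very flexible model.

```python
import math
import random
import statistics

def true_f(x):
    # Assumed ground truth for this demo; real data has no known true function
    return math.sin(2 * math.pi * x)

def sample_training_set(rng, n=30, noise=0.3):
    # Draw a fresh noisy training set from the assumed true function
    xs = [rng.random() for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0, noise) for x in xs]
    return xs, ys

def predict_mean(xs, ys, x0):
    # Very simple model: ignores x0 and always predicts the average label
    return statistics.fmean(ys)

def predict_1nn(xs, ys, x0):
    # Very flexible model: copies the label of the closest training point
    i = min(range(len(xs)), key=lambda j: abs(xs[j] - x0))
    return ys[i]

def bias_and_variance(predict, x0, n_sets=500, seed=0):
    # Train on many different training sets and collect the predictions at x0
    rng = random.Random(seed)
    preds = []
    for _ in range(n_sets):
        xs, ys = sample_training_set(rng)
        preds.append(predict(xs, ys, x0))
    bias = statistics.fmean(preds) - true_f(x0)   # average prediction vs. actual
    variance = statistics.pvariance(preds)        # spread of the predictions
    return bias, variance

x0 = 0.25  # evaluation point; the assumed true value here is 1.0
bias_mean, var_mean = bias_and_variance(predict_mean, x0)
bias_nn, var_nn = bias_and_variance(predict_1nn, x0)

print(f"mean model: bias={bias_mean:.3f}, variance={var_mean:.3f}")
print(f"1-NN model: bias={bias_nn:.3f}, variance={var_nn:.3f}")
```

In this setup the mean model should show the larger bias and the nearest-neighbor model the larger variance, which is the trade-off in miniature.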
A simpler way to understand?
I will use an example from day-to-day life to make the concept easier to understand.
Think of four kinds of students taking an exam:
- The first guy is super awesome. He knows the answers and does not give any wrong ones, so he avoids negative marking, i.e. low bias and low variance.
- The second guy attempts all the questions but is not sure about the answers, a sort of overcommitting, i.e. low bias and high variance, otherwise called overfitting.
- The third guy is just guessing the answers and filling in the answer sheet. He figures that giving more answers gives him a better chance at higher marks, but since he doesn’t know the answers he makes more errors, i.e. high bias and low variance, otherwise called underfitting.
- The fourth guy doesn’t have a clue about the exam. He is just appearing for some reason, i.e. high variance and high bias.
Which models tend to go where?
As you can see in this picture, Decision Tree, K nearest neighbors, and SVM tend to overfit (low bias, high variance).
Linear regression and logistic regression have high bias and low variance, so they tend to underfit.
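To see these tendencies in numbers, here is a small sketch that compares a 1-nearest-neighbor model with a straight-line fit on the same noisy data. The sine-shaped data, the noise level, and the helper names (make_data, fit_line, fit_1nn, mse) are all assumptions made up for this illustration, and both models are written by hand rather than taken from a library.

```python
import math
import random

def make_data(rng, n, noise=0.3):
    # Assumed toy data: a sine curve plus Gaussian noise
    xs = [rng.random() for _ in range(n)]
    ys = [math.sin(2 * math.pi * x) + rng.gauss(0, noise) for x in xs]
    return xs, ys

def fit_line(xs, ys):
    # Ordinary least squares for y = a*x + b
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return lambda x: a * x + b

def fit_1nn(xs, ys):
    # 1-nearest neighbor: predict the label of the closest training point
    def predict(x):
        i = min(range(len(xs)), key=lambda j: abs(xs[j] - x))
        return ys[i]
    return predict

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(42)
train_x, train_y = make_data(rng, 50)
test_x, test_y = make_data(rng, 500)

results = {}
for name, fit in [("1-NN", fit_1nn), ("line", fit_line)]:
    model = fit(train_x, train_y)
    results[name] = (mse(model, train_x, train_y), mse(model, test_x, test_y))
    print(f"{name}: train MSE={results[name][0]:.3f}, test MSE={results[name][1]:.3f}")
```

The thing to look at is the gap. The 1-NN model memorizes the training points, so its training error is zero while its test error is not, which is the overfitting signature. The straight line cannot bend to follow the curve, so its error stays clearly above zero even on the training data, which is the underfitting signature.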
What are overfitting and underfitting?
Overfitting occurs when your model tries to fit every detail of the training data with no error at all. It creates a problem because an overfit model only works with the data it was trained on and doesn’t work well with new data.
To overcome this situation we use regularizers, which make the model overfit less.
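As a rough sketch of what a regularizer does, here is L2 (ridge) regularization on the simplest possible model, y ≈ w*x with no intercept. The data, the penalty values, and the function name ridge_slope are assumptions for illustration. The closed form comes from minimizing the squared error plus lam * w**2.

```python
import random

def ridge_slope(xs, ys, lam):
    # Minimizes sum((y - w*x)**2) + lam * w**2, which gives
    # w = sum(x*y) / (sum(x*x) + lam)
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(10)]
ys = [3 * x + rng.gauss(0, 1.0) for x in xs]  # small, noisy sample

slopes = {lam: ridge_slope(xs, ys, lam) for lam in (0.0, 1.0, 10.0)}
for lam, w in slopes.items():
    print(f"lam={lam}: w={w:.3f}")
```

A larger penalty pulls the weight toward zero, so the model reacts less to the noise in one particular training set. That lowers variance at the cost of some extra bias.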
Underfitting occurs when your model doesn’t pay enough attention to the features and ends up too simple to be realistic.
To overcome this situation we make the model a little more complex so it can learn more about how the features behave.
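Here is a tiny sketch of that fix, assuming made-up data with a clear linear trend. A constant model that ignores the feature underfits, and adding a single slope term is enough to bring the training error down. The names and numbers are only for illustration.

```python
import random

rng = random.Random(1)
xs = [i / 10 for i in range(50)]
ys = [2 * x + 1 + rng.gauss(0, 0.5) for x in xs]  # assumed linear trend plus noise
n = len(xs)

# Too simple: always predict the average label, ignoring the feature x
mean_y = sum(ys) / n
mse_const = sum((mean_y - y) ** 2 for y in ys) / n

# A little more complex: least-squares line y = a*x + b
mean_x = sum(xs) / n
a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum((x - mean_x) ** 2 for x in xs)
b = mean_y - a * mean_x
mse_line = sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / n

print(f"constant model MSE: {mse_const:.3f}")
print(f"line model MSE:     {mse_line:.3f}")
```

Keep in mind that training error alone cannot tell you when to stop adding complexity. You also need to check the error on data the model has not seen.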
So we should always aim for the lowest overall error in the model.
A good balance of bias and variance results in a good model. That’s why this tiny concept is so important.
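The balance can also be stated as the standard decomposition of expected squared error. This is a sketch under the usual textbook assumptions, which this post does not spell out: the data follow y = f(x) + noise, the noise has zero mean and variance \sigma^2, and the expectation is taken over training sets and noise.

```latex
\mathbb{E}\Big[\big(y - \hat{f}(x)\big)^2\Big]
  \;=\; \underbrace{\Big(\mathbb{E}\big[\hat{f}(x)\big] - f(x)\Big)^2}_{\text{bias}^2}
  \;+\; \underbrace{\mathbb{E}\Big[\big(\hat{f}(x) - \mathbb{E}\big[\hat{f}(x)\big]\big)^2\Big]}_{\text{variance}}
  \;+\; \underbrace{\sigma^2}_{\text{noise}}
```

The noise term cannot be reduced by any model, so lowering the total error means trading bias against variance.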
Thanks for reading up to here. Please give a clap if you like this article.