# Concept of Overfitting

Demonstration of overfitting using a polynomial linear regression example.

Objective

To understand the concept of Overfitting using Linear Regression with Polynomial Features.

So let’s first understand: what is regression?

Have you ever wondered how we can predict the price of a house or a car using machine learning? The answer is regression.

Regression is used to predict a continuous value. Some common regression techniques are:

1. Simple Linear Regression

2. Multiple Linear Regression

3. Polynomial Linear Regression

Now let’s briefly understand what overfitting is.

Suppose we have created a model and want to check how well it works on unseen data. Sometimes a model performs poorly because of overfitting or underfitting.

When a model performs very well on the training dataset but poorly on unseen data, we call it an overfitted model.

Underfitting is when a model performs poorly even on the training dataset. Underfitted models are unable to find the relationship between the input and the target.

In this article, we will learn the concept of overfitting using linear regression with polynomial features.

Let’s start

We will create 20 random, uniformly distributed values for X and then use the `sin` function to generate the target values y. We will then fit linear regression models with polynomial features of degree 0, 1, 3, and 9.
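A minimal sketch of this data-generation step with NumPy. The article does not state the exact setup, so the following are assumptions: inputs drawn from U[0, 1), the target sin(2πx), Gaussian noise with standard deviation 0.1, and a fixed seed for reproducibility.

```python
import numpy as np

# Assumptions (not stated in the article): seed 0, inputs drawn from
# U[0, 1), target sin(2*pi*x), and Gaussian noise with std 0.1.
rng = np.random.default_rng(seed=0)
n_samples = 20

# 20 uniformly distributed inputs, shaped (20, 1) because sklearn expects 2-D X
X = rng.uniform(0.0, 1.0, size=(n_samples, 1))

# Noisy targets generated from the sine function
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0.0, 0.1, size=n_samples)

print(X.shape, y.shape)  # (20, 1) (20,)
```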

Now that we have our dataset X, y, let’s plot it.

Let’s divide our dataset into train and test sets using sklearn.
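One way to do the split, assuming the same synthetic data as in the generation sketch and a 50/50 split. The `test_size` and `random_state` values are illustrative choices, not taken from the article. Note the order of the four arrays that `train_test_split` returns: both X splits come before both y splits.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Assumed synthetic data: uniform inputs, noisy sin(2*pi*x) targets
rng = np.random.default_rng(seed=0)
X = rng.uniform(0.0, 1.0, size=(20, 1))
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0.0, 0.1, size=20)

# Returned order is X_train, X_test, y_train, y_test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=42
)
print(len(X_train), len(X_test))  # 10 10
```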

Now let’s define our model and plot a graph for degrees `0, 1, 3, 9`.
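A sketch of fitting one model per degree, assuming the data and split from the earlier steps. It uses `make_pipeline` from `sklearn.pipeline` to chain `PolynomialFeatures` and `LinearRegression`; this is one reasonable reading of the pipeline approach the article mentions, not necessarily the author’s exact code. Plotting is left out so the block runs without matplotlib; `curves` holds the values one would draw for each degree.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Same assumed synthetic data and split as before
rng = np.random.default_rng(seed=0)
X = rng.uniform(0.0, 1.0, size=(20, 1))
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0.0, 0.1, size=20)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=42
)

degrees = [0, 1, 3, 9]
models = {}
for degree in degrees:
    # PolynomialFeatures expands x into [1, x, x^2, ..., x^degree];
    # LinearRegression then fits one weight per expanded feature.
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    model.fit(X_train, y_train)
    models[degree] = model

# Dense grid on [0, 1] for drawing each fitted curve
x_grid = np.linspace(0.0, 1.0, 100).reshape(-1, 1)
curves = {d: m.predict(x_grid) for d, m in models.items()}
```

For degree 0 the expanded input is a single column of ones, so the fit collapses to predicting the mean of `y_train`. The degree 9 model has 10 parameters for 10 training points and can pass through every one of them, which is exactly the overfitting this article is about.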

We will get graphs like this.

Let’s display the weights in tabular form.
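A sketch of building the weight table with pandas, under the same assumed data. Each column is one degree, each row one weight w0 to w9, and lower-degree models leave the unused rows as NaN. Because `LinearRegression` keeps the bias in `intercept_`, the sketch adds it to the weight of the constant column produced by `PolynomialFeatures`, so that w0 is the full constant term.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Same assumed synthetic data and split as before
rng = np.random.default_rng(seed=0)
X = rng.uniform(0.0, 1.0, size=(20, 1))
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0.0, 0.1, size=20)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=42
)

degrees = [0, 1, 3, 9]
# One row per weight w0..w9; lower degrees leave the extra rows as NaN
table = pd.DataFrame(index=[f"w{i}" for i in range(max(degrees) + 1)])

for degree in degrees:
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    model.fit(X_train, y_train)
    reg = model.named_steps["linearregression"]
    weights = reg.coef_.copy()
    # coef_[0] multiplies the constant column; LinearRegression stores the
    # bias separately in intercept_, so fold it into w0
    weights[0] += reg.intercept_
    table[f"degree {degree}"] = pd.Series(weights, index=table.index[: degree + 1])

print(table.round(2))
```

In a table like this, the overfitted degree 9 column typically shows very large weights of alternating sign, which is how the curve bends through every training point.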

We have trained our model. Now let’s evaluate it by calculating the train and test errors.

Train error: `[0.208575632499395, 0.20178321640091842, 0.15247400351622362, 0.10418786631408623, 0.09688701939648986, 0.09263963531131172, 0.08283677775295668, 0.06327629715761585, 0.06147112825631159]`

Test error: `[0.7076444970946971, 0.7023260466949512, 1.7102279649595118, 8.775219946391115, 26.828407071463392, 117.45559764444442, 2039.7780917210393, 68181.7212153289, 714896.6962217717]`

The graph of these train and test errors shows the problem: the train error keeps decreasing, but the test error grows huge, which means our model is overfitted.
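A sketch of computing the errors for degrees 0 through 9 on the same assumed data. The article does not say which error metric it uses, so mean squared error (`sklearn.metrics.mean_squared_error`) is an assumption here, and the numbers will not match the ones listed above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Same assumed synthetic data and split as before
rng = np.random.default_rng(seed=0)
X = rng.uniform(0.0, 1.0, size=(20, 1))
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0.0, 0.1, size=20)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=42
)

train_errors, test_errors = [], []
for degree in range(10):
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    model.fit(X_train, y_train)
    # Error on data the model has seen vs. data it has not seen
    train_errors.append(mean_squared_error(y_train, model.predict(X_train)))
    test_errors.append(mean_squared_error(y_test, model.predict(X_test)))

for degree, (tr, te) in enumerate(zip(train_errors, test_errors)):
    print(f"degree {degree}: train MSE = {tr:.4f}, test MSE = {te:.4f}")
```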

So how can we prevent overfitting?

Overfitting can be prevented by:

1. Increasing Dataset
2. Regularisation

Increasing Dataset

Divide the new dataset into train and test sets (`train_test_split` returns both X splits before both y splits):

`X_train_new, X_test_new, y_train_new, y_test_new = train_test_split(X_new, y_new, test_size=0.5)`
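A sketch of the "more data" remedy, assuming the same sine-plus-noise generator as before. The larger size of 200 points is a hypothetical choice; both datasets are split 50/50 and fitted with a degree 9 polynomial so their test errors can be compared. `make_data` and `degree9_test_mse` are illustrative helpers, not from the article.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures


def make_data(n, rng):
    """Hypothetical helper: n uniform inputs with noisy sin(2*pi*x) targets."""
    X = rng.uniform(0.0, 1.0, size=(n, 1))
    y = np.sin(2 * np.pi * X).ravel() + rng.normal(0.0, 0.1, size=n)
    return X, y


def degree9_test_mse(X, y):
    """Hypothetical helper: 50/50 split, degree 9 fit, test MSE."""
    # train_test_split returns X_train, X_test, y_train, y_test in this order
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.5, random_state=42
    )
    model = make_pipeline(PolynomialFeatures(degree=9), LinearRegression())
    model.fit(X_train, y_train)
    return mean_squared_error(y_test, model.predict(X_test))


rng = np.random.default_rng(seed=0)
X, y = make_data(20, rng)            # original size: 10 training points
X_new, y_new = make_data(200, rng)   # assumed larger size: 100 training points

small_mse = degree9_test_mse(X, y)
large_mse = degree9_test_mse(X_new, y_new)
print(f"degree 9 test MSE with 20 points:  {small_mse:.4f}")
print(f"degree 9 test MSE with 200 points: {large_mse:.4f}")
```

With 100 training points instead of 10, the degree 9 model can no longer pass through every noisy point, so its test error should fall close to the noise variance (0.01 under these assumptions).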

Regularisation

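The next approach to reducing overfitting is regularization.

In simple words, regularization is used to prevent overfitting. Penalty-based methods such as L2 do this by adding a term to the training loss that grows with the size of the weights, which discourages overly complex fits.

There are many types of regularization. We will use L2 regularization, which in linear regression is known as ridge regression.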

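Ridge regression is a method of estimating the coefficients of multiple-regression models in scenarios where the independent variables are highly correlated. It is used in fields including econometrics, chemistry, and engineering. Polynomial features such as x, x², and x³ are strongly correlated with one another, which makes ridge a natural fit for this example.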
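As a worked formula, here is the objective that L2 regularization minimizes in this setting. The notation is introduced here, not taken from the article: $\Phi$ is the design matrix whose row $n$ is $(1, x_n, x_n^2, \dots, x_n^M)$ for polynomial degree $M$, $\mathbf{y}$ holds the targets, $\mathbf{w}$ the weights, and $\lambda \ge 0$ is the regularization strength:

$$
\hat{\mathbf{w}} = \arg\min_{\mathbf{w}} \; \lVert \mathbf{y} - \Phi \mathbf{w} \rVert_2^2 + \lambda \lVert \mathbf{w} \rVert_2^2
$$

Setting $\lambda = 0$ recovers ordinary least squares, and larger $\lambda$ shrinks the weights towards zero, trading some train error for a smoother curve. In scikit-learn’s `Ridge`, $\lambda$ is the `alpha` parameter, and with the default `fit_intercept=True` the intercept is left out of the penalty.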

# Best Model (According to test performance)

As the `Train error using L2` and `Test error using L2` graphs show, the train error is almost the same for every lambda value, but the test error differs. For lambda = 1 there is some test error, whereas for lambda = 1/100000 the test error is huge. So, according to these graphs, `lambda = 1/100` gives the best model.
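A sketch of the lambda sweep behind this comparison, again on the assumed synthetic data with degree 9 features. The lambda values are the ones mentioned above; the MSE metric, the split and the data generator are assumptions, so the exact numbers will differ from the article’s graphs.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Same assumed synthetic data and split as before
rng = np.random.default_rng(seed=0)
X = rng.uniform(0.0, 1.0, size=(20, 1))
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0.0, 0.1, size=20)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=42
)

# Lambda values from the discussion above, from weakest to strongest penalty
lambdas = [1 / 100000, 1 / 100, 1]

results = {}
for lam in lambdas:
    # sklearn's Ridge calls the penalty strength lambda "alpha";
    # include_bias=False because Ridge fits its own (unpenalized) intercept
    model = make_pipeline(
        PolynomialFeatures(degree=9, include_bias=False), Ridge(alpha=lam)
    )
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    results[lam] = (train_mse, test_mse)
    print(f"lambda = {lam:g}: train MSE = {train_mse:.4f}, test MSE = {test_mse:.4f}")

# "Best" here means lowest test error, mirroring the article's criterion
best_lambda = min(results, key=lambda lam: results[lam][1])
print("best lambda:", best_lambda)
```

Choosing lambda by test error mirrors the article’s criterion. In practice, a separate validation set or cross-validation (for example `sklearn.linear_model.RidgeCV`) keeps the test set untouched for the final estimate.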

# My Contribution

I went through various tutorials, understood the code, and implemented this on my own. I added data points, experimented with multiple degrees, captured the train and test errors, and plotted the graphs.

# Challenges

The first challenge was fitting models of many degrees; I used the `pipeline` module from `sklearn` to solve this.

The next challenge was preventing overfitting; I `increased the data` and used `L2 Regularisation` to fix this.

# Experiments & Findings

I experimented with many lambda values (for example 1/1000000 and 1/10000000) to see whether the train and test errors increase or decrease.

Finding: as lambda gets smaller (weaker regularization), the `test error` increases.

# What’s Next

Ensemble Technique

You can read more about it here. Find the notebook.

# References

https://scikit-learn.org/stable/auto_examples/model_selection/plot_underfitting_overfitting.html

https://datascience.foundation/sciencewhitepaper/underfitting-and-overfitting-in-machine-learning

https://docs.aws.amazon.com/machine-learning/latest/dg/model-fit-underfitting-vs-overfitting.html

https://medium.com/@minions.k/ridge-regression-l1-regularization-method-31b6bc03cbf