Give Your Machine Learning Model a Boost with Grid Search


Squeezing a few extra accuracy points out of your machine learning model can boil down to finding the best values for the model’s parameters. In some cases, setting those parameters correctly can turn an average-performing model into a much stronger one.

What are parameters and how can I adjust them:

Parameters (strictly speaking, hyperparameters, because you set them before training instead of learning them from the data) are values that help determine how well a machine learning model generalizes from the data it is fed. That is, they influence how good a prediction the model makes when it encounters a new data point. Which parameters you can adjust depends on the model you employ. For example, the regularization parameter for Scikit-Learn’s Logistic Regression and Support Vector Classifier (SVC) is called C. For some regression models, such as Ridge Regression, it is called alpha.
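As a rough illustration of where these parameters live, here is a minimal sketch that sets C and alpha when instantiating each model. The variable names are my own, and the value 1.0 is simply Scikit-Learn's default for these three estimators:

```python
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.svm import SVC

# C is the inverse of the regularization strength:
# smaller values of C mean stronger regularization.
log_reg = LogisticRegression(C=1.0)
svc = SVC(C=1.0)

# alpha is the regularization strength itself:
# larger values of alpha mean stronger regularization.
ridge = Ridge(alpha=1.0)

print(log_reg.C, svc.C, ridge.alpha)
```

Note that C and alpha pull in opposite directions: raising C loosens regularization, while raising alpha tightens it.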


Use Grid Search to Find the Best Parameter Setting

When we instantiate a model, we can adjust its parameters by specifying a value for C or alpha; the default value is typically 1.0. But how do you find the best value for C or alpha? Grid search answers this question: it fits the model once for each candidate value and keeps the value that scores best. Let’s see how to do this:

Tutorial:

We’ll use the Breast Cancer data set for the tutorial. Let’s import the following packages:
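The original import list is not recoverable, so the following is a plausible set, assuming the tutorial tunes a LogisticRegression classifier with GridSearchCV (the later steps refer to the parameter C):

```python
# Dataset loader for the Breast Cancer data set bundled with Scikit-Learn
from sklearn.datasets import load_breast_cancer
# The classifier assumed in this walkthrough (it exposes the parameter C)
from sklearn.linear_model import LogisticRegression
# Utilities for splitting the data and running the grid search
from sklearn.model_selection import GridSearchCV, train_test_split
```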


Then we’ll load the dataset and store it in the variable breast_cancer:
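A minimal sketch of this step, using Scikit-Learn's bundled loader and printing the feature and target names so we can see what the model will work with:

```python
from sklearn.datasets import load_breast_cancer

# Load the bundled Breast Cancer data set
breast_cancer = load_breast_cancer()

# 569 samples, each described by 30 numeric features
print(breast_cancer.data.shape)  # (569, 30)

# The predictor variables and the two classes to predict
print(breast_cancer.feature_names)
print(breast_cancer.target_names)
```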


The dataset’s features (listed in breast_cancer.feature_names) are the variables used to predict whether a patient’s tumor is malignant or benign.

We then create a model without using grid search, as a first pass. First, we split the data into training and test sets. Then we instantiate and fit our model with the training set. Lastly, we assess the model’s accuracy:
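Here is one way that first pass might look. I am assuming a LogisticRegression classifier, a default train/test split with random_state=0, and a raised max_iter so the solver converges on the unscaled features; none of these choices are confirmed by the original post:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

breast_cancer = load_breast_cancer()

# Split into training and test sets (random_state is an arbitrary choice
# made here only so the split is reproducible)
X_train, X_test, y_train, y_test = train_test_split(
    breast_cancer.data, breast_cancer.target, random_state=0
)

# Instantiate the model with its default C=1.0 and fit it on the training set
model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)

# For a classifier, score() returns the fraction of correct predictions
baseline_accuracy = model.score(X_test, y_test)
print(f"Baseline test accuracy: {baseline_accuracy:.3f}")
```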


The model’s accuracy, the fraction of test samples it classifies correctly, is very good to begin with, but let’s see if we can improve it with grid search. To use grid search, you’ll need to specify a series of candidate parameter values. A common choice spans several orders of magnitude, such as 0.001, 0.01, 0.1, 1, 10, and 100:
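In Scikit-Learn, the candidates are usually written as a dictionary that maps the parameter name to a list of values. The name param_grid below is just a conventional choice, and ParameterGrid is used only to show the combinations the search would try:

```python
from sklearn.model_selection import ParameterGrid

# Candidate values for C, spanning several orders of magnitude
param_grid = {"C": [0.001, 0.01, 0.1, 1, 10, 100]}

# Each entry is one setting that grid search will evaluate
candidates = list(ParameterGrid(param_grid))
for candidate in candidates:
    print(candidate)
```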


Next, we instantiate our grid search model and fit it with the training data:
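A sketch of this step with GridSearchCV, which scores every candidate with cross-validation on the training data and then refits the model using the best one. The classifier, the split, and cv=5 are assumptions on my part, not settings taken from the original post:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

breast_cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    breast_cancer.data, breast_cancer.target, random_state=0
)

param_grid = {"C": [0.001, 0.01, 0.1, 1, 10, 100]}

# Wrap the model in a grid search; each value of C is evaluated
# with 5-fold cross-validation on the training set only
grid = GridSearchCV(LogisticRegression(max_iter=10000), param_grid, cv=5)
grid.fit(X_train, y_train)

# The winning value of C and its mean cross-validation accuracy
print("Best parameters:", grid.best_params_)
print(f"Best cross-validation accuracy: {grid.best_score_:.3f}")

# After fitting, the grid object behaves like the refitted best model
grid_accuracy = grid.score(X_test, y_test)
print(f"Test accuracy: {grid_accuracy:.3f}")
```

Because the search only ever sees the training data, the test set remains a fair check of the tuned model.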


By choosing the best value for the parameter, C, grid search helped us make our model even more accurate!

Lastly, the value that C or another parameter takes can positively or negatively impact a model’s accuracy, so it is important to pick the right value. You can do this manually by testing individual values, but the beauty of grid search is that it does this for you.
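To make the comparison concrete, here is roughly what the manual approach involves: a loop that cross-validates each value of C and tracks the best one. This is an illustrative sketch with the same assumed model and split as before, not the code from the original post:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

breast_cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    breast_cancer.data, breast_cancer.target, random_state=0
)

# Test each value of C by hand, recording its mean cross-validation accuracy
scores = {}
for C in [0.001, 0.01, 0.1, 1, 10, 100]:
    model = LogisticRegression(C=C, max_iter=10000)
    scores[C] = cross_val_score(model, X_train, y_train, cv=5).mean()
    print(f"C={C}: mean accuracy {scores[C]:.3f}")

# Pick the value with the highest mean accuracy
best_C = max(scores, key=scores.get)
print("Best C:", best_C)
```

Grid search wraps this bookkeeping into a single fit call, and it scales to several parameters at once without nested loops.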


I hope this has been helpful. Until next time!

