Squeezing a few extra accuracy points out of your machine learning model can boil down to finding the best values for the model’s parameters. In fact, in some cases, setting those parameters correctly can turn an average model into a world-class one.
What Are Parameters and How Can I Adjust Them?
Parameters are values that help determine how well a machine learning model generalizes from the data it is fed. That is, they affect how good a prediction the model makes when it encounters a new data point. (Strictly speaking, the settings discussed here are hyperparameters: you choose them before training, rather than the model learning them from the data.) How you adjust them depends on the model you use. For example, the regularization parameter for Scikit-Learn’s LogisticRegression and its support vector classifier (SVC) is named C, where a smaller value means stronger regularization. For some regression models, such as Ridge regression, the parameter is named alpha, where a larger value means stronger regularization:
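As a minimal sketch of where these parameters live, the snippet below instantiates the three estimators named above and sets C or alpha explicitly. The variable names are illustrative, not taken from the original post.

```python
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.svm import SVC

# C is the inverse of regularization strength: smaller C = stronger regularization.
log_reg = LogisticRegression(C=1.0)
svc = SVC(C=1.0)

# alpha is the regularization strength itself: larger alpha = stronger regularization.
ridge = Ridge(alpha=1.0)

# get_params() returns every setting an estimator accepts, including C or alpha.
print(log_reg.get_params()["C"], svc.get_params()["C"], ridge.get_params()["alpha"])
```

Calling get_params() on any Scikit-Learn estimator is a quick way to see which settings are available to tune.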
Use Grid Search to Find the Best Parameter Setting
When we instantiate a model, we can adjust its parameters by specifying a value for C or alpha; for the models above, the default is 1.0. But how do you find the best value for C or alpha? Grid search answers this question: it tries every candidate value you list, scores each one with cross-validation, and keeps the best. Let’s see how to do this:
We’ll use the Breast Cancer data set for the tutorial. Let’s import the following packages:
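The exact import list is not shown here, so the following is one plausible set. It assumes the classifier is LogisticRegression, which is a guess based on the use of C later on; SVC would work the same way.

```python
# Dataset loader for the Breast Cancer data set bundled with Scikit-Learn.
from sklearn.datasets import load_breast_cancer

# Classifier whose C parameter will be tuned (an assumed choice).
from sklearn.linear_model import LogisticRegression

# Utilities for splitting the data and running the grid search.
from sklearn.model_selection import GridSearchCV, train_test_split
```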
Then we’ll load the dataset and store it in the variable breast_cancer:
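A short sketch of the loading step, using Scikit-Learn's bundled copy of the data set. Printing the feature and target names is an addition for illustration.

```python
from sklearn.datasets import load_breast_cancer

# load_breast_cancer() returns a Bunch with data, target, and descriptive names.
breast_cancer = load_breast_cancer()

# The 30 measurements recorded for each tumor.
print(breast_cancer.feature_names)

# The two classes the model will predict.
print(breast_cancer.target_names)
```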
Above are the variables used to predict whether a patient’s tumor is malignant or benign.
We then create a model without using grid search, as a first pass. First, we split the data into training and test sets. Then we instantiate and fit our model with the training set. Lastly, we assess the model’s accuracy:
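The three steps just described could look roughly like this. LogisticRegression, the random_state value, the stratified split, and the raised max_iter are all assumptions made for the sketch, not details from the original post.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

breast_cancer = load_breast_cancer()

# Hold out a test set; random_state is an arbitrary choice for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    breast_cancer.data,
    breast_cancer.target,
    stratify=breast_cancer.target,
    random_state=42,
)

# max_iter is raised (an assumption) so the solver converges on unscaled features.
model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)

# For a classifier, score() returns the fraction of correctly classified samples.
baseline_accuracy = model.score(X_test, y_test)
print(f"Baseline test accuracy: {baseline_accuracy:.3f}")
```

The exact accuracy depends on the split, so your number may differ from the one reported in the text.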
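The model’s accuracy (the fraction of test samples classified correctly, which is what a Scikit-Learn classifier’s score method returns, not an R-squared value) is very good to begin with, but let’s see if we can improve it with grid search. To use grid search, you’ll need to specify a series of candidate parameter values; a common choice is a logarithmic range such as 0.001, 0.01, 0.1, 1, 10, and 100: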
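In Scikit-Learn, the candidate values are usually passed as a dictionary that maps a parameter name to the list of values to try. The name param_grid below is a common convention, used here as an assumption.

```python
# Each key is a parameter name of the estimator; each value is the list to search.
param_grid = {"C": [0.001, 0.01, 0.1, 1, 10, 100]}

# One model will be evaluated for every candidate value.
print(f"{len(param_grid['C'])} candidate values for C: {param_grid['C']}")
```

Spacing the candidates by powers of ten covers a wide range cheaply; once a promising region is found, a finer grid around it can be searched.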
Next, we instantiate our grid search model and fit it with the training data:
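A sketch of this step with GridSearchCV follows. The choice of LogisticRegression, five-fold cross-validation, and the split settings are assumptions; the original post may have used different ones.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

breast_cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    breast_cancer.data,
    breast_cancer.target,
    stratify=breast_cancer.target,
    random_state=42,  # arbitrary, for reproducibility
)

param_grid = {"C": [0.001, 0.01, 0.1, 1, 10, 100]}

# GridSearchCV fits one model per candidate C on each cross-validation fold,
# then refits the best candidate on the whole training set.
grid = GridSearchCV(LogisticRegression(max_iter=10000), param_grid, cv=5)
grid.fit(X_train, y_train)

print("Best C:", grid.best_params_["C"])
print(f"Best cross-validation accuracy: {grid.best_score_:.3f}")

# score() on the fitted search uses the refitted best model.
grid_accuracy = grid.score(X_test, y_test)
print(f"Test accuracy: {grid_accuracy:.3f}")
```

Whether the tuned model beats the baseline, and by how much, depends on the data split, so treat any single run as indicative rather than definitive.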
By choosing the best value for the parameter, C, grid search helped us make our model even more accurate!
Lastly, the value that C or another parameter takes can positively or negatively impact a model’s accuracy, so it is important to pick the right value. You can do this manually by testing individual values, but the beauty of grid search is that it does this for you:
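To make the comparison concrete, here is a rough sketch that tests each value of C by hand with cross_val_score and then lets GridSearchCV do the same work. The model choice, fold count, and split settings are assumptions for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

breast_cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    breast_cancer.data,
    breast_cancer.target,
    stratify=breast_cancer.target,
    random_state=42,  # arbitrary, for reproducibility
)

candidates = [0.001, 0.01, 0.1, 1, 10, 100]

# Manual approach: cross-validate each candidate and track the mean accuracy.
manual_scores = {}
for C in candidates:
    clf = LogisticRegression(C=C, max_iter=10000)
    manual_scores[C] = cross_val_score(clf, X_train, y_train, cv=5).mean()
best_manual_C = max(manual_scores, key=manual_scores.get)

# Grid search: the same loop, bookkeeping, and final refit in one object.
grid = GridSearchCV(LogisticRegression(max_iter=10000), {"C": candidates}, cv=5)
grid.fit(X_train, y_train)

print("Best C found manually:", best_manual_C)
print("Best C found by grid search:", grid.best_params_["C"])
```

Beyond saving the loop, the fitted search object also stores the per-candidate results in cv_results_ and exposes the refitted model as best_estimator_.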
I hope this has been helpful. Until next time!