SVM Parameter Tuning in Scikit Learn using GridSearchCV

Aneesha Bakharia
Jan 18, 2016

Update: Neptune.ai has a great guide on hyperparameter tuning with Python.

Recently I’ve seen a number of examples where a Support Vector Machine was used without parameter tuning and a Naive Bayes classifier was shown to achieve better results. While I don’t doubt that the simpler model produced by Naive Bayes might generalise better to held-out data, I’ve only ever been able to achieve good results with an SVM by first performing parameter tuning. There is really no excuse for skipping parameter tuning, especially in Scikit Learn, because GridSearchCV takes care of all the hard work; it just needs some patience to let it do the magic.

Before trying any form of parameter tuning, I suggest first getting an understanding of the available parameters and their role in altering the decision boundary (in classification examples). An RBF kernel SVM has two main parameters, C and gamma. C sets the penalty for misclassified training points, so a larger C fits the training data more closely; gamma sets how far the influence of a single training point reaches, so a larger gamma gives a more tightly curved boundary. There is a great interactive SVM demo in JavaScript (made by Andrej Karpathy) that lets you add data points, adjust the C and gamma params, and visualise the impact on the decision boundary. I suggest using an interactive tool like this to get a feel for the available parameters.
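If an interactive demo is not to hand, a rough non-interactive sketch can give a similar feel. The example below is an illustration only: the make_moons toy dataset, the helper name fit_rbf_svm and the particular C and gamma values are assumptions chosen for this sketch, not anything from the demo mentioned above. It fits an RBF SVM at a few settings and prints the training accuracy and the number of support vectors.

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Toy two-class data; the dataset and all values below are illustrative assumptions.
X, y = make_moons(n_samples=200, noise=0.3, random_state=0)


def fit_rbf_svm(C, gamma):
    """Fit an RBF SVM and return (training accuracy, number of support vectors)."""
    clf = SVC(kernel='rbf', C=C, gamma=gamma).fit(X, y)
    return clf.score(X, y), len(clf.support_)


# From heavily regularised and smooth to barely regularised and very wiggly.
for C, gamma in [(0.001, 0.001), (1, 0.1), (1, 1), (1000, 100)]:
    acc, n_sv = fit_rbf_svm(C, gamma)
    print(f"C={C:<6} gamma={gamma:<6} train accuracy={acc:.2f} support vectors={n_sv}")
```

Keep in mind that these are training accuracies. A large C and gamma can fit the training points very closely without generalising well, which is why the tuning itself should be judged by cross-validation.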

You don’t need to use GridSearchCV; you can write all the required code manually. Without GridSearchCV you would need to loop over every combination of parameter values and fit a model for each one. If you were after a cross-validated result, you would also need to add code to average the CV scores for each combination and pick the best one. Rather than doing all this coding, I suggest you just use GridSearchCV.
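To make the manual route concrete, here is a minimal sketch of what that hand-written loop might look like. It is one possible version, not the only one: the function name manual_grid_search, the iris dataset and the candidate values are assumptions picked for illustration, and cross_val_score is used so that the fold handling does not also have to be written by hand.

```python
from itertools import product

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC


def manual_grid_search(X, y, Cs, gammas, nfolds):
    """Return (best_params, best_score) judged by mean cross-validated accuracy."""
    best_params, best_score = None, -1.0
    # Try every (C, gamma) combination.
    for C, gamma in product(Cs, gammas):
        scores = cross_val_score(SVC(kernel='rbf', C=C, gamma=gamma), X, y, cv=nfolds)
        mean_score = scores.mean()
        # Keep the combination with the highest average score across the folds.
        if mean_score > best_score:
            best_params, best_score = {'C': C, 'gamma': gamma}, mean_score
    return best_params, best_score


# Illustrative data and grid (assumptions, not from the original post).
X, y = load_iris(return_X_y=True)
best_params, best_score = manual_grid_search(X, y, [0.1, 1, 10], [0.01, 0.1, 1], nfolds=5)
print(best_params, round(best_score, 3))
```

This covers only the basics. GridSearchCV also refits the best model on all the data and keeps the per-combination scores, which is part of why it is the more convenient option.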

Using GridSearchCV is easy. You just need to import GridSearchCV from sklearn.model_selection (it lived in sklearn.grid_search before scikit-learn 0.18, and that module has since been removed), set up a parameter grid (powers of 10 are a good place to start) and then pass the estimator, parameter grid and number of cross-validation folds to the GridSearchCV constructor. An example function that returns the best parameters for C and gamma is shown below:

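from sklearn import svm
# In scikit-learn releases before 0.18 this class lived in sklearn.grid_search
from sklearn.model_selection import GridSearchCV

def svc_param_selection(X, y, nfolds):
    # Candidate values spaced by powers of 10
    Cs = [0.001, 0.01, 0.1, 1, 10]
    gammas = [0.001, 0.01, 0.1, 1]
    param_grid = {'C': Cs, 'gamma': gammas}
    # Cross-validate an RBF SVM on every (C, gamma) combination
    grid_search = GridSearchCV(svm.SVC(kernel='rbf'), param_grid, cv=nfolds)
    grid_search.fit(X, y)
    return grid_search.best_params_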
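As a rough usage sketch, the same kind of search can be run on a training split and then checked against data the search never saw. The iris dataset, the 70/30 split and the choice of 5 folds are assumptions made for this example. best_score_ is the mean cross-validated accuracy of the winning combination, and score() uses the model that GridSearchCV refits with the best parameters by default.

```python
from sklearn import svm
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split

# Illustrative data and split (assumptions, not from the original post).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

param_grid = {'C': [0.001, 0.01, 0.1, 1, 10], 'gamma': [0.001, 0.01, 0.1, 1]}
grid_search = GridSearchCV(svm.SVC(kernel='rbf'), param_grid, cv=5)
grid_search.fit(X_train, y_train)  # tune on the training split only

print(grid_search.best_params_)            # winning C and gamma
print(grid_search.best_score_)             # mean CV accuracy for that combination
print(grid_search.score(X_test, y_test))   # accuracy of the refit model on held-out data
```

Keeping the test split out of the search gives a fairer picture of how the tuned SVM generalises than the cross-validation score alone.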

The parameter grid can also include the kernel, e.g. linear or RBF, as illustrated in the Scikit Learn documentation.
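One way to do this is to pass a list of dictionaries, one per kernel, so that gamma is only searched for the RBF kernel, where it has an effect. The sketch below follows that pattern; the candidate values and the iris dataset are assumptions chosen for illustration rather than the values used in the documentation.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Illustrative data (an assumption, not from the original post).
X, y = load_iris(return_X_y=True)

# Each dictionary is searched as its own grid.
param_grid = [
    {'kernel': ['linear'], 'C': [0.1, 1, 10]},
    {'kernel': ['rbf'], 'C': [0.1, 1, 10], 'gamma': [0.01, 0.1, 1]},
]
grid_search = GridSearchCV(SVC(), param_grid, cv=5)
grid_search.fit(X, y)

print(grid_search.best_params_)                 # includes the chosen kernel
print(len(grid_search.cv_results_['params']))   # 3 linear + 9 RBF combinations
```

Splitting the grid this way avoids refitting the linear kernel once per gamma value, which would give identical results each time.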

One last thing: please always remember to include the parameters you selected in your publications, blog posts, etc. It just makes for reproducible research!
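A small sketch of one way to capture those details is to dump them to JSON alongside the library version, so they can be pasted into a write-up. The field names in the report dictionary, the dataset and the grid are all assumptions made for this example.

```python
import json

import sklearn
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Illustrative data and grid (assumptions, not from the original post).
X, y = load_iris(return_X_y=True)
param_grid = {'C': [0.1, 1, 10], 'gamma': [0.01, 0.1, 1]}
grid_search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5).fit(X, y)

# Record what was searched as well as what was selected.
report = {
    'scikit_learn_version': sklearn.__version__,
    'param_grid': param_grid,
    'cv_folds': 5,
    'best_params': grid_search.best_params_,
    'best_cv_score': round(float(grid_search.best_score_), 4),
}
print(json.dumps(report, indent=2))
```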
