In machine learning, different algorithms learn in different ways. One popular family of predictive algorithms is boosting, which builds a sequence of weak learners that combine into a single strong learner. In other words, a boosting algorithm creates many models, and each model tries to reduce the errors made by the previous ones. The learning rate defines how big a step the model takes at each stage to reduce those errors.
The learning rate can vary from model to model. A higher learning rate means taking large steps to reduce the error, which can cause the model to overfit. A very small learning rate can cause the model to underfit. So it is important to find an optimum value for the learning rate. In this article, we will discuss the learning rate in machine learning and how to find its optimum value.
What is the Learning Rate in Machine Learning?
The learning rate in machine learning is the size of the step the model takes to reduce the loss, or errors, in its predictions. A large learning rate means the model corrects its errors quickly, but it can overfit. A small learning rate makes the model slow to correct its errors, which can leave it under-fitted. So it is always good practice to train the model with an optimum value of the learning rate.
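To make the step-size intuition concrete, consider a single prediction updated by a step proportional to its error. The numbers here are illustrative, not from any real model:

```python
# one target value and the current prediction (numbers are illustrative)
actual = 3000
prediction = 2683
error = prediction - actual  # -317

# a step proportional to the error: new prediction = prediction - learning_rate * error
for lr in (0.05, 0.4, 1.0):
    updated = prediction - lr * error
    print(f"learning_rate={lr}: {prediction} -> {updated}")
```

With a learning rate of 1.0 the prediction jumps straight onto the target, which on training data is a recipe for overfitting; with 0.05 it barely moves, so many more iterations are needed before the error shrinks.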
The optimum value for the learning rate can be found using various methods; here we will use the GridSearchCV method.
Visualizing the Learning Rate
Let us now try to understand how the learning rate actually affects the model. We assume we have a dataset with the actual values and the predictions of a weak learner. Let us visualize them.
# importing the required module
import matplotlib.pyplot as plt

# actual values
actual = [3000, 2000, 3300, 1800, 3600, 2400]

# predictions of the first weak learner
first_iteration = [2683, 2683, 2683, 2683, 2683, 2683]

# plotting the actual values
plt.plot([i for i in range(len(actual))], actual, label='actual', c='m')

# plotting the predicted values
plt.plot([i for i in range(len(actual))], first_iteration, label='predicted')
plt.legend()
plt.show()
Now the boosting model will try to find the errors between the predictions and actual values and then create another model that will be more accurate than the first one.
One of the most important parameters of a boosting algorithm is the learning rate, which is simply the step size taken toward the optimum solution. In this case, we will use 0.4 as the learning rate. The algorithm takes the previous prediction (2683) and combines it with the learning rate and the error to come up with a new prediction, using the following formula:
previous predictions – (learning rate) * ( error)
For example, the prediction of the first value in the second weak learner will be:
2683 – (0.4) * (-317) = 2809.8
The model applies this formula to each value to produce the next round of predictions.
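Using the actual values and the constant first-iteration predictions from above, the second iteration's predictions can be computed directly. A minimal sketch:

```python
# values from the example above
actual = [3000, 2000, 3300, 1800, 3600, 2400]
first_iteration = [2683] * len(actual)
learning_rate = 0.4

# error = previous prediction - actual value
errors = [p - a for p, a in zip(first_iteration, actual)]

# next prediction = previous prediction - learning_rate * error
second_iteration = [p - learning_rate * e for p, e in zip(first_iteration, errors)]
print([round(p, 1) for p in second_iteration])
# -> [2809.8, 2409.8, 2929.8, 2329.8, 3049.8, 2569.8]
```

Note that each absolute error shrinks by a factor of (1 − learning rate), so the second learner's total error is 60% of the first's.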
As you can see, the second weak learner has reduced the errors as compared to the first one.
How to Find the Optimum Value for Learning Rate?
There are many methods that can be used to find the optimum value for the learning rate in machine learning. One of the most popular is GridSearchCV, which is a hyperparameter-tuning method.
In GridSearchCV, we specify the candidate values for the learning rate; it then evaluates each of them with cross-validation and returns the one that gives the best result.
We will assume that we are using a gradient-boosting classifier.
# importing the modules
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# defining the model
model = GradientBoostingClassifier()

# candidate values for the learning rate
grid = dict()
grid['learning_rate'] = [0.0001, 0.001, 0.01, 0.1, 1.0]
Now, we will call the model to train on each of the specified values.
from sklearn.model_selection import RepeatedStratifiedKFold

# defining the cross-validation scheme
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3)

# applying the GridSearchCV method
grid_search = GridSearchCV(estimator=model, param_grid=grid, n_jobs=-1, cv=cv, scoring='accuracy')

# fitting on the training data (Input and Output are placeholders for your features and labels)
grid_result = grid_search.fit(Input, Output)

# printing the best score and the best learning rate
print("Accuracy score: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
This will return the optimum value for the learning rate.
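To make the snippets above runnable end to end, here is a self-contained sketch on synthetic data. The make_classification call stands in for a real dataset, and the smaller cross-validation settings are only to keep the run quick:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold

# synthetic stand-in for a real dataset
X, y = make_classification(n_samples=200, n_features=10, random_state=1)

model = GradientBoostingClassifier()
grid = {'learning_rate': [0.0001, 0.001, 0.01, 0.1, 1.0]}

# smaller CV than in the article, just to keep this sketch fast
cv = RepeatedStratifiedKFold(n_splits=3, n_repeats=2, random_state=1)
grid_search = GridSearchCV(estimator=model, param_grid=grid, n_jobs=-1, cv=cv, scoring='accuracy')
grid_result = grid_search.fit(X, y)

print("Accuracy score: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
```

grid_result.best_params_ holds the winning learning rate, and grid_result.best_score_ its mean cross-validated accuracy.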
The learning rate is an important parameter, especially for boosting algorithms, and it specifies the size of the steps taken to reduce the errors. In this short article, we discussed what the learning rate in machine learning is and how to find its optimum value using GridSearchCV.