Mini-Batch Gradient Descent
Gradient descent is an optimization method used in machine learning to learn the model parameters (coefficients and bias) of algorithms such as linear regression, logistic regression, and neural networks. The training set is iterated over several times, and the model parameters are repeatedly updated in the direction opposite to the gradient of the error on the training set. Gradient descent comes in three forms, depending on how many training samples are used to compute each parameter update:
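As a rough illustration of a single parameter update, the sketch below uses a linear model trained with mean squared error; the model, loss, learning rate, and all names are assumptions chosen for the example rather than anything specified above.

```python
import numpy as np

def gradient_step(weights, bias, X, y, lr=0.01):
    """One gradient descent update for an illustrative linear model with MSE loss."""
    n = X.shape[0]
    error = X @ weights + bias - y          # prediction error on the samples used
    grad_w = (2.0 / n) * (X.T @ error)      # gradient of MSE w.r.t. the coefficients
    grad_b = (2.0 / n) * error.sum()        # gradient of MSE w.r.t. the bias
    return weights - lr * grad_w, bias - lr * grad_b
```

Passing the whole training set to this step gives batch gradient descent, a single sample gives stochastic gradient descent, and a small batch gives mini-batch gradient descent.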
Mini-Batch Gradient Descent:
Mini-batch gradient descent is a variant of gradient descent in which the training dataset is split into small batches that are used to compute the model error and update the model coefficients.
Summing (or averaging) the gradient over the mini-batch further reduces the variance of the gradient, so implementations typically accumulate the gradient across the batch before applying a single update.
Deep learning offers many optimisers, but mini-batch gradient descent aims to strike a balance between the efficiency of batch gradient descent and the robustness of stochastic gradient descent. It is the form of gradient descent most often employed in deep learning.
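To make the batching concrete, here is a minimal sketch of a mini-batch gradient descent training loop for the same illustrative linear model and mean-squared-error loss; the function name, batch size, learning rate, and epoch count are assumptions chosen for the example.

```python
import numpy as np

def minibatch_gradient_descent(X, y, lr=0.01, batch_size=32, epochs=100):
    """Fit an illustrative linear model with MSE loss using mini-batch gradient descent."""
    rng = np.random.default_rng(0)
    n_samples, n_features = X.shape
    weights = np.zeros(n_features)
    bias = 0.0
    for _ in range(epochs):
        # Shuffle once per epoch so the batches differ between passes.
        order = rng.permutation(n_samples)
        for start in range(0, n_samples, batch_size):
            batch = order[start:start + batch_size]
            Xb, yb = X[batch], y[batch]
            error = Xb @ weights + bias - yb
            # Average the gradient over the mini-batch, which reduces the
            # variance of each update compared to a single-sample update.
            grad_w = (2.0 / len(batch)) * (Xb.T @ error)
            grad_b = (2.0 / len(batch)) * error.sum()
            weights -= lr * grad_w
            bias -= lr * grad_b
    return weights, bias

# Toy usage: recover a known linear relationship from synthetic data.
X = np.random.default_rng(1).normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.3
w, b = minibatch_gradient_descent(X, y)
```

Each epoch touches every training sample once, but the parameters are updated after every batch rather than after the full pass, which is where the efficiency and robustness trade-off comes from.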
Upsides
Because the model is updated more frequently than with batch gradient descent, convergence is more robust and the optimisation is less likely to stall in local minima.
Compared to stochastic gradient descent, the batched updates offer a procedure that is more computationally efficient.
Batching makes it possible both to avoid holding all of the training data in memory and to implement the method efficiently.
Downsides
Mini-batch gradient descent requires the learning algorithm to be configured with an additional "mini-batch size" hyperparameter (see the sketch after this list).
As in batch gradient descent, error information must be accumulated across the training examples in each mini-batch.
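As a hedged illustration of that extra hyperparameter, the batch size of the earlier sketch can simply be varied and compared; the values below are arbitrary and the function is the illustrative one defined above, not a library API.

```python
# Hypothetical tuning loop: batch_size is an extra knob to search over,
# on top of the learning rate and the number of epochs.
for batch_size in (16, 32, 64, 128):
    w, b = minibatch_gradient_descent(X, y, lr=0.01, batch_size=batch_size, epochs=50)
```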