Updates¶
Update functions that are passed to the network for optimization.
Arguments¶
- cost : cost function
  The cost function that will be minimised during training.
- params : list of parameters
  The list of all the network weights that will be modified during training.
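For orientation, a hypothetical usage sketch with sgd, one of the update functions listed below. It assumes the update functions return Theano updates that can be passed directly to theano.function, which is the usual convention for this style of API; the toy model itself is not part of yadll:

    import numpy as np
    import theano
    import theano.tensor as T
    import yadll

    # Toy linear model: the cost is a Theano scalar expression and the
    # parameters are Theano shared variables, matching the arguments above.
    x = T.matrix('x')
    y = T.vector('y')
    w = theano.shared(np.zeros(3, dtype=theano.config.floatX), name='w')
    cost = T.mean((T.dot(x, w) - y) ** 2)

    # Assumption: sgd returns Theano updates consumable by theano.function.
    updates = yadll.updates.sgd(cost, [w], learning_rate=0.1)
    train_fn = theano.function([x, y], cost, updates=updates)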
sgd (cost, params[, learning_rate]) | Stochastic Gradient Descent (SGD) updates
momentum (cost, params[, learning_rate, momentum]) | Stochastic Gradient Descent (SGD) updates with momentum
nesterov_momentum (cost, params[, ...]) | Stochastic Gradient Descent (SGD) updates with Nesterov momentum
adagrad (cost, params[, learning_rate, epsilon]) | Adaptive Gradient Descent (Adagrad)
rmsprop (cost, params[, learning_rate, rho, ...]) | RMSProp updates
adadelta (cost, params[, learning_rate, rho, ...]) | Adadelta Gradient Descent
adam (cost, params[, learning_rate, beta1, ...]) | Adam Gradient Descent
adamax (cost, params[, learning_rate, beta1, ...]) | Adamax Gradient Descent
nadam (cost, params[, learning_rate, rho, ...]) | Adam Gradient Descent with Nesterov momentum
Detailed description¶
yadll.updates.sgd(cost, params, learning_rate=0.1, **kwargs)¶
Stochastic Gradient Descent (SGD) updates

param := param - learning_rate * gradient
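Read literally, the rule is the following NumPy sketch; the function name and toy cost are illustrative, not part of yadll:

    import numpy as np

    def sgd_step(param, gradient, learning_rate=0.1):
        # param := param - learning_rate * gradient
        return param - learning_rate * gradient

    # Toy usage: minimise 0.5 * ||param||^2, whose gradient is param itself.
    param = np.array([1.0, -2.0])
    for _ in range(10):
        param = sgd_step(param, gradient=param)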
yadll.updates.momentum(cost, params, learning_rate=0.1, momentum=0.9, **kwargs)¶
Stochastic Gradient Descent (SGD) updates with momentum

velocity := momentum * velocity - learning_rate * gradient
param := param + velocity
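The same two rules as a NumPy sketch; the velocity starts at zero and the names are illustrative:

    import numpy as np

    def momentum_step(param, velocity, gradient, learning_rate=0.1, momentum=0.9):
        # velocity := momentum * velocity - learning_rate * gradient
        velocity = momentum * velocity - learning_rate * gradient
        # param := param + velocity
        return param + velocity, velocity

    # Toy usage on cost 0.5 * ||param||^2 (gradient equals param).
    param, velocity = np.array([1.0, -2.0]), np.zeros(2)
    param, velocity = momentum_step(param, velocity, gradient=param)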
yadll.updates.nesterov_momentum(cost, params, learning_rate=0.1, momentum=0.9, **kwargs)¶
Stochastic Gradient Descent (SGD) updates with Nesterov momentum

velocity := momentum * velocity - learning_rate * gradient
param := param + momentum * velocity - learning_rate * gradient

References
[R101] https://github.com/lisa-lab/pylearn2/pull/136#issuecomment-10381617
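A NumPy sketch of the two rules above; the only change from classical momentum is the look-ahead in the parameter update (names are illustrative):

    import numpy as np

    def nesterov_momentum_step(param, velocity, gradient,
                               learning_rate=0.1, momentum=0.9):
        # velocity := momentum * velocity - learning_rate * gradient
        velocity = momentum * velocity - learning_rate * gradient
        # param := param + momentum * velocity - learning_rate * gradient
        param = param + momentum * velocity - learning_rate * gradient
        return param, velocity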
yadll.updates.adagrad(cost, params, learning_rate=0.1, epsilon=1e-06, **kwargs)¶
Adaptive Gradient Descent (Adagrad)

Scale learning rates by dividing with the square root of the accumulated squared gradients.

References
[R103] http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf
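A NumPy sketch of the standard Adagrad rule from the reference above; yadll's exact handling of epsilon may differ:

    import numpy as np

    def adagrad_step(param, accu, gradient, learning_rate=0.1, epsilon=1e-6):
        # Accumulate squared gradients over all steps.
        accu = accu + gradient ** 2
        # Divide the learning rate by the root of the accumulator.
        param = param - learning_rate * gradient / np.sqrt(accu + epsilon)
        return param, accu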
yadll.updates.rmsprop(cost, params, learning_rate=0.01, rho=0.9, epsilon=1e-06, **kwargs)¶
RMSProp updates

Scale learning rates by dividing with a moving average of the root mean squared (RMS) gradients.
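A NumPy sketch of the usual RMSProp rule, which replaces Adagrad's full sum with an exponential moving average; the exact form in yadll may differ slightly:

    import numpy as np

    def rmsprop_step(param, accu, gradient, learning_rate=0.01, rho=0.9, epsilon=1e-6):
        # Moving average of squared gradients, decayed by rho.
        accu = rho * accu + (1 - rho) * gradient ** 2
        # Divide the learning rate by the RMS of that average.
        param = param - learning_rate * gradient / np.sqrt(accu + epsilon)
        return param, accu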
yadll.updates.adadelta(cost, params, learning_rate=0.1, rho=0.95, epsilon=1e-06, **kwargs)¶
Adadelta Gradient Descent

Scale learning rates by the ratio of accumulated step sizes to accumulated gradients.

References
[R105] https://arxiv.org/pdf/1212.5701v1.pdf
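A NumPy sketch of the standard Adadelta rule from the reference above, with two running averages, one of squared gradients and one of squared steps; details such as the placement of epsilon are assumptions:

    import numpy as np

    def adadelta_step(param, accu, delta_accu, gradient,
                      learning_rate=0.1, rho=0.95, epsilon=1e-6):
        # Moving average of squared gradients.
        accu = rho * accu + (1 - rho) * gradient ** 2
        # Step scaled by RMS(previous steps) / RMS(gradients).
        step = gradient * np.sqrt(delta_accu + epsilon) / np.sqrt(accu + epsilon)
        param = param - learning_rate * step
        # Moving average of squared steps.
        delta_accu = rho * delta_accu + (1 - rho) * step ** 2
        return param, accu, delta_accu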
yadll.updates.adam(cost, params, learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-06, **kwargs)¶
Adam Gradient Descent

Scale learning rates by adaptive moment estimation.

References
[R107] https://arxiv.org/pdf/1412.6980v8.pdf
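A NumPy sketch of the standard Adam rule from the reference above; where epsilon enters the denominator may differ in yadll:

    import numpy as np

    def adam_step(param, m, v, gradient, t, learning_rate=0.001,
                  beta1=0.9, beta2=0.999, epsilon=1e-6):
        # First and second moment estimates of the gradient.
        m = beta1 * m + (1 - beta1) * gradient
        v = beta2 * v + (1 - beta2) * gradient ** 2
        # Bias correction (t is the 1-based step count).
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        param = param - learning_rate * m_hat / (np.sqrt(v_hat) + epsilon)
        return param, m, v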
yadll.updates.adamax(cost, params, learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-06, **kwargs)¶
Adamax Gradient Descent

Scale learning rates by adaptive moment estimation, using the infinity norm of past gradients.

References
[R109] https://arxiv.org/pdf/1412.6980v8.pdf
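A NumPy sketch of the standard Adamax variant from the same paper: the second moment is replaced by a running infinity norm of the gradients, so only the first moment needs bias correction. Adding epsilon to the norm is an assumption for numerical safety:

    import numpy as np

    def adamax_step(param, m, u, gradient, t, learning_rate=0.001,
                    beta1=0.9, beta2=0.999, epsilon=1e-6):
        # First moment estimate, as in Adam.
        m = beta1 * m + (1 - beta1) * gradient
        # Infinity-norm second moment: running max of |gradient|.
        u = np.maximum(beta2 * u, np.abs(gradient))
        # Only the first moment needs bias correction (t is 1-based).
        param = param - learning_rate * m / ((1 - beta1 ** t) * (u + epsilon))
        return param, m, u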
yadll.updates.nadam(cost, params, learning_rate=1.0, rho=0.95, epsilon=1e-06, **kwargs)¶
Adam Gradient Descent with Nesterov momentum

References
[R111] http://cs229.stanford.edu/proj2015/054_report.pdf
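A simplified NumPy sketch of Nadam without the momentum schedule used in the report above; note that yadll's signature (learning_rate, rho, epsilon) suggests a different parameterisation, so the beta1/beta2 names and defaults here are illustrative:

    import numpy as np

    def nadam_step(param, m, v, gradient, t, learning_rate=0.001,
                   beta1=0.9, beta2=0.999, epsilon=1e-6):
        # Adam-style moment estimates.
        m = beta1 * m + (1 - beta1) * gradient
        v = beta2 * v + (1 - beta2) * gradient ** 2
        m_hat = m / (1 - beta1 ** (t + 1))
        v_hat = v / (1 - beta2 ** t)
        # Nesterov look-ahead: blend the corrected moment with the current gradient.
        m_bar = beta1 * m_hat + (1 - beta1) * gradient / (1 - beta1 ** t)
        param = param - learning_rate * m_bar / (np.sqrt(v_hat) + epsilon)
        return param, m, v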