yadll.updates

Updates

Update functions that are passed to the network for optimization during training.

Update functions

Arguments

cost : cost function
    The cost function that will be minimised during training.
params : list of parameters
    The list of all the weights of the network that will be modified.
sgd(cost, params[, learning_rate]) Stochastic Gradient Descent (SGD) updates
momentum(cost, params[, learning_rate, momentum]) Stochastic Gradient Descent (SGD) updates with momentum
nesterov_momentum(cost, params[, ...]) Stochastic Gradient Descent (SGD) updates with Nesterov momentum
adagrad(cost, params[, learning_rate, epsilon]) Adaptive Gradient Descent
rmsprop(cost, params[, learning_rate, rho, ...]) RMSProp updates
adadelta(cost, params[, learning_rate, rho, ...]) Adadelta Gradient Descent
adam(cost, params[, learning_rate, beta1, ...]) Adam Gradient Descent
adamax(cost, params[, learning_rate, beta1, ...]) Adamax Gradient Descent (Adam variant based on the infinity norm)
nadam(cost, params[, learning_rate, rho, ...]) Adam Gradient Descent with Nesterov momentum
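
To make the role of cost and params concrete, here is a plain-NumPy sketch of the kind of minimisation these update rules perform. It does not use the yadll API; the toy cost and its analytic gradient are invented for the illustration:

    import numpy as np

    # Toy setting: a list of parameters and a cost to minimise.
    params = [np.array([3.0, -1.0]), np.array([2.0])]

    def cost(ps):
        # Sum of squares of all parameters; minimised at zero.
        return sum((p ** 2).sum() for p in ps)

    def gradients(ps):
        # Analytic gradient of the toy cost w.r.t. each parameter.
        return [2.0 * p for p in ps]

    learning_rate = 0.1
    for _ in range(100):
        grads = gradients(params)
        params = [p - learning_rate * g for p, g in zip(params, grads)]

    print(cost(params))  # close to 0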

Detailed description

yadll.updates.sgd(cost, params, learning_rate=0.1, **kwargs)[source]

Stochastic Gradient Descent (SGD) updates

param := param - learning_rate * gradient
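
A minimal NumPy sketch of this rule (an illustration only, not the yadll implementation; param and grad are arrays of the same shape):

    import numpy as np

    def sgd_step(param, grad, learning_rate=0.1):
        # Vanilla SGD: move against the gradient.
        return param - learning_rate * grad

    param = np.array([1.0, -2.0])
    grad = np.array([0.5, -0.5])
    param = sgd_step(param, grad)  # -> array([ 0.95, -1.95])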

yadll.updates.momentum(cost, params, learning_rate=0.1, momentum=0.9, **kwargs)[source]

Stochastic Gradient Descent (SGD) updates with momentum

velocity := momentum * velocity - learning_rate * gradient

param := param + velocity
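
A NumPy sketch of one momentum step, with the velocity kept as extra state per parameter (illustration only, not the library code):

    import numpy as np

    def momentum_step(param, grad, velocity, learning_rate=0.1, momentum=0.9):
        # Accumulate a decaying velocity, then move the parameter along it.
        velocity = momentum * velocity - learning_rate * grad
        param = param + velocity
        return param, velocity

    param = np.array([1.0, -2.0])
    velocity = np.zeros_like(param)
    param, velocity = momentum_step(param, np.array([0.5, -0.5]), velocity)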

yadll.updates.nesterov_momentum(cost, params, learning_rate=0.1, momentum=0.9, **kwargs)[source]

Stochastic Gradient Descent (SGD) updates with Nesterov momentum

velocity := momentum * velocity - learning_rate * gradient

param := param + momentum * velocity - learning_rate * gradient
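
A NumPy sketch of the formulation above, where the gradient is taken at the current parameters and the velocity provides a look-ahead (illustration only):

    import numpy as np

    def nesterov_momentum_step(param, grad, velocity,
                               learning_rate=0.1, momentum=0.9):
        # Update the velocity, then apply it with an extra momentum look-ahead.
        velocity = momentum * velocity - learning_rate * grad
        param = param + momentum * velocity - learning_rate * grad
        return param, velocity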

References

[R101] https://github.com/lisa-lab/pylearn2/pull/136#issuecomment-10381617
yadll.updates.adagrad(cost, params, learning_rate=0.1, epsilon=1e-06, **kwargs)[source]

Adaptive Gradient Descent (Adagrad). Scales the learning rate of each parameter by dividing it by the square root of that parameter's accumulated squared gradients.
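
A NumPy sketch of one Adagrad step; accu is the running sum of squared gradients kept per parameter (illustration only; the exact placement of epsilon can differ between implementations):

    import numpy as np

    def adagrad_step(param, grad, accu, learning_rate=0.1, epsilon=1e-6):
        # Accumulate squared gradients and shrink the step accordingly.
        accu = accu + grad ** 2
        param = param - learning_rate * grad / (np.sqrt(accu) + epsilon)
        return param, accu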

References

[R103] http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf
yadll.updates.rmsprop(cost, params, learning_rate=0.01, rho=0.9, epsilon=1e-06, **kwargs)[source]

RMSProp updates. Scales the learning rate of each parameter by dividing it by a moving average of the root mean square (RMS) of its recent gradients.
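
A NumPy sketch of one RMSProp step; accu is an exponential moving average of squared gradients (illustration only):

    import numpy as np

    def rmsprop_step(param, grad, accu, learning_rate=0.01, rho=0.9, epsilon=1e-6):
        # Keep a decaying average of squared gradients and divide by its root.
        accu = rho * accu + (1.0 - rho) * grad ** 2
        param = param - learning_rate * grad / (np.sqrt(accu) + epsilon)
        return param, accu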

yadll.updates.adadelta(cost, params, learning_rate=0.1, rho=0.95, epsilon=1e-06, **kwargs)[source]

Adadelta Gradient Descent. Scales each gradient by the ratio of the RMS of accumulated parameter updates to the RMS of accumulated gradients.
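
A NumPy sketch of one Adadelta step following the cited paper, with two accumulators per parameter: one for squared gradients and one for squared updates (illustration only; the learning_rate factor mirrors the signature above, the paper itself uses 1.0):

    import numpy as np

    def adadelta_step(param, grad, accu_grad, accu_delta,
                      learning_rate=0.1, rho=0.95, epsilon=1e-6):
        # Decaying average of squared gradients.
        accu_grad = rho * accu_grad + (1.0 - rho) * grad ** 2
        # Step scaled by RMS of past updates over RMS of past gradients.
        delta = -np.sqrt(accu_delta + epsilon) / np.sqrt(accu_grad + epsilon) * grad
        # Decaying average of squared updates.
        accu_delta = rho * accu_delta + (1.0 - rho) * delta ** 2
        param = param + learning_rate * delta
        return param, accu_grad, accu_delta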

References

[R105] https://arxiv.org/pdf/1212.5701v1.pdf
yadll.updates.adam(cost, params, learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-06, **kwargs)[source]

Adam Gradient Descent. Scales learning rates by adaptive estimates of the first and second moments of the gradients.
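
A NumPy sketch of one Adam step as described in the cited paper; m and v are the first and second moment estimates, t the 1-based step counter (illustration only):

    import numpy as np

    def adam_step(param, grad, m, v, t,
                  learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-6):
        # Biased moment estimates.
        m = beta1 * m + (1.0 - beta1) * grad
        v = beta2 * v + (1.0 - beta2) * grad ** 2
        # Bias correction.
        m_hat = m / (1.0 - beta1 ** t)
        v_hat = v / (1.0 - beta2 ** t)
        param = param - learning_rate * m_hat / (np.sqrt(v_hat) + epsilon)
        return param, m, v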

References

[R107] https://arxiv.org/pdf/1412.6980v8.pdf
yadll.updates.adamax(cost, params, learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-06, **kwargs)[source]

Adamax Gradient Descent. A variant of Adam that scales learning rates by an exponentially weighted infinity norm of the gradients instead of the second moment.
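
A NumPy sketch of one Adamax step from the same paper; Adam's second-moment estimate is replaced by an exponentially weighted infinity norm u (illustration only; the epsilon term is a numerical safeguard):

    import numpy as np

    def adamax_step(param, grad, m, u, t,
                    learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-6):
        # First moment estimate, as in Adam.
        m = beta1 * m + (1.0 - beta1) * grad
        # Exponentially weighted infinity norm of the gradients.
        u = np.maximum(beta2 * u, np.abs(grad))
        param = param - learning_rate * m / ((1.0 - beta1 ** t) * (u + epsilon))
        return param, m, u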

References

[R109] https://arxiv.org/pdf/1412.6980v8.pdf
yadll.updates.nadam(cost, params, learning_rate=1.0, rho=0.95, epsilon=1e-06, **kwargs)[source]

Adam Gradient Descent with Nesterov momentum (Nadam).
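
A NumPy sketch of a simplified Nadam step following the cited report (without its momentum schedule); the illustrative beta1/beta2 names do not match the rho parameter exposed in the signature above:

    import numpy as np

    def nadam_step(param, grad, m, v, t,
                   learning_rate=0.002, beta1=0.9, beta2=0.999, epsilon=1e-6):
        # Adam-style moment estimates.
        m = beta1 * m + (1.0 - beta1) * grad
        v = beta2 * v + (1.0 - beta2) * grad ** 2
        # Nesterov look-ahead on the bias-corrected first moment.
        m_hat = m / (1.0 - beta1 ** (t + 1))
        g_hat = grad / (1.0 - beta1 ** t)
        m_bar = beta1 * m_hat + (1.0 - beta1) * g_hat
        v_hat = v / (1.0 - beta2 ** t)
        param = param - learning_rate * m_bar / (np.sqrt(v_hat) + epsilon)
        return param, m, v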

References

[R111] http://cs229.stanford.edu/proj2015/054_report.pdf