

Update functions that are passed to the network for optimization.

Update functions


cost : cost function
    The cost function that will be minimised during training.
params : list of parameters
    The list of all the weights of the network that will be modified.
sgd(cost, params[, learning_rate])                 Stochastic Gradient Descent (SGD) updates
momentum(cost, params[, learning_rate, momentum])  Stochastic Gradient Descent (SGD) updates with momentum
nesterov_momentum(cost, params[, ...])             Stochastic Gradient Descent (SGD) updates with Nesterov momentum
adagrad(cost, params[, learning_rate, epsilon])    Adaptive Gradient Descent (Adagrad)
rmsprop(cost, params[, learning_rate, rho, ...])   RMSProp updates
adadelta(cost, params[, learning_rate, rho, ...])  Adadelta Gradient Descent
adam(cost, params[, learning_rate, beta1, ...])    Adam Gradient Descent
adamax(cost, params[, learning_rate, beta1, ...])  Adamax Gradient Descent (Adam variant based on the infinity norm)
nadam(cost, params[, learning_rate, rho, ...])     Adam Gradient Descent with Nesterov momentum

Detailed description

yadll.updates.sgd(cost, params, learning_rate=0.1, **kwargs)

Stochastic Gradient Descent (SGD) updates

param := param - learning_rate * gradient
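
The rule above can be illustrated with a minimal plain-Python sketch. This is not yadll's implementation (which builds symbolic updates from `cost` and `params`); the helper name `sgd_step` and the explicit `grads` argument are assumptions made for illustration only.

```python
def sgd_step(params, grads, learning_rate=0.1):
    """Apply param := param - learning_rate * gradient to every parameter."""
    return [p - learning_rate * g for p, g in zip(params, grads)]


# Toy problem: minimise f(w) = w**2, whose gradient is 2 * w.
w = [1.0]
for _ in range(3):
    w = sgd_step(w, [2.0 * w[0]])
```

With `learning_rate=0.1` each step multiplies `w` by 0.8 on this toy problem, so the parameter shrinks steadily towards the minimum at 0.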

yadll.updates.momentum(cost, params, learning_rate=0.1, momentum=0.9, **kwargs)

Stochastic Gradient Descent (SGD) updates with momentum

velocity := momentum * velocity - learning_rate * gradient

param := param + velocity
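
As a rough illustration of the two assignments above, here is a plain-Python sketch. It is not the library's code: `momentum_step` is a hypothetical helper, gradients are passed in explicitly, and velocities are assumed to start at zero.

```python
def momentum_step(params, grads, velocities, learning_rate=0.1, momentum=0.9):
    """One SGD-with-momentum step; returns new parameters and velocities."""
    new_params, new_velocities = [], []
    for p, g, v in zip(params, grads, velocities):
        v = momentum * v - learning_rate * g  # velocity := momentum * velocity - lr * gradient
        new_velocities.append(v)
        new_params.append(p + v)              # param := param + velocity
    return new_params, new_velocities


# Toy problem: minimise f(w) = w**2, whose gradient is 2 * w.
w, vel = [1.0], [0.0]
for _ in range(2):
    w, vel = momentum_step(w, [2.0 * w[0]], vel)
```

The second step is larger than the first because the velocity carries over part of the previous step, which is the intended effect of momentum.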

yadll.updates.nesterov_momentum(cost, params, learning_rate=0.1, momentum=0.9, **kwargs)

Stochastic Gradient Descent (SGD) updates with Nesterov momentum

velocity := momentum * velocity - learning_rate * gradient

param := param + momentum * velocity - learning_rate * gradient
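
The sketch below spells out the two assignments in plain Python. It assumes that the `velocity` used in the parameter update is the freshly updated one, which is how the formulas read when applied in order; `nesterov_momentum_step` is a hypothetical helper, not part of yadll.

```python
def nesterov_momentum_step(params, grads, velocities,
                           learning_rate=0.1, momentum=0.9):
    """One Nesterov-momentum step; returns new parameters and velocities."""
    new_params, new_velocities = [], []
    for p, g, v in zip(params, grads, velocities):
        v = momentum * v - learning_rate * g
        new_velocities.append(v)
        # Assumption: the look-ahead term uses the updated velocity.
        new_params.append(p + momentum * v - learning_rate * g)
    return new_params, new_velocities


# Toy problem: minimise f(w) = w**2, whose gradient is 2 * w.
w, vel = [1.0], [0.0]
w, vel = nesterov_momentum_step(w, [2.0 * w[0]], vel)
first_step_w = w[0]
w, vel = nesterov_momentum_step(w, [2.0 * w[0]], vel)
```

Compared with plain momentum, the parameter moves by the gradient step plus a look-ahead of `momentum * velocity`, so the first step here is already larger than the first step of plain momentum.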


yadll.updates.adagrad(cost, params, learning_rate=0.1, epsilon=1e-06, **kwargs)

Adaptive Gradient Descent (Adagrad) updates

Scales learning rates by dividing them by the square root of the accumulated squared gradients.
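
A minimal single-parameter sketch of this scaling follows. It is an illustration, not yadll's code: the helper name `adagrad_step` is hypothetical, and adding `epsilon` inside the square root is an assumption about where the stabilising constant goes.

```python
import math


def adagrad_step(param, grad, accumulator, learning_rate=0.1, epsilon=1e-06):
    """One Adagrad step for a single scalar parameter."""
    accumulator = accumulator + grad ** 2  # running sum of squared gradients
    # Assumption: epsilon is added inside the square root.
    param = param - learning_rate * grad / math.sqrt(accumulator + epsilon)
    return param, accumulator


# Toy problem: minimise f(w) = w**2, whose gradient is 2 * w.
w, acc = 1.0, 0.0
step_sizes = []
for _ in range(3):
    w_new, acc = adagrad_step(w, 2.0 * w, acc)
    step_sizes.append(w - w_new)
    w = w_new
```

Because the accumulator only grows, the effective learning rate only shrinks; on this toy problem `step_sizes` decreases from one iteration to the next.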


yadll.updates.rmsprop(cost, params, learning_rate=0.01, rho=0.9, epsilon=1e-06, **kwargs)

RMSProp updates

Scales learning rates by dividing them by a moving average of the root mean squared (RMS) gradients.
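
The following single-parameter sketch shows one common way to write this rule. It is not taken from yadll: `rmsprop_step` is a hypothetical helper, and placing `epsilon` inside the square root is an assumption.

```python
import math


def rmsprop_step(param, grad, accumulator,
                 learning_rate=0.01, rho=0.9, epsilon=1e-06):
    """One RMSProp step for a single scalar parameter."""
    # Exponential moving average of squared gradients.
    accumulator = rho * accumulator + (1 - rho) * grad ** 2
    # Assumption: epsilon is added inside the square root.
    param = param - learning_rate * grad / math.sqrt(accumulator + epsilon)
    return param, accumulator


# Toy problem: minimise f(w) = w**2, whose gradient is 2 * w.
w, acc = rmsprop_step(1.0, 2.0, 0.0)
```

Unlike Adagrad's running sum, the moving average forgets old gradients at a rate set by `rho`, so the effective learning rate does not decay towards zero.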

yadll.updates.adadelta(cost, params, learning_rate=0.1, rho=0.95, epsilon=1e-06, **kwargs)

Adadelta Gradient Descent

Scales learning rates by the ratio of the RMS of accumulated step sizes to the RMS of accumulated gradients.
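
A single-parameter sketch of the usual Adadelta formulation is given below for orientation. The helper name `adadelta_step` and the state variable names are hypothetical, and the exact placement of `epsilon` is an assumption; the library's own code may differ.

```python
import math


def adadelta_step(param, grad, grad_acc, delta_acc,
                  learning_rate=0.1, rho=0.95, epsilon=1e-06):
    """One Adadelta step for a single scalar parameter."""
    # Moving average of squared gradients.
    grad_acc = rho * grad_acc + (1 - rho) * grad ** 2
    # Scale the gradient by RMS(previous steps) / RMS(gradients).
    update = grad * math.sqrt(delta_acc + epsilon) / math.sqrt(grad_acc + epsilon)
    param = param - learning_rate * update
    # Moving average of squared step sizes.
    delta_acc = rho * delta_acc + (1 - rho) * update ** 2
    return param, grad_acc, delta_acc


# Toy problem: minimise f(w) = w**2, whose gradient is 2 * w.
w, g_acc, d_acc = adadelta_step(1.0, 2.0, 0.0, 0.0)
```

The first step is tiny because the step-size accumulator starts at zero and only `epsilon` keeps the numerator non-zero; steps grow as that accumulator fills up.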


yadll.updates.adam(cost, params, learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-06, **kwargs)

Adam Gradient Descent

Scales learning rates using adaptive moment estimation.
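
To make "adaptive moment estimation" concrete, here is a single-parameter sketch of the standard bias-corrected Adam rule. It is an illustration under assumptions, not yadll's implementation: `adam_step` and its state arguments `m`, `v`, and `t` are hypothetical names.

```python
import math


def adam_step(param, grad, m, v, t,
              learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-06):
    """One Adam step for a single scalar parameter."""
    t += 1
    m = beta1 * m + (1 - beta1) * grad       # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2  # second moment (mean of squared gradients)
    m_hat = m / (1 - beta1 ** t)             # bias correction for the zero initialisation
    v_hat = v / (1 - beta2 ** t)
    param = param - learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return param, m, v, t


# The first step has a size close to learning_rate whatever the gradient scale.
w_small, _, _, _ = adam_step(1.0, 2.0, 0.0, 0.0, 0)
w_large, _, _, _ = adam_step(1.0, 200.0, 0.0, 0.0, 0)
```

Both calls move the parameter by roughly `learning_rate`, which shows how dividing by the second-moment estimate normalises the step size.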


yadll.updates.adamax(cost, params, learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-06, **kwargs)

Adamax Gradient Descent

A variant of Adam that scales learning rates using the infinity norm of past gradients.
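
For orientation, the sketch below follows the usual Adamax formulation, in which the second-moment estimate of Adam is replaced by an exponentially weighted maximum of absolute gradients. The helper `adamax_step`, its state arguments, and the use of `epsilon` in the denominator are assumptions for illustration; yadll's code may differ.

```python
def adamax_step(param, grad, m, u, t,
                learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-06):
    """One Adamax step for a single scalar parameter."""
    t += 1
    m = beta1 * m + (1 - beta1) * grad  # first moment, as in Adam
    u = max(beta2 * u, abs(grad))       # exponentially weighted infinity norm
    # Assumption: epsilon guards against division by zero when u == 0.
    param = param - (learning_rate / (1 - beta1 ** t)) * m / (u + epsilon)
    return param, m, u, t


# Toy problem: minimise f(w) = w**2, whose gradient is 2 * w.
w, m, u, t = adamax_step(1.0, 2.0, 0.0, 0.0, 0)
```

As with Adam, the first step has a size close to `learning_rate`; only the first moment needs bias correction because `u` is a running maximum rather than an average.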


yadll.updates.nadam(cost, params, learning_rate=1.0, rho=0.95, epsilon=1e-06, **kwargs)

Adam Gradient Descent with Nesterov momentum

