yadll.updates

Updates

Update functions that are passed to the network for optimization during training.

Update functions

Arguments

cost : cost function
    The cost function that will be minimised during training.
params : list of parameters
    The list of all the weights of the network that will be modified.
sgd(cost, params[, learning_rate]) Stochastic Gradient Descent (SGD) updates
momentum(cost, params[, learning_rate, momentum]) Stochastic Gradient Descent (SGD) updates with momentum
nesterov_momentum(cost, params[, ...]) Stochastic Gradient Descent (SGD) updates with Nesterov momentum
adagrad(cost, params[, learning_rate, epsilon]) Adaptive Gradient Descent
rmsprop(cost, params[, learning_rate, rho, ...]) RMSProp updates
adadelta(cost, params[, learning_rate, rho, ...]) Adadelta Gradient Descent
adam(cost, params[, learning_rate, beta1, ...]) Adam Gradient Descent
adamax(cost, params[, learning_rate, beta1, ...]) Adamax Gradient Descent (Adam variant based on the infinity norm)
nadam(cost, params[, learning_rate, rho, ...]) Adam Gradient Descent with Nesterov momentum
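
To make the role of cost and params concrete, here is a plain-NumPy sketch of the kind of minimisation these update rules perform. It does not use the yadll API; the toy cost and its analytic gradient are invented for the illustration:

    import numpy as np

    # Toy setting: a list of parameters and a cost to minimise.
    params = [np.array([3.0, -1.0]), np.array([2.0])]

    def cost(ps):
        # Sum of squares of all parameters; minimised at zero.
        return sum((p ** 2).sum() for p in ps)

    def gradients(ps):
        # Analytic gradient of the toy cost w.r.t. each parameter.
        return [2.0 * p for p in ps]

    learning_rate = 0.1
    for _ in range(100):
        grads = gradients(params)
        params = [p - learning_rate * g for p, g in zip(params, grads)]

    print(cost(params))  # close to 0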

Detailed description

yadll.updates.sgd(cost, params, learning_rate=0.1, **kwargs)[source]

Stochastic Gradient Descent (SGD) updates

param := param - learning_rate * gradient
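
A minimal NumPy sketch of this rule (an illustration only, not the yadll implementation; param and grad are arrays of the same shape):

    import numpy as np

    def sgd_step(param, grad, learning_rate=0.1):
        # Vanilla SGD: move against the gradient.
        return param - learning_rate * grad

    param = np.array([1.0, -2.0])
    grad = np.array([0.5, -0.5])
    param = sgd_step(param, grad)  # -> array([ 0.95, -1.95])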

yadll.updates.momentum(cost, params, learning_rate=0.1, momentum=0.9, **kwargs)[source]

Stochastic Gradient Descent (SGD) updates with momentum

velocity := momentum * velocity - learning_rate * gradient

param := param + velocity
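
A NumPy sketch of one momentum step, with the velocity kept as extra state per parameter (illustration only, not the library code):

    import numpy as np

    def momentum_step(param, grad, velocity, learning_rate=0.1, momentum=0.9):
        # Accumulate a decaying velocity, then move the parameter along it.
        velocity = momentum * velocity - learning_rate * grad
        param = param + velocity
        return param, velocity

    param = np.array([1.0, -2.0])
    velocity = np.zeros_like(param)
    param, velocity = momentum_step(param, np.array([0.5, -0.5]), velocity)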

yadll.updates.nesterov_momentum(cost, params, learning_rate=0.1, momentum=0.9, **kwargs)[source]

Stochastic Gradient Descent (SGD) updates with Nesterov momentum

velocity := momentum * velocity - learning_rate * gradient

param := param + momentum * velocity - learning_rate * gradient
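
A NumPy sketch of the formulation above, where the gradient is taken at the current parameters and the velocity provides a look-ahead (illustration only):

    import numpy as np

    def nesterov_momentum_step(param, grad, velocity,
                               learning_rate=0.1, momentum=0.9):
        # Update the velocity, then apply it with an extra momentum look-ahead.
        velocity = momentum * velocity - learning_rate * grad
        param = param + momentum * velocity - learning_rate * grad
        return param, velocity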

References

[R101] https://github.com/lisa-lab/pylearn2/pull/136#issuecomment-10381617
yadll.updates.adagrad(cost, params, learning_rate=0.1, epsilon=1e-06, **kwargs)[source]

Adaptive Gradient Descent (Adagrad). Scales the learning rate of each parameter by dividing it by the square root of that parameter's accumulated squared gradients.
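
A NumPy sketch of one Adagrad step; accu is the running sum of squared gradients kept per parameter (illustration only; the exact placement of epsilon can differ between implementations):

    import numpy as np

    def adagrad_step(param, grad, accu, learning_rate=0.1, epsilon=1e-6):
        # Accumulate squared gradients and shrink the step accordingly.
        accu = accu + grad ** 2
        param = param - learning_rate * grad / (np.sqrt(accu) + epsilon)
        return param, accu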

References

[R103] http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf
yadll.updates.rmsprop(cost, params, learning_rate=0.01, rho=0.9, epsilon=1e-06, **kwargs)[source]

RMSProp updates. Scales the learning rate of each parameter by dividing it by a moving average of the root mean square (RMS) of its recent gradients.
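
A NumPy sketch of one RMSProp step; accu is an exponential moving average of squared gradients (illustration only):

    import numpy as np

    def rmsprop_step(param, grad, accu, learning_rate=0.01, rho=0.9, epsilon=1e-6):
        # Keep a decaying average of squared gradients and divide by its root.
        accu = rho * accu + (1.0 - rho) * grad ** 2
        param = param - learning_rate * grad / (np.sqrt(accu) + epsilon)
        return param, accu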

yadll.updates.adadelta(cost, params, learning_rate=0.1, rho=0.95, epsilon=1e-06, **kwargs)[source]

Adadelta Gradient Descent. Scales each gradient by the ratio of the RMS of accumulated parameter updates to the RMS of accumulated gradients.
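
A NumPy sketch of one Adadelta step following the cited paper, with two accumulators per parameter: one for squared gradients and one for squared updates (illustration only; the learning_rate factor mirrors the signature above, the paper itself uses 1.0):

    import numpy as np

    def adadelta_step(param, grad, accu_grad, accu_delta,
                      learning_rate=0.1, rho=0.95, epsilon=1e-6):
        # Decaying average of squared gradients.
        accu_grad = rho * accu_grad + (1.0 - rho) * grad ** 2
        # Step scaled by RMS of past updates over RMS of past gradients.
        delta = -np.sqrt(accu_delta + epsilon) / np.sqrt(accu_grad + epsilon) * grad
        # Decaying average of squared updates.
        accu_delta = rho * accu_delta + (1.0 - rho) * delta ** 2
        param = param + learning_rate * delta
        return param, accu_grad, accu_delta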

References

[R105] https://arxiv.org/pdf/1212.5701v1.pdf
yadll.updates.adam(cost, params, learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-06, **kwargs)[source]

Adam Gradient Descent. Scales learning rates by adaptive estimates of the first and second moments of the gradients.
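
A NumPy sketch of one Adam step as described in the cited paper; m and v are the first and second moment estimates, t the 1-based step counter (illustration only):

    import numpy as np

    def adam_step(param, grad, m, v, t,
                  learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-6):
        # Biased moment estimates.
        m = beta1 * m + (1.0 - beta1) * grad
        v = beta2 * v + (1.0 - beta2) * grad ** 2
        # Bias correction.
        m_hat = m / (1.0 - beta1 ** t)
        v_hat = v / (1.0 - beta2 ** t)
        param = param - learning_rate * m_hat / (np.sqrt(v_hat) + epsilon)
        return param, m, v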

References

[R107] https://arxiv.org/pdf/1412.6980v8.pdf
yadll.updates.adamax(cost, params, learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-06, **kwargs)[source]

Adamax Gradient Descent. A variant of Adam that scales learning rates by an exponentially weighted infinity norm of the gradients instead of the second moment.
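
A NumPy sketch of one Adamax step from the same paper; Adam's second-moment estimate is replaced by an exponentially weighted infinity norm u (illustration only; the epsilon term is a numerical safeguard):

    import numpy as np

    def adamax_step(param, grad, m, u, t,
                    learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-6):
        # First moment estimate, as in Adam.
        m = beta1 * m + (1.0 - beta1) * grad
        # Exponentially weighted infinity norm of the gradients.
        u = np.maximum(beta2 * u, np.abs(grad))
        param = param - learning_rate * m / ((1.0 - beta1 ** t) * (u + epsilon))
        return param, m, u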

References

[R109] https://arxiv.org/pdf/1412.6980v8.pdf
yadll.updates.nadam(cost, params, learning_rate=1.0, rho=0.95, epsilon=1e-06, **kwargs)[source]

Adam Gradient Descent with Nesterov momentum (Nadam).
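
A NumPy sketch of a simplified Nadam step following the cited report (without its momentum schedule); the illustrative beta1/beta2 names do not match the rho parameter exposed in the signature above:

    import numpy as np

    def nadam_step(param, grad, m, v, t,
                   learning_rate=0.002, beta1=0.9, beta2=0.999, epsilon=1e-6):
        # Adam-style moment estimates.
        m = beta1 * m + (1.0 - beta1) * grad
        v = beta2 * v + (1.0 - beta2) * grad ** 2
        # Nesterov look-ahead on the bias-corrected first moment.
        m_hat = m / (1.0 - beta1 ** (t + 1))
        g_hat = grad / (1.0 - beta1 ** t)
        m_bar = beta1 * m_hat + (1.0 - beta1) * g_hat
        v_hat = v / (1.0 - beta2 ** t)
        param = param - learning_rate * m_bar / (np.sqrt(v_hat) + epsilon)
        return param, m, v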

References

[R111] http://cs229.stanford.edu/proj2015/054_report.pdf