Optimizers

Adadelta

optimizer eddl::adadelta(float lr, float rho, float epsilon, float weight_decay)

Adadelta optimizer.

Adadelta is a more robust extension of Adagrad that adapts learning rates based on a moving window of gradient updates, instead of accumulating all past gradients. In this way, Adadelta keeps learning even after many updates have been performed.
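
For reference, a sketch of the accumulate-and-update rule from the paper linked below, where rho and epsilon correspond to the parameters of this function (the implementation additionally scales the step by lr and may apply weight decay):

E[g^2]_t = \rho \, E[g^2]_{t-1} + (1 - \rho) \, g_t^2
\Delta\theta_t = - \frac{\sqrt{E[\Delta\theta^2]_{t-1} + \epsilon}}{\sqrt{E[g^2]_t + \epsilon}} \, g_t
E[\Delta\theta^2]_t = \rho \, E[\Delta\theta^2]_{t-1} + (1 - \rho) \, \Delta\theta_t^2
\theta_{t+1} = \theta_t + \Delta\theta_t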

See

https://arxiv.org/abs/1212.5701

Parameters
  • lr – Learning rate

  • rho – Smoothing constant

  • epsilon – Term added to the denominator to improve numerical stability

  • weight_decay – Weight decay (L2 penalty)

Returns

Adadelta optimizer

Example:

optimizer opt = adadelta(0.001, 0.95, 0.000001, 0);
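
An optimizer created this way is typically passed to build together with the losses, metrics and a computing service. A minimal sketch, reusing opt from the example above and assuming an already defined model net (hypothetical here) and the usual EDDL loss/metric names:

// 'net' is a previously defined model (not shown)
build(net, opt, {"soft_cross_entropy"}, {"categorical_accuracy"}, CS_CPU());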

Adam

optimizer eddl::adam(float lr = 0.01, float beta_1 = 0.9, float beta_2 = 0.999, float epsilon = 0.000001, float weight_decay = 0, bool amsgrad = false)

Adam optimizer.

Default parameters follow those provided in the original paper (see the reference below).
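
As a sketch, the update rule from the referenced paper, with beta_1, beta_2 and epsilon mapping to \beta_1, \beta_2 and \epsilon (the exact placement of weight decay is implementation-dependent):

m_t = \beta_1 \, m_{t-1} + (1 - \beta_1) \, g_t
v_t = \beta_2 \, v_{t-1} + (1 - \beta_2) \, g_t^2
\hat{m}_t = m_t / (1 - \beta_1^t), \quad \hat{v}_t = v_t / (1 - \beta_2^t)
\theta_{t+1} = \theta_t - lr \, \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)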

See

https://arxiv.org/abs/1412.6980v8

Parameters
  • lr – Learning rate

  • beta_1 – Coefficient used for computing the running average of the gradient

  • beta_2 – Coefficient used for computing the running average of the squared gradient

  • epsilon – Term added to the denominator to improve numerical stability

  • weight_decay – Weight decay (L2 penalty)

  • amsgrad – Whether to apply the AMSGrad variant of this algorithm from the paper “On the Convergence of Adam and Beyond”.

Returns

Adam optimizer

Example:

optimizer opt = adam(0.001);
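
Because the arguments are positional, enabling a later option such as amsgrad means writing out the intervening defaults explicitly, for example:

// lr, beta_1, beta_2, epsilon, weight_decay, amsgrad
optimizer opt = adam(0.0001, 0.9, 0.999, 0.000001, 0, true);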

Adagrad

optimizer eddl::adagrad(float lr, float epsilon, float weight_decay)

Adagrad optimizer.

Adagrad is an optimizer with parameter-specific learning rates, which are adapted relative to how frequently a parameter gets updated during training. The more updates a parameter receives, the smaller its learning rate becomes.
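
A sketch of the per-parameter rule described in the reference below, where G_t accumulates the squared gradients of each parameter and lr and epsilon are the arguments above:

G_t = G_{t-1} + g_t^2
\theta_{t+1} = \theta_t - \frac{lr}{\sqrt{G_t} + \epsilon} \, g_t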

See

http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf

Parameters
  • lr – Learning rate

  • epsilon – Term added to the denominator to improve numerical stability

  • weight_decay – Weight decay (L2 penalty)

Returns

Adagrad optimizer

Example:

optimizer opt = adagrad(0.001, 0.000001, 0);

Adamax

optimizer eddl::adamax(float lr, float beta_1, float beta_2, float epsilon, float weight_decay)

Adamax optimizer.

It is a variant of Adam based on the infinity norm. Default parameters follow those provided in the referenced paper.
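
Informally, Adamax replaces Adam's second-moment estimate with an infinity-norm based quantity u_t. A sketch of the rule from the paper (details such as the placement of epsilon may differ in the implementation):

m_t = \beta_1 \, m_{t-1} + (1 - \beta_1) \, g_t
u_t = \max(\beta_2 \, u_{t-1}, |g_t|)
\theta_{t+1} = \theta_t - \frac{lr}{1 - \beta_1^t} \, \frac{m_t}{u_t + \epsilon}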

See

https://arxiv.org/abs/1412.6980v8

Parameters
  • lr – Learning rate

  • beta_1 – Coefficient used for computing the running average of the gradient

  • beta_2 – Coefficient used for computing the running average of the squared gradient

  • epsilon – Term added to the denominator to improve numerical stability

  • weight_decay – Weight decay (L2 penalty)

Returns

Adamax optimizer

Example:

optimizer opt = adamax(0.001, 0.9, 0.999, 0.000001, 0);

Nadam

optimizer eddl::nadam(float lr, float beta_1, float beta_2, float epsilon, float schedule_decay)

Nadam optimizer.

Nadam is a variant of Adam that incorporates Nesterov momentum. Default parameters follow those provided in the referenced paper.

See

https://arxiv.org/abs/1412.6980v8

Parameters
  • lr – Learning rate

  • beta_1 – Coefficient used for computing the running average of the gradient

  • beta_2 – Coefficient used for computing the running average of the squared gradient

  • epsilon – Term added to the denominator to improve numerical stability

  • schedule_decay – Decay rate applied to the momentum schedule

Returns

Nadam optimizer

Example:

optimizer opt = nadam(0.001, 0.9, 0.999, 0.0000001, 0.004);

RMSProp

optimizer eddl::rmsprop(float lr = 0.01, float rho = 0.9, float epsilon = 0.00001, float weight_decay = 0.0)

RMSProp optimizer.

It is recommended to leave the parameters of this optimizer at their default values (except the learning rate, which can be freely tuned).
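
For reference, a sketch of the update described in the lecture slides linked below, with rho and epsilon corresponding to the parameters of this function:

E[g^2]_t = \rho \, E[g^2]_{t-1} + (1 - \rho) \, g_t^2
\theta_{t+1} = \theta_t - \frac{lr}{\sqrt{E[g^2]_t} + \epsilon} \, g_t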

See

http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf

Parameters
  • lr – Learning rate

  • rho – Smoothing constant

  • epsilon – Term added to the denominator to improve numerical stability

  • weight_decay – Weight decay (L2 penalty)

Returns

RMSProp optimizer

Example:

optimizer opt = rmsprop(0.001);

SGD (Stochastic Gradient Descent)

optimizer eddl::sgd(float lr = 0.01f, float momentum = 0.0f, float weight_decay = 0.0f, bool nesterov = false)

Stochastic gradient descent optimizer.

Includes support for momentum, weight decay (L2 penalty), and Nesterov momentum.
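
As a sketch, the classical momentum update, where mu denotes the momentum argument (exact conventions for the Nesterov variant differ slightly between libraries):

v_t = \mu \, v_{t-1} - lr \, g_t
\theta_{t+1} = \theta_t + v_t                      (classical momentum)
\theta_{t+1} = \theta_t + \mu \, v_t - lr \, g_t   (Nesterov momentum)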

Parameters
  • lr – Learning rate

  • momentum – Momentum factor

  • weight_decay – Weight decay (L2 penalty)

  • nesterov – Boolean. Whether to apply Nesterov momentum

Returns

Stochastic gradient descent optimizer

Example:

optimizer opt = sgd(0.001);
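
To enable momentum and the Nesterov variant, pass the corresponding positional arguments:

// lr, momentum, weight_decay, nesterov
optimizer opt = sgd(0.01, 0.9, 0.0001, true);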

Export to file

void save_optimizer_to_onnx_file(Optimizer *optimizer, string path)

Saves the configuration of an optimizer using the ONNX format. The file will contain the Optimizer type and its attributes, such as learning rate, momentum, and weight decay.

Parameters
  • optimizer – Optimizer to be saved

  • path – Path to the file where the Optimizer configuration will be saved

Returns

(void)

Example:

optimizer opt = sgd(0.001, 0.9);
save_optimizer_to_onnx_file(opt, "my_opt.onnx");

Import from file

Optimizer *import_optimizer_from_onnx_file(string path)

Creates an Optimizer from the definition provided in an ONNX file. The ONNX file provides the Optimizer type and its attributes.

Parameters

path – Path to the file where the Optimizer configuration is saved

Returns

Optimizer*

Example:

optimizer opt = import_optimizer_from_onnx_file("my_opt.onnx");
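
Together with save_optimizer_to_onnx_file, this allows a simple round trip. A minimal sketch (the file name is illustrative):

// Round trip: persist an optimizer's configuration and restore it later
optimizer original = adam(0.0001);
save_optimizer_to_onnx_file(original, "adam_config.onnx");
Optimizer *restored = import_optimizer_from_onnx_file("adam_config.onnx");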