Optimizers
Adadelta
optimizer eddl::adadelta(float lr, float rho, float epsilon, float weight_decay)
Adadelta optimizer.
Adadelta is a more robust extension of Adagrad that adapts learning rates based on a moving window of gradient updates, instead of accumulating all past gradients. This way, Adadelta continues learning even when many updates have been done.
- Parameters
lr – Learning rate
rho – Smoothing constant
epsilon – Term added to the denominator to improve numerical stability
weight_decay – Weight decay (L2 penalty)
- Returns
Adadelta optimizer
Example:
optimizer opt = adadelta(0.001, 0.95, 0.000001, 0);
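A minimal sketch of the Adadelta update for a single scalar parameter, for illustration only (this is not the EDDL implementation, and the function below is hypothetical). It shows how the two decaying averages replace the global accumulation used by Adagrad:

#include <cmath>

// One Adadelta step for a scalar parameter `x` with gradient `g`.
// `acc_grad` and `acc_update` are running averages carried between calls.
void adadelta_step(float &x, float g, float &acc_grad, float &acc_update,
                   float lr, float rho, float epsilon) {
    acc_grad = rho * acc_grad + (1 - rho) * g * g;            // E[g^2]
    float step = -std::sqrt(acc_update + epsilon)
                 / std::sqrt(acc_grad + epsilon) * g;         // adaptive step size
    acc_update = rho * acc_update + (1 - rho) * step * step;  // E[dx^2]
    x += lr * step;  // lr is 1.0 in the original paper; implementations expose it
}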
Adam
optimizer eddl::adam(float lr = 0.01, float beta_1 = 0.9, float beta_2 = 0.999, float epsilon = 0.000001, float weight_decay = 0, bool amsgrad = false)
Adam optimizer.
Default parameters follow those provided in the original paper.
- Parameters
lr – Learning rate
beta_1 – Coefficient used for computing the running average of the gradient
beta_2 – Coefficient used for computing the running average of the squared gradient
epsilon – Term added to the denominator to improve numerical stability
weight_decay – Weight decay (L2 penalty)
amsgrad – Whether to apply the AMSGrad variant of this algorithm from the paper “On the Convergence of Adam and Beyond”.
- Returns
Adam optimizer
Example:
optimizer opt = adam(0.001);
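As a rough guide to what the hyper-parameters control, here is an illustrative scalar version of the Adam update (not the EDDL implementation; the function is hypothetical). beta_1 and beta_2 drive the two running averages, and epsilon stabilizes the division:

#include <cmath>

// One Adam step for a scalar parameter `x` at iteration `t` (t starts at 1).
// `m` and `v` are the running first and second moments carried between calls.
void adam_step(float &x, float g, float &m, float &v, int t,
               float lr, float beta_1, float beta_2, float epsilon) {
    m = beta_1 * m + (1 - beta_1) * g;            // running average of the gradient
    v = beta_2 * v + (1 - beta_2) * g * g;        // running average of the squared gradient
    float m_hat = m / (1 - std::pow(beta_1, t));  // bias correction
    float v_hat = v / (1 - std::pow(beta_2, t));
    x -= lr * m_hat / (std::sqrt(v_hat) + epsilon);
}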
Adagrad
optimizer eddl::adagrad(float lr, float epsilon, float weight_decay)
Adagrad optimizer.
Adagrad is an optimizer with parameter-specific learning rates, which are adapted relative to how frequently a parameter gets updated during training. The more updates a parameter receives, the smaller the learning rate.
- Parameters
lr – Learning rate
epsilon – Term added to the denominator to improve numerical stability
weight_decay – Weight decay (L2 penalty)
- Returns
Adagrad optimizer
Example:
optimizer opt = adagrad(0.001, 0.000001, 0);
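An illustrative scalar sketch of the idea (not the EDDL implementation; the function is hypothetical): squared gradients are accumulated without decay, so frequently updated parameters end up with smaller effective learning rates:

#include <cmath>

// One Adagrad step for a scalar parameter `x`; `acc` accumulates squared gradients.
void adagrad_step(float &x, float g, float &acc, float lr, float epsilon) {
    acc += g * g;                              // grows monotonically over training
    x -= lr * g / (std::sqrt(acc) + epsilon);  // effective learning rate shrinks
}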
Adamax
optimizer eddl::adamax(float lr, float beta_1, float beta_2, float epsilon, float weight_decay)
Adamax optimizer.
It is a variant of Adam based on the infinity norm. Default parameters follow those provided in the paper.
- Parameters
lr – Learning rate
beta_1 – Coefficient used for computing the running average of the gradient
beta_2 – Coefficient used for computing the running average of the squared gradient
epsilon – Term added to the denominator to improve numerical stability
weight_decay – Weight decay (L2 penalty)
- Returns
Adamax optimizer
Example:
optimizer opt = adamax(0.001, 0.9, 0.999, 0.000001, 0);
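An illustrative scalar sketch (not the EDDL implementation; the function is hypothetical) showing how the infinity norm replaces Adam's second moment:

#include <algorithm>
#include <cmath>

// One Adamax step for a scalar parameter `x` at iteration `t` (t starts at 1).
// `m` is the running first moment, `u` the infinity-norm based second moment.
void adamax_step(float &x, float g, float &m, float &u, int t,
                 float lr, float beta_1, float beta_2, float epsilon) {
    m = beta_1 * m + (1 - beta_1) * g;        // first moment, as in Adam
    u = std::max(beta_2 * u, std::fabs(g));   // exponentially weighted infinity norm
    x -= (lr / (1 - std::pow(beta_1, t))) * m / (u + epsilon);
}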
Nadam
optimizer eddl::nadam(float lr, float beta_1, float beta_2, float epsilon, float schedule_decay)
Nadam optimizer.
It is a variant of Adam that incorporates Nesterov momentum. Default parameters follow those provided in the paper.
- Parameters
lr – Learning rate
beta_1 – Coefficient used for computing the running average of the gradient
beta_2 – Coefficient used for computing the running average of the squared gradient
epsilon – Term added to the denominator to improve numerical stability
schedule_decay – Decay rate applied to the momentum schedule
- Returns
Nadam optimizer
Example:
optimizer opt = nadam(0.001, 0.9, 0.999, 0.0000001, 0.004);
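A usage sketch attaching this optimizer to a network (the toy network, loss, metric, and CS_CPU computing service below are illustrative placeholders; it assumes the usual eddl building blocks are in scope):

layer in = Input({784});
layer out = Softmax(Dense(in, 10));
model net = Model({in}, {out});

optimizer opt = nadam(0.001, 0.9, 0.999, 0.0000001, 0.004);
build(net, opt, {"soft_cross_entropy"}, {"categorical_accuracy"}, CS_CPU());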
RMSProp
optimizer eddl::rmsprop(float lr = 0.01, float rho = 0.9, float epsilon = 0.00001, float weight_decay = 0.0)
RMSProp optimizer.
It is recommended to leave the parameters of this optimizer at their default values (except the learning rate, which can be freely tuned).
- Parameters
lr – Learning rate
rho – Smoothing constant
epsilon – Term added to the denominator to improve numerical stability
weight_decay – Weight decay (L2 penalty)
- Returns
RMSProp optimizer
Example:
optimizer opt = rmsprop(0.001);
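An illustrative scalar sketch of the update (not the EDDL implementation; the function is hypothetical): rho controls the decay of a running average of squared gradients, which normalizes each step:

#include <cmath>

// One RMSProp step for a scalar parameter `x`; `avg_sq` is the running average
// of squared gradients carried between calls.
void rmsprop_step(float &x, float g, float &avg_sq,
                  float lr, float rho, float epsilon) {
    avg_sq = rho * avg_sq + (1 - rho) * g * g;
    x -= lr * g / (std::sqrt(avg_sq) + epsilon);
}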
SGD (Stochastic Gradient Descent)
optimizer eddl::sgd(float lr = 0.01f, float momentum = 0.0f, float weight_decay = 0.0f, bool nesterov = false)
Stochastic gradient descent optimizer.
Includes support for momentum, weight decay (L2 penalty), and Nesterov momentum.
- Parameters
lr – Learning rate
momentum – Momentum factor
weight_decay – Weight decay (L2 penalty)
nesterov – Boolean. Whether to apply Nesterov momentum
- Returns
Stochastic gradient descent optimizer
Example:
optimizer opt = sgd(0.001);
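An illustrative scalar sketch of the momentum and Nesterov variants (not the EDDL implementation; the function is hypothetical):

// One SGD step for a scalar parameter `x`; `v` is the velocity carried between calls.
// Weight decay (L2 penalty) would add weight_decay * x to `g` before this update.
void sgd_step(float &x, float g, float &v,
              float lr, float momentum, bool nesterov) {
    v = momentum * v - lr * g;          // velocity update
    if (nesterov)
        x += momentum * v - lr * g;     // look-ahead (Nesterov) step
    else
        x += v;                         // classical momentum step
}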
Export to file
void save_optimizer_to_onnx_file(Optimizer *optimizer, string path)
Saves the configuration of an optimizer using the ONNX format. The file will contain the Optimizer type and attributes such as the learning rate, momentum, and weight decay.
- Parameters
optimizer – Optimizer to be saved
path – Path to the file where the Optimizer configuration will be saved
- Returns
(void)
Example:
optimizer opt = sgd(0.001, 0.9);
save_optimizer_to_onnx_file(opt, "my_opt.onnx");
Import from file
Optimizer *import_optimizer_from_onnx_file(string path)
Creates an Optimizer from the definition provided in an ONNX file. The file provides the Optimizer type and its attributes.
- Parameters
path – Path to the file where the Optimizer configuration is saved
- Returns
Optimizer*
Example:
optimizer opt = import_optimizer_from_onnx_file("my_opt.onnx");
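A round-trip sketch combining export and import (illustrative; the file name is arbitrary):

// Save a configured optimizer...
optimizer opt = adam(0.001);
save_optimizer_to_onnx_file(opt, "my_opt.onnx");

// ...and later recreate an optimizer with the same type and attributes
optimizer opt2 = import_optimizer_from_onnx_file("my_opt.onnx");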