Constrained Optimizer
- class cooper.constrained_optimizer.ConstrainedOptimizer(formulation, primal_optimizer, dual_optimizer=None, dual_scheduler=None, alternating=False, dual_restarts=False)[source]

Optimizes a ConstrainedMinimizationProblem given its Formulation.

A ConstrainedOptimizer includes one or two torch.optim.Optimizers, for the primal and dual variables associated with the Formulation, respectively.

A ConstrainedOptimizer can be used on constrained or unconstrained ConstrainedMinimizationProblems. Please refer to the documentation of the ConstrainedMinimizationProblem and Formulation classes for further details on handling unconstrained problems.

- Parameters
  - formulation (Formulation) – Formulation of the ConstrainedMinimizationProblem to be optimized.
  - primal_optimizer (Optimizer) – Fully instantiated torch.optim.Optimizer used to optimize the primal parameters (e.g. model parameters).
  - dual_optimizer (Optional[Optimizer]) – Partially instantiated torch.optim.Optimizer used to optimize the dual variables (e.g. Lagrange multipliers). Defaults to None. Should be set to None when dealing with an unconstrained problem.
  - dual_scheduler (Optional[_LRScheduler]) – Partially instantiated torch.optim.lr_scheduler._LRScheduler used to schedule the learning rate of the dual variables. Defaults to None. Should be set to None when dealing with an unconstrained problem.
  - alternating (bool) – Whether to alternate parameter updates between primal and dual parameters. Otherwise, perform simultaneous parameter updates. Defaults to False.
  - dual_restarts (bool) – If True, perform "restarts" on the Lagrange multipliers associated with inequality constraints: whenever a constraint is satisfied, directly set its multiplier to zero. Defaults to False.
- sanity_checks()[source]

Perform sanity checks on the initialization of ConstrainedOptimizer.

- Raises
  - NotImplementedError – The Formulation has an augmented Lagrangian coefficient and the primal_optimizer has an extrapolation function. This is not supported because of possible unexpected behavior.
  - RuntimeError – The primal_optimizer has an extrapolation function and alternating was set to True. Mixing extrapolation and alternating updates is not supported.
  - RuntimeError – A dual_optimizer was provided, but the ConstrainedMinimizationProblem of the formulation is unconstrained. There are no dual variables to optimize.
  - RuntimeError – A dual_scheduler was provided, but the ConstrainedMinimizationProblem of the formulation is unconstrained. There are no dual variables and no dual_optimizer for learning rate scheduling.
  - RuntimeError – A dual_scheduler was provided, but no dual_optimizer was provided. Cannot schedule the learning rate of an unknown optimizer.
  - RuntimeError – The considered ConstrainedMinimizationProblem is unconstrained, but the provided primal_optimizer has an extrapolation function. This is not supported because of unexpected behavior when using extrapolation to update the primal parameters without any dual parameters.
  - RuntimeError – One of primal_optimizer or dual_optimizer has an extrapolation function while the other does not. Extrapolation on only one player is not supported.
- step(closure=None, *closure_args, **closure_kwargs)[source]

Performs a single optimization step on both the primal and dual variables. If a dual_scheduler is provided, a scheduler step is performed on the learning rate of the dual_optimizer.

- Parameters
  - closure (Optional[Callable[..., CMPState]]) – Closure Callable required for re-evaluating the objective and constraints when performing alternating or extrapolating updates. Defaults to None.
  - *closure_args – Arguments to be passed to the closure function when re-evaluating.
  - **closure_kwargs – Keyword arguments to be passed to the closure function when re-evaluating.
How to use a ConstrainedOptimizer
The ConstrainedOptimizer class is the cornerstone of Cooper. A ConstrainedOptimizer performs parameter updates to solve a ConstrainedMinimizationProblem given a chosen Formulation.

A ConstrainedOptimizer wraps a torch.optim.Optimizer used for updating the "primal" parameters associated directly with the optimization problem. These might be, for example, the parameters of the model you are training.

Additionally, a ConstrainedOptimizer includes a second torch.optim.Optimizer, which performs updates on the "dual" parameters (e.g. the multipliers used in a LagrangianFormulation).
Construction
The main ingredients to build a ConstrainedOptimizer are a Formulation (associated with a ConstrainedMinimizationProblem) and a torch.optim.Optimizer corresponding to the primal_optimizer.

If the ConstrainedMinimizationProblem you are dealing with is in fact constrained, depending on your formulation, you might also need to provide a dual_optimizer. Check out the section on Partial optimizer instantiation for more details on defining dual_optimizers.
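Conceptually, partial instantiation works like functools.partial: the hyperparameters are fixed up front, while the parameters are supplied later, once the formulation has created the dual variables. The sketch below illustrates the idea with a hypothetical ToyOptimizer class standing in for a torch.optim optimizer; it is not Cooper's actual implementation.

```python
from functools import partial

class ToyOptimizer:
    """Stand-in for a torch.optim optimizer: takes parameters plus hyperparameters."""
    def __init__(self, params, lr, momentum=0.0):
        self.params = list(params)
        self.lr = lr
        self.momentum = momentum

# "Partial instantiation": fix the hyperparameters now, supply the parameters
# later. cooper.optim.partial_optimizer plays an analogous role for real
# torch.optim classes.
make_dual_optimizer = partial(ToyOptimizer, lr=1e-3, momentum=0.9)

# Dual variables only exist after the formulation has seen the constraints;
# at that point the optimizer can be fully instantiated.
dual_variables = [0.0, 0.0]
dual_optimizer = make_dual_optimizer(dual_variables)
```

This is why the dual_optimizer argument is passed partially instantiated: the Lagrange multipliers it will update do not exist yet at construction time.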
Note
Cooper includes extra-gradient implementations of SGD and Adam which can be used as primal or dual optimizers. See Extra-gradient optimizers.
Examples
The examples below show the small changes required to go from an unconstrained to a constrained problem. Note that these changes should also be accompanied by edits to the custom problem class, which inherits from ConstrainedMinimizationProblem. More details on the definition of a CMP can be found under the entry for Constrained Minimization Problem.
Unconstrained problem
model = ModelClass(...)
cmp = cooper.ConstrainedMinimizationProblem(is_constrained=False)
formulation = cooper.problem.Formulation(...)

primal_optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

constrained_optimizer = cooper.ConstrainedOptimizer(
    formulation=formulation,
    primal_optimizer=primal_optimizer,
)
Constrained problem
model = ModelClass(...)
cmp = cooper.ConstrainedMinimizationProblem(is_constrained=True)
formulation = cooper.problem.Formulation(...)

primal_optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
# Note that dual_optimizer is "partly instantiated", *without* parameters
dual_optimizer = cooper.optim.partial_optimizer(torch.optim.SGD, lr=1e-3, momentum=0.9)

constrained_optimizer = cooper.ConstrainedOptimizer(
    formulation=formulation,
    primal_optimizer=primal_optimizer,
    dual_optimizer=dual_optimizer,
)
The training loop
We have gathered all the ingredients we need for tackling our CMP: the custom ConstrainedMinimizationProblem class, along with your Formulation of choice and a ConstrainedOptimizer for updating the parameters. Now it is time to put them to good use.
The typical training loop for solving a CMP in a machine learning setup using Cooper (with a Lagrangian Formulation) will involve the following steps:
Overview of main steps in a training loop
1. (Optional) Iterate over your dataset and sample a mini-batch.
2. Call constrained_optimizer.zero_grad() to reset the parameters' gradients.
3. Compute the current CMPState (or estimate it with the mini-batch) and calculate the Lagrangian using formulation.composite_objective(cmp.closure, ...).
4. Populate the primal and dual gradients with formulation.custom_backward(lagrangian).
5. Perform updates on the parameters using the primal and dual optimizers, based on the recently computed gradients, via a call to constrained_optimizer.step().
Example
model = ModelClass(...)
cmp = cooper.ConstrainedMinimizationProblem(...)
formulation = cooper.LagrangianFormulation(...)

primal_optimizer = torch.optim.SGD(model.parameters(), lr=primal_lr)
# Note that dual_optimizer is "partly instantiated", *without* parameters
dual_optimizer = cooper.optim.partial_optimizer(torch.optim.SGD, lr=primal_lr)

constrained_optimizer = cooper.ConstrainedOptimizer(
    formulation=formulation,
    primal_optimizer=primal_optimizer,
    dual_optimizer=dual_optimizer,
)

for inputs, targets in dataset:
    # Clear gradient buffers
    constrained_optimizer.zero_grad()

    # The closure is required to compute the Lagrangian
    # The closure might in turn require the model, inputs, targets, etc.
    lagrangian = formulation.composite_objective(cmp.closure, ...)

    # Populate the primal and dual gradients
    formulation.custom_backward(lagrangian)

    # Perform primal and dual parameter updates
    constrained_optimizer.step()
Parameter updates
By default, parameter updates will be performed using simultaneous gradient descent-ascent updates (according to the choice of primal and dual optimizers). Formally,

\[x_{t+1} = x_t - \eta_x \, \nabla_x \mathcal{L}(x_t, \lambda_t)\]

\[\lambda_{t+1} = \lambda_t - \eta_{\lambda} \left( -\nabla_{\lambda} \mathcal{L}(x_t, \lambda_t) \right)\]
Note
We explicitly include a negative sign in front of the gradient for \(\lambda\) in order to highlight the fact that \(\lambda\) solves a maximization problem. Cooper handles the sign flipping internally, so you should provide your definition for a dual_optimizer using a non-negative learning rate, as usual!
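As an illustration of simultaneous gradient descent-ascent, the sketch below solves a toy constrained problem in pure Python, outside of Cooper's API. The problem, step size, and iteration count are all illustrative choices.

```python
# Toy problem (illustrative, not Cooper's API): minimize f(x) = x**2
# subject to g(x) = 1 - x <= 0. The Lagrangian is
# L(x, lam) = x**2 + lam * (1 - x), with solution x* = 1, lam* = 2.

eta = 0.05  # non-negative learning rate for both players, as usual

x, lam = 0.0, 0.0
for _ in range(2000):
    grad_x = 2 * x - lam  # dL/dx at the *current* iterate
    grad_lam = 1 - x      # dL/dlam: the constraint defect
    # Simultaneous updates: descent for x, ascent for lam, written as
    # descent on the negated dual gradient and projected onto R+.
    x, lam = x - eta * grad_x, max(0.0, lam - eta * (-grad_lam))
```

Both gradients are evaluated at the current iterate before either variable is updated; that is what makes the updates simultaneous rather than alternating.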
Multiplier projection
Lagrange multipliers associated with inequality constraints should remain non-negative. By default, Cooper performs the standard projection onto \(\mathbb{R}^{+}\) for DenseMultipliers. For more details on using custom projection operations, see the section on Multipliers.
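The default projection amounts to an elementwise clamp onto the non-negative orthant. The helper below is a plain-Python sketch of that operation, not Cooper's actual DenseMultiplier code.

```python
def project_multipliers(lambdas):
    """Elementwise projection onto R+: negative entries are clamped to zero."""
    return [max(0.0, lam) for lam in lambdas]

# Multipliers for inequality constraints must stay non-negative
projected = project_multipliers([-0.3, 0.0, 1.7])
```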
Other update strategies implemented by ConstrainedOptimizer include:

- (TODO) Alternating updates for (projected) gradient descent-ascent
- Performing Dual restarts on the Lagrange multipliers for inequality constraints
- Using Extra-gradient
Extra-gradient-based optimizers require an extra call to cmp.closure(). See the section on Extra-gradient optimizers for usage details.
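To show why extrapolating optimizers need that extra closure evaluation, here is a generic extra-gradient sketch on a toy Lagrangian (pure Python, not Cooper's API): every iteration evaluates the gradients twice, once at the current iterate and once at the lookahead point.

```python
# Toy Lagrangian L(x, lam) = x**2 + lam * (1 - x), saddle point at (1, 2).
def grads(x, lam):
    return 2 * x - lam, 1 - x  # (dL/dx, dL/dlam)

eta = 0.05
x, lam = 0.0, 0.0
for _ in range(2000):
    # First gradient evaluation: extrapolation (lookahead) step
    gx, gl = grads(x, lam)
    x_la = x - eta * gx
    lam_la = max(0.0, lam + eta * gl)
    # Second gradient evaluation, at the lookahead point: the actual update
    gx, gl = grads(x_la, lam_la)
    x = x - eta * gx
    lam = max(0.0, lam + eta * gl)
```

The second call to grads is the analogue of the extra cmp.closure() evaluation: the objective and constraints must be recomputed at the extrapolated parameters.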
TODO: Augmented Lagrangian
The ConstrainedOptimizer implements a ConstrainedOptimizer.step() method that updates the primal and dual parameters (if the Formulation has any). The nature of the update depends on the attributes provided during the initialization of the ConstrainedOptimizer. By default, updates are performed via gradient descent on the primal parameters and (projected) gradient ascent on the dual parameters, with simultaneous updates.
Note
When applied to an unconstrained problem, ConstrainedOptimizer.step()
will be equivalent to performing primal_optimizer.step()
based on the
gradient of the loss with respect to the primal parameters.
Additional features
Alternating updates
It is possible to perform alternating updates between the primal and dual parameters by setting the flag alternating=True in the construction of the ConstrainedOptimizer. In this case, the gradient computed by calling custom_backward() is used to update the primal parameters. Then, the gradient with respect to the dual variables (given the new value of the primal parameters!) is computed and used to update the dual variables. This two-stage process is handled by Cooper inside the ConstrainedOptimizer.step() method.
Important
Selecting alternating=True does not necessarily double the number of backward passes through the primal parameters! When using a LagrangianFormulation, to obtain the gradients with respect to the Lagrange multipliers, it suffices to evaluate the constraint defects (through a call to closure()). This operation does not require back-propagating through the Lagrangian.
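On a toy Lagrangian, an alternating update can be sketched as follows (pure Python, not Cooper's API). Note that the dual step only needs the constraint defect re-evaluated at the freshly updated primal point, with no second backward pass.

```python
# Toy Lagrangian L(x, lam) = x**2 + lam * (1 - x), saddle point at (1, 2).
eta = 0.05
x, lam = 0.0, 0.0
for _ in range(2000):
    # Primal step: gradient descent at the current iterate
    x = x - eta * (2 * x - lam)
    # Dual step: re-evaluate the defect g(x) at the *new* x, then perform
    # projected gradient ascent on the multiplier
    defect = 1 - x
    lam = max(0.0, lam + eta * defect)
```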
Todo
This functionality is untested and is yet to be integrated with the use of proxy constraints.
Dual restarts
dual_restarts=True can be used to set the value of the dual variables associated with inequality constraints to zero whenever the respective constraints are being satisfied. We do not perform dual_restarts on multipliers for equality constraints.
For simplicity, consider a CMP with two inequality constraints \(g_1(x) \le 0\) and \(g_2(x) \le 0\) and loss \(f(x)\). Suppose that the constraint on \(g_1\) is strictly satisfied, while that on \(g_2\) is violated at \(x\). Consider the Lagrangian at \(x\) with (non-negative) multipliers \(\lambda_1\) and \(\lambda_2\).
The best response for \(\lambda_1\) and \(\lambda_2\) at the current value of the primal parameters \(x\) is \((\lambda_1, \lambda_2) = (0, +\infty)\). This is due to the non-negativity constraints on the Lagrange multipliers, the sign of the constraint violations, and the fact that the \(\lambda\)-player wants to maximize \(\mathcal{L}(x,\lambda_1, \lambda_2)\).
“Playing a best response” for the Lagrange multiplier \(\lambda_2\) of the violated constraint clearly leads to an impractical algorithm (due to numerical overflow). However, for the currently feasible constraint, performing a best response update on \(\lambda_1\) is indeed implementable, and trivially so. This is exactly the effect of setting dual_restarts=True!
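The restart rule for a single inequality multiplier can be sketched as follows (illustrative, not Cooper's implementation):

```python
def dual_restart_step(lam, defect, eta):
    """One dual update with restarts for an inequality multiplier.

    defect > 0 means the constraint is violated; defect <= 0 means it is
    satisfied, in which case the multiplier is reset to its best response, 0.
    """
    if defect <= 0:
        return 0.0
    return lam + eta * defect

restarted = dual_restart_step(lam=1.5, defect=-0.1, eta=0.1)    # satisfied
accumulated = dual_restart_step(lam=1.5, defect=0.2, eta=0.1)   # violated
```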
In practice, this prevents the optimization from over-focusing on a constraint (at the expense of improving the loss function!) when said constraint is already being satisfied. This over-penalization arises from the accumulation of the (positive) defects in the value of the Lagrange multiplier while the constraint was being violated in the past.
Note
We recommend setting dual_restarts=False when dealing with constraints whose violations are estimated stochastically, for example, Monte Carlo estimates of constraints described by expectations. This avoids restarting multipliers when a constraint is “being satisfied” for a single estimate, since this can be a result of the stochasticity of the estimator.
Augmented Lagrangian
Todo
Verify current (unfinished and untested) implementation of the Augmented Lagrangian method.