Constrained Optimizer

class cooper.constrained_optimizer.ConstrainedOptimizer(formulation, primal_optimizer, dual_optimizer=None, dual_scheduler=None, alternating=False, dual_restarts=False)[source]

Optimizes a ConstrainedMinimizationProblem given its Formulation.

A ConstrainedOptimizer includes one or two torch.optim.Optimizers, for the primal and dual variables associated with the Formulation, respectively.

A ConstrainedOptimizer can be used on constrained or unconstrained ConstrainedMinimizationProblems. Please refer to the documentation of the ConstrainedMinimizationProblem and Formulation classes for further details on handling unconstrained problems.

Parameters
  • formulation (Formulation) – Formulation of the ConstrainedMinimizationProblem to be optimized.

  • primal_optimizer (Optimizer) – Fully instantiated torch.optim.Optimizer used to optimize the primal parameters (e.g. model parameters).

  • dual_optimizer (Optional[Optimizer]) – Partially instantiated torch.optim.Optimizer used to optimize the dual variables (e.g. Lagrange multipliers). Defaults to None. When dealing with an unconstrained problem, should be set to None.

  • dual_scheduler (Optional[_LRScheduler]) – Partially instantiated torch.optim.lr_scheduler._LRScheduler used to schedule the learning rate of the dual variables. Defaults to None. When dealing with an unconstrained problem, should be set to None.

  • alternating (bool) – Whether to alternate parameter updates between primal and dual parameters. Otherwise, do simultaneous parameter updates. Defaults to False.

  • dual_restarts (bool) – If True, perform “restarts” on the Lagrange multipliers associated with inequality constraints: whenever the constraint is satisfied, directly set the multiplier to zero. Defaults to False.

sanity_checks()[source]

Perform sanity checks on the initialization of ConstrainedOptimizer.

Raises
  • NotImplementedError – The Formulation has an augmented Lagrangian coefficient and primal_optimizer has an extrapolation function. This is not supported because of possible unexpected behavior.

  • RuntimeError – The primal_optimizer has an extrapolation function and alternating was set to True. Mixing extrapolation and alternating updates is not supported.

  • RuntimeError – A dual_optimizer was provided but the ConstrainedMinimizationProblem of formulation is unconstrained. There are no dual variables to optimize.

  • RuntimeError – A dual_scheduler was provided but the ConstrainedMinimizationProblem of formulation is unconstrained. There are no dual variables and no dual_optimizer for learning rate scheduling.

  • RuntimeError – A dual_scheduler was provided but no dual_optimizer was provided. Cannot schedule the learning rate of an unknown optimizer.

  • RuntimeError – The considered ConstrainedMinimizationProblem is unconstrained, but the provided primal_optimizer has an extrapolation function. This is not supported because of unexpected behavior when using extrapolation to update the primal parameters without any dual parameters.

  • RuntimeError – One of primal_optimizer or dual_optimizer has an extrapolation function while the other does not. Extrapolation on only one player is not supported.

step(closure=None, *closure_args, **closure_kwargs)[source]

Performs a single optimization step on both the primal and dual variables. If dual_scheduler is provided, a scheduler step is performed on the learning rate of the dual_optimizer.

Parameters
  • closure (Optional[Callable[..., CMPState]]) – Closure Callable required for re-evaluating the objective and constraints when performing alternating or extrapolating updates. Defaults to None.

  • *closure_args – Arguments to be passed to the closure function when re-evaluating.

  • **closure_kwargs – Keyword arguments to be passed to the closure function when re-evaluating.

zero_grad(ignore_primal=False, ignore_dual=False)[source]

Sets the gradients of all optimized Parameters to zero. This includes both the primal and dual variables.

Parameters
  • ignore_primal (bool) – If True, the gradients of the primal variables will not be zeroed. Defaults to False.

  • ignore_dual (bool) – If True, the gradients of the dual variables will not be zeroed. Defaults to False.

How to use a ConstrainedOptimizer

The ConstrainedOptimizer class is the cornerstone of Cooper. A ConstrainedOptimizer performs parameter updates to solve a ConstrainedMinimizationProblem given a chosen Formulation.

A ConstrainedOptimizer wraps a torch.optim.Optimizer used for updating the “primal” parameters associated directly with the optimization problem. These might be, for example, the parameters of the model you are training.

Additionally, a ConstrainedOptimizer includes a second torch.optim.Optimizer, which performs updates on the “dual” parameters (e.g. the multipliers used in a LagrangianFormulation).

Construction

The main ingredients to build a ConstrainedOptimizer are a Formulation (associated with a ConstrainedMinimizationProblem) and a torch.optim.Optimizer corresponding to the primal_optimizer.

If the ConstrainedMinimizationProblem you are dealing with is in fact constrained, depending on your formulation, you might also need to provide a dual_optimizer. Check out the section on Partial optimizer instantiation for more details on defining dual_optimizers.

Note

Cooper includes extra-gradient implementations of SGD and Adam which can be used as primal or dual optimizers. See Extra-gradient optimizers.
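To illustrate what extrapolation does, here is a minimal pure-Python sketch of the extra-gradient scheme on a toy problem: minimize \(x^2\) subject to \(1 - x \le 0\), whose KKT point is \(x^* = 1\), \(\lambda^* = 2\). This only illustrates the update rule; the function names and the toy problem are our own, not Cooper's implementation.

```python
# Extra-gradient sketch on: min x^2  s.t.  g(x) = 1 - x <= 0
# Lagrangian: L(x, lam) = x^2 + lam * (1 - x); KKT point: x* = 1, lam* = 2.

def grad_x(x, lam):
    return 2 * x - lam  # dL/dx

def grad_lam(x, lam):
    return 1 - x        # dL/dlam (the constraint defect)

def extragradient(x=0.0, lam=0.0, lr=0.05, steps=2000):
    for _ in range(steps):
        # Extrapolation ("look-ahead") step from the current point
        x_e = x - lr * grad_x(x, lam)
        lam_e = max(0.0, lam + lr * grad_lam(x, lam))
        # Actual update uses gradients evaluated at the extrapolated point
        x = x - lr * grad_x(x_e, lam_e)
        lam = max(0.0, lam + lr * grad_lam(x_e, lam_e))
    return x, lam
```

Re-evaluating gradients at the look-ahead point is what damps the rotational dynamics of the min-max game, and is why `step()` requires a closure when extrapolating optimizers are used.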

Examples

The highlighted lines below show the small changes required to go from an unconstrained to a constrained problem. Note that these changes should also be accompanied by edits to the custom problem class which inherits from ConstrainedMinimizationProblem. More details on the definition of a CMP can be found under the entry for Constrained Minimization Problem.

  • Unconstrained problem

     model = ModelClass(...)
     cmp = cooper.ConstrainedMinimizationProblem(is_constrained=False)
     formulation = cooper.problem.Formulation(...)

     primal_optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

     constrained_optimizer = cooper.ConstrainedOptimizer(
         formulation=formulation,
         primal_optimizer=primal_optimizer,
     )
    
  • Constrained problem

     model = ModelClass(...)
     cmp = cooper.ConstrainedMinimizationProblem(is_constrained=True)
     formulation = cooper.problem.Formulation(...)

     primal_optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
     # Note that dual_optimizer is "partly instantiated", *without* parameters
     dual_optimizer = cooper.optim.partial_optimizer(torch.optim.SGD, lr=1e-3, momentum=0.9)

     constrained_optimizer = cooper.ConstrainedOptimizer(
         formulation=formulation,
         primal_optimizer=primal_optimizer,
         dual_optimizer=dual_optimizer,
     )
    

The training loop

We have gathered all the ingredients we need for tackling our CMP: the custom ConstrainedMinimizationProblem class, along with your Formulation of choice and a ConstrainedOptimizer for updating the parameters. Now it is time to put them to good use.

The typical training loop for solving a CMP in a machine learning setup using Cooper (with a LagrangianFormulation) will involve the following steps:

Overview of main steps in a training loop

  1. (Optional) Iterate over your dataset and sample a mini-batch.

  2. Call constrained_optimizer.zero_grad() to reset the parameters’ gradients.

  3. Compute the current CMPState (or estimate it with the mini-batch) and calculate the Lagrangian using formulation.composite_objective(cmp.closure, ...).

  4. Populate the primal and dual gradients with formulation.custom_backward(lagrangian).

  5. Perform updates on the parameters using the primal and dual optimizers based on the recently computed gradients, via a call to constrained_optimizer.step().

Example

model = ModelClass(...)
cmp = cooper.ConstrainedMinimizationProblem(...)
formulation = cooper.LagrangianFormulation(...)

primal_optimizer = torch.optim.SGD(model.parameters(), lr=primal_lr)
# Note that dual_optimizer is "partly instantiated", *without* parameters
dual_optimizer = cooper.optim.partial_optimizer(torch.optim.SGD, lr=primal_lr)

constrained_optimizer = cooper.ConstrainedOptimizer(
    formulation=formulation,
    primal_optimizer=primal_optimizer,
    dual_optimizer=dual_optimizer,
)

for inputs, targets in dataset:
    # Clear gradient buffers
    constrained_optimizer.zero_grad()

    # The closure is required to compute the Lagrangian
    # The closure might in turn require the model, inputs, targets, etc.
    lagrangian = formulation.composite_objective(cmp.closure, ...)

    # Populate the primal and dual gradients
    formulation.custom_backward(lagrangian)

    # Perform primal and dual parameter updates
    constrained_optimizer.step()

Parameter updates

By default, parameter updates will be performed using simultaneous gradient descent-ascent updates (according to the choice of primal and dual optimizers). Formally,

\[\begin{split}x_{t+1} &= \texttt{primal_optimizer_update} \left( x_{t}, \nabla_{x} \mathcal{L}(x_t, \lambda_t) \right)\\ \lambda_{t+1} &= \texttt{dual_optimizer_update} \left( \lambda_{t}, {\color{red} \mathbf{-}} \nabla_{\lambda} \mathcal{L}(x_t, \lambda_t) \right)\end{split}\]

Note

We explicitly include a negative sign in front of the gradient for \(\lambda\) in order to highlight the fact that \(\lambda\) solves a maximization problem. Cooper handles the sign flipping internally, so you should provide your definition for a dual_optimizer using a non-negative learning rate, as usual!
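As an illustration of these simultaneous updates (including the internal sign flip and the projection discussed in the next section), consider a minimal pure-Python sketch on the toy problem \(\min x^2\) subject to \(1 - x \le 0\), whose KKT point is \(x^* = 1\), \(\lambda^* = 2\). The function names and toy problem are illustrative, not part of Cooper's API:

```python
# Simultaneous gradient descent-ascent on: min x^2  s.t.  g(x) = 1 - x <= 0
# Lagrangian: L(x, lam) = x^2 + lam * (1 - x); KKT point: x* = 1, lam* = 2.

def simultaneous_gda(x=0.0, lam=0.0, lr=0.05, steps=2000):
    for _ in range(steps):
        gx = 2 * x - lam  # dL/dx: gradient for the descent (primal) player
        glam = 1 - x      # dL/dlam: gradient for the ascent (dual) player
        # Both players update using gradients evaluated at the *same* point
        x = x - lr * gx
        lam = max(0.0, lam + lr * glam)  # projected ascent keeps lam >= 0
    return x, lam
```

Note that the dual step adds its gradient (ascent) with an ordinary non-negative step size, matching the sign convention described above.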

Multiplier projection

Lagrange multipliers associated with inequality constraints should remain non-negative. Cooper executes the standard projection to \(\mathbb{R}^{+}\) by default for DenseMultipliers. For more details on using custom projection operations, see the section on Multipliers.
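Conceptually, this default projection amounts to an elementwise clamp at zero. A plain-Python sketch (illustrative only, not the DenseMultiplier implementation):

```python
def project_multipliers(ineq_multipliers):
    """Project Lagrange multipliers for inequality constraints onto R+.

    Elementwise clamp at zero: any multiplier that became negative after a
    dual update is mapped back to the feasible (non-negative) set.
    """
    return [max(0.0, lam) for lam in ineq_multipliers]
```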

The ConstrainedOptimizer implements a ConstrainedOptimizer.step() method that updates the primal and dual parameters (if the Formulation has any). The nature of the update depends on the attributes provided during the initialization of the ConstrainedOptimizer. By default, updates are performed via gradient descent on the primal parameters and (projected) gradient ascent on the dual parameters, simultaneously. Other update strategies implemented by ConstrainedOptimizer are described below under Additional features.

Note

When applied to an unconstrained problem, ConstrainedOptimizer.step() will be equivalent to performing primal_optimizer.step() based on the gradient of the loss with respect to the primal parameters.

Additional features

Alternating updates

It is possible to perform alternating updates between the primal and dual parameters by setting the flag alternating=True in the construction of the ConstrainedOptimizer. In this case, the gradient computed by calling custom_backward() is used to update the primal parameters. Then, the gradient with respect to the dual variables (given the new value of the primal parameters!) is computed and used to update the dual variables. This two-stage process is handled by Cooper inside the ConstrainedOptimizer.step() method.

\[\begin{split}x_{t+1} &= \texttt{primal_optimizer_update} \left( x_{t}, \nabla_{x} \mathcal{L}(x_t, \lambda_t) \right)\\ \lambda_{t+1} &= \texttt{dual_optimizer_update} \left( \lambda_{t}, {\color{red} \mathbf{-}} \nabla_{\lambda} \mathcal{L}({\color{red} x_{t+1}}, \lambda_t) \right)\end{split}\]

Important

Selecting alternating=True does not necessarily double the number of backward passes through the primal parameters!

When using a LagrangianFormulation, obtaining the gradients with respect to the Lagrange multipliers only requires evaluating the constraint defects (through a call to closure()). This operation does not require back-propagating through the Lagrangian.
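To make the two-stage update concrete, here is a minimal pure-Python sketch of alternating updates on the toy problem \(\min x^2\) subject to \(1 - x \le 0\) (KKT point \(x^* = 1\), \(\lambda^* = 2\)). The names and toy problem are illustrative, not Cooper's API:

```python
# Alternating updates on: min x^2  s.t.  g(x) = 1 - x <= 0
# Lagrangian: L(x, lam) = x^2 + lam * (1 - x); KKT point: x* = 1, lam* = 2.

def alternating_updates(x=0.0, lam=0.0, lr=0.05, steps=2000):
    for _ in range(steps):
        # First, update the primal variable at the current multiplier
        x = x - lr * (2 * x - lam)
        # Then, evaluate the dual gradient (the defect) at the *new* primal
        # value and take a projected ascent step on the multiplier
        lam = max(0.0, lam + lr * (1 - x))
    return x, lam
```

The only difference from the simultaneous scheme is that the dual gradient \(1 - x\) is evaluated at the freshly updated primal value, which for a Lagrangian formulation requires only re-computing the constraint defects.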

Todo

This functionality is untested and is yet to be integrated with the use of proxy constraints.

Dual restarts

dual_restarts=True can be used to set the value of the dual variables associated with inequality constraints to zero whenever the respective constraints are being satisfied. We do not perform dual_restarts on multipliers for equality constraints.

For simplicity, consider a CMP with two inequality constraints \(g_1(x) \le 0\) and \(g_2(x) \le 0\) and loss \(f(x)\). Suppose that the constraint on \(g_1\) is strictly satisfied, while that on \(g_2\) is violated at \(x\). Consider the Lagrangian at \(x\) with (non-negative) multipliers \(\lambda_1\) and \(\lambda_2\).

\[\mathcal{L}(x,\lambda_1, \lambda_2) = f(x) + \lambda_1 \, g_1(x) + \lambda_2 \, g_2(x)\]

The best response for \(\lambda_1\) and \(\lambda_2\) at the current value of the primal parameters \(x\) is \((\lambda_1, \lambda_2) = (0, +\infty)\). This is due to the non-negativity constraints on the Lagrange multipliers, the sign of the constraint violations, and the fact that the \(\lambda\)-player wants to maximize \(\mathcal{L}(x,\lambda_1, \lambda_2)\).

“Playing a best response” for the Lagrange multiplier \(\lambda_2\) of the violated constraint clearly leads to an impractical algorithm (due to numerical overflow). However, for the currently feasible constraint, performing a best response update on \(\lambda_1\) is indeed implementable, and trivially so. This is exactly the effect of setting dual_restarts=True!

In practice, this prevents the optimization from over-focusing on a constraint (at the expense of improving the loss function!) when said constraint is already satisfied. Such over-penalization arises from the accumulation of (positive) defects in the value of the Lagrange multiplier while the constraint was violated in the past.
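The restart rule itself can be sketched in a few lines of plain Python (illustrative only; Cooper applies this logic internally to the inequality multipliers when dual_restarts=True):

```python
def restart_on_feasible(ineq_multipliers, defects):
    """Set the multiplier of each *satisfied* inequality constraint to zero.

    A defect g_i(x) <= 0 means constraint i is currently satisfied; its
    accumulated multiplier is reset so it stops penalizing the (now
    feasible) constraint. Violated constraints keep their multipliers.
    """
    return [0.0 if defect <= 0 else lam
            for lam, defect in zip(ineq_multipliers, defects)]
```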

Note

We recommend setting dual_restarts=False when dealing with constraints whose violations are estimated stochastically, for example Monte Carlo estimates of constraints described by expectations. This avoids restarting multipliers when a constraint appears to be satisfied for a single estimate, since this can be a mere artifact of the stochasticity of the estimator.

Augmented Lagrangian

Todo

Verify current (unfinished and untested) implementation of the Augmented Lagrangian method.