Optim

Partial optimizer instantiation

When constructing a ConstrainedOptimizer, the dual_optimizer parameter is expected to be a torch.optim.Optimizer for which the params argument has not yet been passed. The rest of the instantiation of the dual_optimizer is handled internally by Cooper.

The cooper.optim.partial_optimizer() method below allows you to provide a configuration for your dual_optimizer's hyperparameters (e.g. learning rate, momentum).

optim.partial_optimizer(optim_cls, **optim_kwargs)

Partially instantiates an optimizer class. This approach is preferred over functools.partial() since the returned value is an optimizer class whose attributes can be inspected and which can be further instantiated.

Parameters
  • optim_cls – PyTorch optimizer class to be partially instantiated.

  • **optim_kwargs – Keyword arguments for optimizer hyperparameters.
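
For example, a dual SGD optimizer with its hyperparameters fixed ahead of time could be configured as in the sketch below (the hyperparameter values are arbitrary placeholders):

import torch

import cooper

# Bind the hyperparameters now. Cooper supplies the dual variables as the
# missing `params` argument when it instantiates the optimizer internally.
dual_optimizer = cooper.optim.partial_optimizer(
    torch.optim.SGD, lr=1e-3, momentum=0.9
)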

Extra-gradient optimizers

The extra-gradient method [Korpelevich, 1976] is a standard approach for solving min-max games such as those arising from the LagrangianFormulation.

Given a Lagrangian \(\mathcal{L}(x,\lambda)\), define the joint variable \(\omega = (x,\lambda)\) and the “gradient” operator:

\[F(\omega) = [\nabla_x \mathcal{L}(x,\lambda), -\nabla_{\lambda} \mathcal{L}(x,\lambda)]^{\top}\]

The extra-gradient update can be summarized as:

\[\begin{split}\omega_{t+1/2} &= P_{\Omega}[\omega_{t} - \eta F(\omega_{t})] \\ \omega_{t+1} &= P_{\Omega}[\omega_{t} - \eta F(\omega_{t+1/2})]\end{split}\]
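
As a purely illustrative sketch (not Cooper's implementation), the update can be carried out by hand on the toy bilinear game \(\mathcal{L}(x, \lambda) = x \lambda\), taking \(P_{\Omega}\) to be the identity:

import torch

eta = 0.1
x, lam = torch.tensor(1.0), torch.tensor(1.0)

def F(x_val, lam_val):
    # Gradient operator of L(x, lam) = x * lam: [dL/dx, -dL/dlam] = [lam, -x]
    return lam_val, -x_val

for _ in range(1000):
    # Look-ahead step: omega_{t+1/2} = omega_t - eta * F(omega_t)
    gx, glam = F(x, lam)
    x_half, lam_half = x - eta * gx, lam - eta * glam

    # Update step: omega_{t+1} = omega_t - eta * F(omega_{t+1/2})
    gx_half, glam_half = F(x_half, lam_half)
    x, lam = x - eta * gx_half, lam - eta * glam_half

# (x, lam) spirals towards the saddle point (0, 0), whereas simultaneous
# gradient descent-ascent with the same step size diverges on this game.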

Note

In the unconstrained case, the extra-gradient update is “intrinsically different” from that of Nesterov momentum [Gidel et al., 2019]. The current version of Cooper raises a RuntimeError when trying to use an ExtragradientOptimizer to solve an unconstrained problem. This restriction might be lifted in future releases.

The implementations of ExtraSGD and ExtraAdam included in Cooper are minor edits of those originally written by Hugo Berard. Gidel et al. [2019] provides a concise presentation of the extra-gradient method in the context of solving Variational Inequality Problems.

Warning

If you decide to use extra-gradient optimizers for defining a ConstrainedOptimizer, the primal and dual optimizers must both be instances of classes inheriting from ExtragradientOptimizer.

When provided with extrapolation-capable optimizers, Cooper will automatically trigger the calls to the extrapolation function.

Due to the calculation of gradients at the “look-ahead” point \(\omega_{t+1/2}\), the call to cooper.constrained_optimizer.ConstrainedOptimizer.step() requires passing the parameters needed for the computation of the cooper.problem.ConstrainedMinimizationProblem.closure().

Example:

model = ...

cmp = cooper.ConstrainedMinimizationProblem(is_constrained=True)
formulation = cooper.problem.Formulation(...)

# Non-extra-gradient optimizers
primal_optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
dual_optimizer = cooper.optim.partial_optimizer(torch.optim.SGD, lr=1e-3)

# Extra-gradient optimizers
primal_optimizer = cooper.optim.ExtraSGD(model.parameters(), lr=1e-2)
dual_optimizer = cooper.optim.partial_optimizer(cooper.optim.ExtraSGD, lr=1e-3)

const_optim = cooper.ConstrainedOptimizer(
    formulation=formulation,
    primal_optimizer=primal_optimizer,
    dual_optimizer=dual_optimizer,
)

for step in range(num_steps):
    const_optim.zero_grad()
    lagrangian = formulation.composite_objective(cmp.closure, model, inputs)
    formulation.custom_backward(lagrangian)

    # Non-extra-gradient optimizers
    # Passing (cmp.closure, model, inputs) to step will simply be ignored
    const_optim.step()

    # Extra-gradient optimizers
    # Must pass (cmp.closure, model, inputs) to step
    const_optim.step(cmp.closure, model, inputs)

class cooper.optim.ExtragradientOptimizer(params, defaults)[source]

Base class for optimizers with extrapolation step.

Parameters
  • params (Iterable) – an iterable of torch.Tensors or dicts. Specifies what Tensors should be optimized.

  • defaults (dict) – a dict containing default values of optimization options (used when a parameter group doesn’t specify them).

extrapolation()[source]

Performs the extrapolation step and saves a copy of the current parameters for the update step.

step(closure=None)[source]

Performs a single optimization step.

Parameters

closure (Optional[Callable]) – A closure that reevaluates the model and returns the loss.
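
A rough sketch of the calling order when using such an optimizer on its own follows; inside Cooper, ConstrainedOptimizer.step() triggers this sequence for you. The toy objective is a placeholder for your model or Lagrangian evaluation.

import torch

import cooper

params = [torch.nn.Parameter(torch.randn(10))]
optimizer = cooper.optim.ExtraSGD(params, lr=1e-2)

def loss_fn():
    # Placeholder objective; in practice this evaluates your model or Lagrangian.
    return (params[0] ** 2).sum()

optimizer.zero_grad()
loss_fn().backward()
optimizer.extrapolation()  # look-ahead step; saves a copy of the current parameters

optimizer.zero_grad()
loss_fn().backward()       # gradients evaluated at the look-ahead point
optimizer.step()           # update from the saved parameters using these gradients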

class cooper.optim.ExtraSGD(params, lr, momentum=0, dampening=0, weight_decay=0, nesterov=False)[source]

Implements stochastic gradient descent with extrapolation step (optionally with momentum).

Nesterov momentum is based on the formula from Sutskever et al. [2013].

Parameters
  • params (Iterable) – Iterable of parameters to optimize or dicts defining parameter groups.

  • lr (float) – Learning rate.

  • momentum (float) – Momentum factor.

  • weight_decay (float) – Weight decay (L2 penalty).

  • dampening (float) – Dampening for momentum.

  • nesterov (bool) – If True, enables Nesterov momentum.

Note

The implementation of SGD with Momentum/Nesterov subtly differs from Sutskever et al. [2013] and implementations in some other frameworks.

Considering the specific case of Momentum, the update can be written as

\[\begin{split}v &= \rho \cdot v + g \\ p &= p - lr \cdot v\end{split}\]

where \(p\), \(v\), \(g\) and \(\rho\) denote the parameters, gradient, velocity, and momentum respectively.

This is in contrast to Sutskever et al. [2013] and other frameworks which employ an update of the form

\[\begin{split}v &= \rho \cdot v + lr \cdot g \\ p &= p - v\end{split}\]

The Nesterov version is analogously modified.
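
Schematically, the two conventions differ only in where the learning rate enters the velocity recursion. The comparison below is a sketch, not the library's internal implementation:

def momentum_update_pytorch_style(p, v, g, lr, rho):
    # Convention used here: lr scales the velocity only at update time.
    v = rho * v + g
    p = p - lr * v
    return p, v

def momentum_update_sutskever_style(p, v, g, lr, rho):
    # Sutskever et al. [2013]: lr is folded into the velocity itself.
    v = rho * v + lr * g
    p = p - v
    return p, v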

step(closure=None)

Performs a single optimization step.

Parameters

closure (Optional[Callable]) – A closure that reevaluates the model and returns the loss.

class cooper.optim.ExtraAdam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False)[source]

Implements the Adam algorithm with an extrapolation step.

Parameters
  • params (Iterable) – Iterable of parameters to optimize or dicts defining parameter groups.

  • lr (float) – Learning rate.

  • betas (Tuple[float, float]) – Coefficients used for computing running averages of the gradient and its square.

  • eps (float) – Term added to the denominator to improve numerical stability.

  • weight_decay (float) – Weight decay (L2 penalty).

  • amsgrad (bool) – Flag to use the AMSGrad variant of this algorithm from Reddi et al. [2018].
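
As required by the warning above, an extrapolation-capable optimizer must be used on both the primal and dual side; for instance, ExtraAdam can serve as both (the learning rates below are arbitrary placeholders):

import cooper

model = ...  # your torch.nn.Module

primal_optimizer = cooper.optim.ExtraAdam(model.parameters(), lr=1e-3)
dual_optimizer = cooper.optim.partial_optimizer(cooper.optim.ExtraAdam, lr=1e-3)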

Learning rate schedulers

Cooper supports learning rate schedulers for the primal and dual optimizers. Recall that Cooper handles the primal and dual optimizers in slightly different ways: the primal optimizer is “fully” instantiated by the user, while we expect a “partially” instantiated dual optimizer. We follow a similar pattern for the learning rate schedulers.

Example:

from torch.optim.lr_scheduler import StepLR, ExponentialLR

...
primal_optimizer = torch.optim.SGD(...)
dual_optimizer = cooper.optim.partial_optimizer(...)

primal_scheduler = StepLR(primal_optimizer, step_size=1, gamma=0.1)
dual_scheduler = cooper.optim.partial_scheduler(ExponentialLR, **scheduler_kwargs)

const_optim = cooper.ConstrainedOptimizer(..., primal_optimizer, dual_optimizer, dual_scheduler)

for step in range(num_steps):
    ...
    const_optim.step()  # Cooper calls dual_scheduler.step() internally
    primal_scheduler.step()  # You must call this explicitly

Primal learning rate scheduler

You must instantiate the scheduler for the learning rate used by the primal_optimizer and call the scheduler’s step method explicitly, as is usual in PyTorch. See torch.optim.lr_scheduler for details.

Dual learning rate scheduler

When constructing a ConstrainedOptimizer, the dual_scheduler parameter is expected to be a partially instantiated learning rate scheduler from PyTorch, for which the optimizer argument has not yet been passed. The cooper.optim.partial_scheduler() method allows you to provide a configuration for your dual_scheduler's hyperparameters. The rest of the instantiation of the dual_scheduler is managed internally by Cooper.

The calls to the step method of the dual_scheduler are made by Cooper during the execution of cooper.constrained_optimizer.ConstrainedOptimizer.step().

optim.partial_scheduler(scheduler_cls, **scheduler_kwargs)

Partially instantiates a learning rate scheduler class. This approach is preferred over functools.partial() since the returned value is a scheduler class whose attributes can be inspected and which can be further instantiated.

Parameters
  • scheduler_cls – PyTorch scheduler class to be partially instantiated.

  • **scheduler_kwargs – Keyword arguments for scheduler hyperparameters.
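
For example, an exponentially decaying dual learning rate could be configured as follows (the decay factor is an arbitrary placeholder):

from torch.optim.lr_scheduler import ExponentialLR

import cooper

# Bind the scheduler hyperparameters now. Cooper passes the instantiated dual
# optimizer as the missing `optimizer` argument internally.
dual_scheduler = cooper.optim.partial_scheduler(ExponentialLR, gamma=0.99)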