Optim
The cooper.optim module contains classes and functions for solving constrained minimization problems (CMPs).
This module is divided into two main parts:
- Constrained Optimizers: for solving constrained minimization problems.
- Unconstrained Optimizers: for solving unconstrained minimization problems.
The Torch Optimizers section describes Cooper implementations of torch.optim.Optimizer classes tailored for solving CMPs that are not available in PyTorch.
Quick Start
A ConstrainedOptimizer performs parameter updates to solve a ConstrainedMinimizationProblem. This class wraps two torch.optim.Optimizer objects: one for the primal parameters \(\vx\) and one for the dual parameters \(\vlambda\) and \(\vmu\). We refer to these as the primal and dual optimizers, respectively.
Constrained Optimizers in Cooper
Cooper implements the following subclasses of ConstrainedOptimizer:
- SimultaneousOptimizer: Updates the primal and dual parameters simultaneously.
- AlternatingPrimalDualOptimizer: Performs alternating updates, starting with the primal parameters followed by the dual parameters.
- AlternatingDualPrimalOptimizer: Performs alternating updates, starting with the dual parameters followed by the primal parameters.
- ExtrapolationConstrainedOptimizer: Performs extragradient updates [Kor76].
All ConstrainedOptimizers expect the following arguments:
- cmp: a ConstrainedMinimizationProblem.
- primal_optimizers: a torch.optim.Optimizer (or a list of optimizers) for the primal parameters.
- dual_optimizers: a torch.optim.Optimizer (or a list of optimizers) for the dual parameters.
Multiple Primal or Dual Optimizers
When a list of optimizers is provided for the primal_optimizers or dual_optimizers argument, the different optimizers are treated as a single optimizer. As a result, all optimizers in the list are updated simultaneously, without intermediate calls to re-compute the Lagrangian or the CMP state.
Unconstrained problems in Cooper
To accommodate unconstrained problems in Cooper, we provide an UnconstrainedOptimizer class. It handles both genuinely unconstrained problems and formulations of constrained problems that do not use dual variables (e.g., the QuadraticPenalty formulation). This design allows the use of the roll() interface regardless of whether the problem is constrained or unconstrained.
The roll() Method
ConstrainedOptimizer objects define a roll() method that prescribes how and when to update the primal and dual parameters. This method is used to perform a single iteration of the optimization algorithm, following PyTorch’s zero_grad() -> forward() -> backward() -> step() approach.
The roll() method is responsible for:
- Zeroing Gradients: Calling primal_optimizer.zero_grad() and dual_optimizer.zero_grad().
- Forward Computations:
  - Computing the problem’s CMPState by calling cooper.ConstrainedMinimizationProblem.compute_cmp_state().
  - Calculating the primal and dual Lagrangians.
- Backward: Calling torch.Tensor.backward() on the Lagrangian terms.
- Step:
  - Calling torch.optim.Optimizer.step() on the primal and dual optimizers.
  - Projecting the dual variables associated with inequality constraints onto the non-negative orthant by calling cooper.multipliers.Multiplier.post_step_().
As the procedures for performing updates on the parameters of a CMP can be complex, the roll() method provides a convenient and consistent interface for performing parameter updates across ConstrainedOptimizers. Therefore, when using a ConstrainedOptimizer, users are expected to call the roll() method, instead of the individual step() methods of the primal and dual optimizers.
- abstract CooperOptimizer.roll(*args, **kwargs)
Evaluates the objective function and performs a gradient update on the parameters.
- Return type: RollOut
The roll() method returns a RollOut object. This includes the computed loss, CMPState, and the primal and dual Lagrangians (packed into LagrangianStore objects). This information can be useful for logging and debugging purposes.
For example, to access the primal Lagrangian you can use the following code snippet:
roll_out = constrained_optimizer.roll(compute_cmp_state_kwargs={...})
primal_lagrangian = roll_out.primal_lagrangian_store.lagrangian
- class cooper.optim.RollOut(loss: Tensor, cmp_state: CMPState, primal_lagrangian_store: LagrangianStore, dual_lagrangian_store: LagrangianStore)
Stores the output of a call to roll().
- Parameters:
  - loss (torch.Tensor) – Value of the objective function.
  - cmp_state (CMPState) – State of the CMP.
  - primal_lagrangian_store (LagrangianStore) – LagrangianStore for the primal Lagrangian.
  - dual_lagrangian_store (LagrangianStore) – LagrangianStore for the dual Lagrangian.
- class cooper.LagrangianStore(lagrangian=None, multiplier_values=<factory>, penalty_coefficient_values=<factory>)
Stores the value of the (primal or dual) Lagrangian, as well as the multiplier and penalty coefficient values for the observed constraints.
- Parameters:
  - multiplier_values (dict[Constraint, Tensor]) – Value of the multipliers associated with the observed constraints.
  - penalty_coefficient_values (dict[Constraint, Tensor]) – Value of the penalty coefficients associated with the observed constraints.
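Since the RollOut fields are mainly useful for logging, a small helper can flatten them into scalars. The sketch below is only illustrative: `summarize_roll_out` is a hypothetical helper (not part of Cooper), and the stand-in object built with `SimpleNamespace` merely mimics the attribute names documented above, with plain floats in place of tensors.

```python
from types import SimpleNamespace


def summarize_roll_out(roll_out):
    """Collect loggable scalars from an object shaped like a RollOut (hypothetical helper)."""

    def as_float(value):
        # `lagrangian` defaults to None in LagrangianStore, so guard against it.
        return None if value is None else float(value)

    return {
        "loss": as_float(roll_out.loss),
        "primal_lagrangian": as_float(roll_out.primal_lagrangian_store.lagrangian),
        "dual_lagrangian": as_float(roll_out.dual_lagrangian_store.lagrangian),
    }


# Stand-in with the documented attribute names; a real RollOut would hold tensors.
fake_roll_out = SimpleNamespace(
    loss=0.5,
    cmp_state=None,
    primal_lagrangian_store=SimpleNamespace(lagrangian=0.7),
    dual_lagrangian_store=SimpleNamespace(lagrangian=None),
)
summary = summarize_roll_out(fake_roll_out)
```

With an actual RollOut, the same helper should work as long as the stored values are scalar tensors, since `float()` accepts those.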
Example
To use a ConstrainedOptimizer with a Lagrangian formulation, follow these steps:
- [Line 8]: Instantiate a primal_optimizer for the primal parameters.
- [Line 12]: Instantiate a dual_optimizer for the dual parameters. Set maximize=True since the dual parameters maximize the Lagrangian.
  Extracting the dual parameters: Similar to torch.nn.Module.parameters(), ConstrainedMinimizationProblem objects provide the helper method dual_parameters() for extracting the dual parameters for all of its registered constraints.
- [Lines 16-20]: Instantiate a ConstrainedOptimizer, passing the cmp, primal_optimizer, and dual_optimizer as arguments.
- [Line 26]: Use the roll() method to perform a single call to the step() method of both the primal and dual optimizers.
import torch
import cooper

train_loader = ...  # PyTorch DataLoader
model = ...  # PyTorch model
cmp = ...  # containing `Constraint`s and their associated `Multiplier`s

primal_optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# `cmp.dual_parameters()` returns the parameters associated with the multipliers.
# Must set `maximize=True` since the multipliers *maximize* the Lagrangian.
dual_optimizer = torch.optim.SGD(cmp.dual_parameters(), lr=1e-3, maximize=True)

# `ConstrainedOptimizer`s need access to the cmp to compute the loss, constraints, and
# Lagrangian. Some `ConstrainedOptimizer`s do these calculations multiple times.
constrained_optimizer = cooper.optim.SimultaneousOptimizer(
    cmp=cmp,
    primal_optimizers=primal_optimizer,
    dual_optimizers=dual_optimizer,
)

for inputs, targets in train_loader:
    # kwargs used by `cmp.compute_cmp_state` method to compute the loss and constraints.
    kwargs = {"model": model, "inputs": inputs, "targets": targets}

    constrained_optimizer.roll(compute_cmp_state_kwargs=kwargs)
Constrained Optimizers
ConstrainedOptimizer objects are used to solve ConstrainedMinimizationProblems (CMPs) whose chosen formulation involves dual variables. This is achieved via gradient-based optimization of the primal and dual parameters.
Unconstrained formulations of constrained problems
For solving problems via formulations that do not require dual variables, such as the QuadraticPenalty formulation, use the UnconstrainedOptimizer class.
Projected \(\vlambda\) Updates
To ensure the non-negativity of Lagrange multipliers associated with inequality constraints, all ConstrainedOptimizers call the cooper.multipliers.Multiplier.post_step_() method after dual parameter updates, which projects inequality multipliers onto the non-negative orthant.
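As a rough illustration of what a projected dual update amounts to, the sketch below performs gradient ascent on a list of inequality multipliers and then clamps them at zero. It is a plain-Python stand-in under simplifying assumptions: `projected_ascent_step` is a hypothetical name, and this is not the implementation of post_step_().

```python
def projected_ascent_step(multipliers, violations, lr):
    """Gradient ascent on the multipliers followed by projection onto [0, inf)."""
    updated = []
    for lam, g in zip(multipliers, violations):
        lam = lam + lr * g             # ascent: the violation g is the dual gradient
        updated.append(max(0.0, lam))  # projection keeps inequality multipliers >= 0
    return updated


# The first constraint is strictly satisfied (g < 0), the second is violated (g > 0).
new_multipliers = projected_ascent_step([0.05, 0.2], violations=[-1.0, 0.5], lr=0.1)
```

The first multiplier would become negative after the ascent step and is therefore clamped to zero, while the second simply grows.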
Base Class
- class cooper.optim.ConstrainedOptimizer(cmp, primal_optimizers, dual_optimizers)
Optimizes a ConstrainedMinimizationProblem.
A ConstrainedOptimizer includes one or more torch.optim.Optimizers for the primal variables. It also includes one or more torch.optim.Optimizers for the dual variables.
For handling unconstrained problems in a consistent way, we provide the UnconstrainedOptimizer class.
- Parameters:
  - cmp (ConstrainedMinimizationProblem) – The constrained minimization problem to be optimized. Providing the CMP as an argument for the constructor allows the optimizer to call the compute_cmp_state() method within the roll() method. Additionally, in the case of a constrained optimizer, the CMP enables access to the multipliers’ post_step_() method, which must be called after the multiplier update.
  - primal_optimizers (Union[Optimizer, Sequence[Optimizer]]) – Optimizer(s) for the primal variables (e.g. the weights of a model). The primal parameters can be partitioned across multiple optimizers; in this case, primal_optimizers accepts a list of torch.optim.Optimizers.
  - dual_optimizers (Union[Optimizer, Sequence[Optimizer]]) – Optimizer(s) for the dual variables (e.g. the Lagrange multipliers associated with the constraints). A sequence of torch.optim.Optimizers can be passed to handle the case of several Constraints. If dealing with an unconstrained problem, please use an UnconstrainedOptimizer instead.
- dual_step()
Performs a gradient step on the parameters associated with the dual variables. Since the dual problem involves maximizing over the dual variables, we require dual optimizers which satisfy maximize=True.
After being updated by the dual optimizer steps, the multipliers are post-processed (e.g. to ensure non-negativity for inequality constraints).
- Return type:
Simultaneous Optimizer
A simple approach to solving CMPs is to update the primal and dual parameters simultaneously. This is the approach taken by the SimultaneousOptimizer class [AHU58].
- class cooper.optim.SimultaneousOptimizer(cmp, primal_optimizers, dual_optimizers)
Optimizes a ConstrainedMinimizationProblem by performing simultaneous gradient updates to the primal and dual variables.
According to the choice of primal and dual optimizers, the updates are performed as follows:
\[ \begin{aligned}
\vx_{t+1} &= \texttt{primal_optimizer_update} \left( \vx_{t}, \nabla_{\vx} \Lag_{\text{primal}}(\vx, \vlambda_t, \vmu_t)|_{\vx=\vx_t} \right)\\
\vlambda_{t+1} &= \left[ \texttt{dual_optimizer_update} \left( \vlambda_{t}, \nabla_{\vlambda} \Lag_{\text{dual}}({\vx_{t}}, \vlambda, \vmu_t)|_{\vlambda=\vlambda_t} \right) \right]_+\\
\vmu_{t+1} &= \texttt{dual_optimizer_update} \left( \vmu_{t}, \nabla_{\vmu} \Lag_{\text{dual}}({\vx_{t}}, \vlambda_t, \vmu)|_{\vmu=\vmu_t} \right)
\end{aligned} \]
For instance, when the primal/dual updates are gradient descent/ascent on a Lagrangian formulation, the updates are as follows:
\[ \begin{aligned}
\vx_{t+1} &= \vx_t - \eta_{\vx} \left [ \nabla_{\vx} f(\vx_t) + \vlambda_t^\top \nabla_{\vx} \vg(\vx_t) + \vmu_t^\top \nabla_{\vx} \vh(\vx_t) \right ],\\
\vlambda_{t+1} &= \left [ \vlambda_t + \eta_{\vlambda} \vg(\vx_t) \right ]_+,\\
\vmu_{t+1} &= \vmu_t + \eta_{\vmu} \vh(\vx_t),
\end{aligned} \]
where \(\eta_{\vx}\), \(\eta_{\vlambda}\), and \(\eta_{\vmu}\) are step sizes.
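To make the simultaneous gradient descent-ascent updates concrete, here is a minimal plain-Python sketch on a toy problem chosen for illustration: minimize \(f(x) = x^2\) subject to \(g(x) = 1 - x \le 0\), whose solution is \(x^* = 1\) with multiplier \(\lambda^* = 2\). The function names and step sizes are assumptions; this is not Cooper's implementation.

```python
def objective_grad(x):
    return 2.0 * x  # gradient of f(x) = x^2


def constraint(x):
    return 1.0 - x  # g(x) = 1 - x <= 0


def constraint_grad(x):
    return -1.0


def simultaneous_gda(x, lam, lr_x=0.1, lr_lam=0.1, steps=500):
    """Projected gradient descent-ascent where both updates use (x_t, lam_t)."""
    for _ in range(steps):
        grad_x = objective_grad(x) + lam * constraint_grad(x)  # d/dx of the Lagrangian
        grad_lam = constraint(x)                                # d/dlam of the Lagrangian
        # Tuple assignment ensures the dual update sees the old x, not the new one.
        x, lam = x - lr_x * grad_x, max(0.0, lam + lr_lam * grad_lam)
    return x, lam


x_sim, lam_sim = simultaneous_gda(x=0.0, lam=0.0)
```

On this strongly convex toy problem the iterates approach \((1, 2)\); convergence of simultaneous updates is not guaranteed in general.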
Alternating Optimizers
Alternating updates enjoy enhanced convergence guarantees for min-max optimization problems under certain assumptions [GBV+19, ZWLG22]. In the context of constrained optimization, these benefits can be achieved without additional computational costs relative to simultaneous updates (see AlternatingDualPrimalOptimizer). This motivates the implementation of the AlternatingPrimalDualOptimizer and AlternatingDualPrimalOptimizer classes.
- class cooper.optim.AlternatingDualPrimalOptimizer(cmp, primal_optimizers, dual_optimizers)
Optimizes a ConstrainedMinimizationProblem by performing alternating updates, starting with the dual variables.
According to the choice of primal and dual optimizers, updates are performed as follows:
\[ \begin{aligned}
\vlambda_{t+1} &= \left[ \texttt{dual_optimizer_update} \left( \vlambda_{t}, \nabla_{\vlambda} \Lag_{\text{dual}}(\vx_t, \vlambda, \vmu_t)|_{\vlambda=\vlambda_t} \right) \right]_+\\
\vmu_{t+1} &= \texttt{dual_optimizer_update} \left( \vmu_{t}, \nabla_{\vmu} \Lag_{\text{dual}}(\vx_t, \vlambda_t, \vmu)|_{\vmu=\vmu_t} \right)\\
\vx_{t+1} &= \texttt{primal_optimizer_update} \left( \vx_{t}, \nabla_{\vx} \Lag_{\text{primal}}(\vx, \vlambda_{\color{red} t+1}, \vmu_{\color{red} t+1} )|_{\vx=\vx_t} \right)
\end{aligned} \]
For instance, when employing alternating projected gradient descent-ascent on a Lagrangian formulation, the updates are as follows:
\[ \begin{aligned}
\vlambda_{t+1} &= \left [ \vlambda_t + \eta_{\vlambda} \vg(\vx_t) \right ]_+,\\
\vmu_{t+1} &= \vmu_t + \eta_{\vmu} \vh(\vx_t),\\
\vx_{t+1} &= \vx_t - \eta_{\vx} \left [ \nabla_{\vx} f(\vx_t) + \vlambda_{\color{red} t+1}^\top \nabla_{\vx} \vg(\vx_t) + \vmu_{\color{red} t+1}^\top \nabla_{\vx} \vh(\vx_t) \right ],
\end{aligned} \]
where \(\eta_{\vx}\), \(\eta_{\vlambda}\), and \(\eta_{\vmu}\) are step sizes.
Note
Both the primal and dual updates depend on the CMPState at the current primal iterate \(\vx_{t}\). Consequently, although the primal update uses the updated dual variables \(\vlambda_{\color{red} t+1}\) and \(\vmu_{\color{red} t+1}\), the CMPState does not need to be recomputed after the dual update. As a result, the computational cost of this optimizer matches that of the SimultaneousOptimizer.
- roll(compute_cmp_state_kwargs=None)
Performs a dual-primal alternating step where the dual variables are updated first.
- Parameters:
  - compute_cmp_state_kwargs (Optional[dict]) – Keyword arguments to pass to the compute_cmp_state() method.
- Returns: A named tuple containing the following objects:
  - loss (Tensor): The loss value computed during the roll, \(f(\vx_{t})\).
  - cmp_state (CMPState): The CMP state at \(\vx_{t}\).
  - primal_lagrangian_store (LagrangianStore): The primal Lagrangian store at \(\vx_{t}\), \(\vlambda_{\color{red} t+1}\) and \(\vmu_{\color{red} t+1}\).
  - dual_lagrangian_store (LagrangianStore): The dual Lagrangian store at \(\vx_{t}\), \(\vlambda_t\) and \(\vmu_t\).
- Return type: RollOut
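The dual-first alternating scheme can be sketched in plain Python on a toy problem: minimize \(x^2\) subject to \(1 - x \le 0\) (solution \(x^* = 1\), \(\lambda^* = 2\)). The problem, names, and step sizes are assumptions made for illustration, not Cooper's implementation. The counter shows that the constraint is evaluated only once per iteration, at \(x_t\).

```python
def alternating_dual_primal(x, lam, lr_x=0.1, lr_lam=0.1, steps=500):
    """Dual update first, then a primal update that uses the *new* multiplier."""
    num_constraint_evals = 0
    for _ in range(steps):
        g = 1.0 - x                # constraint g(x_t), evaluated once at x_t
        num_constraint_evals += 1
        lam = max(0.0, lam + lr_lam * g)   # projected dual ascent gives lam_{t+1}
        grad_x = 2.0 * x + lam * (-1.0)    # Lagrangian gradient at x_t with lam_{t+1}
        x = x - lr_x * grad_x
    return x, lam, num_constraint_evals


x_dp, lam_dp, evals_dp = alternating_dual_primal(x=0.0, lam=0.0)
```

No quantity is re-evaluated at a new primal point within an iteration, which is why the cost matches that of the simultaneous scheme.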
- class cooper.optim.AlternatingPrimalDualOptimizer(cmp, primal_optimizers, dual_optimizers)
Optimizes a ConstrainedMinimizationProblem by performing alternating updates, starting with the primal variables.
According to the choice of primal and dual optimizers, updates are performed as follows:
\[ \begin{aligned}
\vx_{t+1} &= \texttt{primal_optimizer_update} \left( \vx_{t}, \nabla_{\vx} \Lag_{\text{primal}}(\vx, \vlambda_t, \vmu_t)|_{\vx=\vx_t} \right)\\
\vlambda_{t+1} &= \left[ \texttt{dual_optimizer_update} \left( \vlambda_{t}, \nabla_{\vlambda} \Lag_{\text{dual}}({\vx_{\color{red} t+1}}, \vlambda, \vmu_t)|_{\vlambda=\vlambda_t} \right) \right]_+\\
\vmu_{t+1} &= \texttt{dual_optimizer_update} \left( \vmu_{t}, \nabla_{\vmu} \Lag_{\text{dual}}({\vx_{\color{red} t+1}}, \vlambda_t, \vmu)|_{\vmu=\vmu_t} \right)
\end{aligned} \]
For instance, when employing alternating projected gradient descent-ascent on a Lagrangian formulation, the updates are as follows:
\[ \begin{aligned}
\vx_{t+1} &= \vx_t - \eta_{\vx} \left [ \nabla_{\vx} f(\vx_t) + \vlambda_t^\top \nabla_{\vx} \vg(\vx_t) + \vmu_t^\top \nabla_{\vx} \vh(\vx_t) \right ],\\
\vlambda_{t+1} &= \left [ \vlambda_t + \eta_{\vlambda} \vg(\vx_{\color{red} t+1}) \right ]_+,\\
\vmu_{t+1} &= \vmu_t + \eta_{\vmu} \vh(\vx_{\color{red} t+1}),
\end{aligned} \]
where \(\eta_{\vx}\), \(\eta_{\vlambda}\), and \(\eta_{\vmu}\) are step sizes.
This optimizer computes constraint violations twice: at \(\vx_{t}\) for the initial primal update, and again at the updated primal point \(\vx_{t+1}\) to update the dual variables. The former are used to compute the primal Lagrangian \(\Lag_{\text{primal}}\) while the latter are used to compute the dual Lagrangian \(\Lag_{\text{dual}}\).
Reducing computational overhead in primal-dual alternating updates
To update the dual variables, only the constraint violations \(\vg(\vx_{\color{red} t+1})\) and \(\vh(\vx_{\color{red} t+1})\) are required, not the objective function value \(f(\vx_{\color{red} t+1})\). To reduce computational overhead, the user can implement the compute_violations() method of the CMP and pass the compute_violations_kwargs argument to roll(). This approach ensures that only the constraint violations are recomputed at \(\vx_{\color{red} t+1}\), without calculating the loss or constructing a computational graph over the primal variables.
- roll(compute_cmp_state_kwargs=None, compute_violations_kwargs=None)
Performs a primal-dual alternating step where the primal variables are updated first.
- Parameters:
  - compute_cmp_state_kwargs (Optional[dict]) – Keyword arguments to pass to the compute_cmp_state() method.
  - compute_violations_kwargs (Optional[dict]) – Keyword arguments to pass to the compute_violations() method. When compute_violations() is implemented, it takes precedence over compute_cmp_state() for the dual update. If not implemented, the violations measured by compute_cmp_state() at the updated primal iterate are used.
- Returns: A named tuple containing the following objects:
  - loss (Tensor): The most recent loss value at the end of the roll. If compute_violations() was used, returns \(f(\vx_{t})\). Otherwise, returns the recomputed loss at the updated primal point \(f(\vx_{t+1})\).
  - cmp_state (CMPState): The CMP state at \(\vx_{\color{red} t+1}\). Note that if compute_violations() is used, the loss at \(\vx_{t+1}\) is not computed and cmp_state.loss will be None.
  - primal_lagrangian_store (LagrangianStore): The primal Lagrangian store at \(\vx_{t}\), \(\vlambda_t\) and \(\vmu_t\).
  - dual_lagrangian_store (LagrangianStore): The dual Lagrangian store at \(\vx_{\color{red} t+1}\), \(\vlambda_t\) and \(\vmu_t\).
- Return type: RollOut
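The primal-first scheme, together with the idea of recomputing only the violations at the updated point, can be sketched in plain Python on a toy problem: minimize \(x^2\) subject to \(1 - x \le 0\). Everything here is an illustrative assumption: `full_state` and `violation_only` are hypothetical helpers that loosely play the roles of compute_cmp_state() and compute_violations(), and the counters track how often each quantity is evaluated.

```python
def alternating_primal_dual(x, lam, lr_x=0.1, lr_lam=0.1, steps=500):
    """Primal update first, then a dual update using the violation at the new x."""
    counts = {"loss": 0, "violation": 0}

    def full_state(x):
        # Loss and violation together (full evaluation at x_t).
        counts["loss"] += 1
        counts["violation"] += 1
        return x * x, 1.0 - x

    def violation_only(x):
        # Only the violation (cheaper evaluation at x_{t+1}).
        counts["violation"] += 1
        return 1.0 - x

    for _ in range(steps):
        _loss, _g = full_state(x)          # values at x_t (useful for logging)
        grad_x = 2.0 * x + lam * (-1.0)    # Lagrangian gradient at (x_t, lam_t)
        x = x - lr_x * grad_x
        g_new = violation_only(x)          # violation at x_{t+1}, no loss needed
        lam = max(0.0, lam + lr_lam * g_new)
    return x, lam, counts


x_pd, lam_pd, counts_pd = alternating_primal_dual(x=0.0, lam=0.0)
```

Each iteration evaluates the loss once and the violation twice, which reflects the saving described above relative to recomputing the full state at the updated point.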
Extragradient
The extragradient method [Kor76] is a well-established approach for solving min-max optimization problems. It offers convergence for a broader class of problems compared to simultaneous or alternating gradient descent-ascent [GBV+19] and reduces oscillations in parameter updates.
However, a key drawback of the extragradient method is its computational cost as it requires two forward and backward passes per iteration and additional memory to store a copy of the optimization variables. In other words, each iteration is twice as expensive as a simultaneous gradient descent-ascent iteration.
This approach is implemented in the ExtrapolationConstrainedOptimizer class.
Extragradient-compatible optimizers
Not all torch.optim.Optimizers are compatible with the ExtrapolationConstrainedOptimizer. Primal and dual optimizers used with this class must implement both a step() method and an extrapolation() method. The extrapolation() method performs the extrapolation step of the algorithm.
To ensure compatibility, optimizers can inherit from ExtragradientOptimizer (see Extragradient Optimizers for details).
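To illustrate why both methods are needed, the sketch below defines a toy optimizer with extrapolation() and step() methods and applies it to the bilinear problem \(\min_x \max_y \, xy\). `ToyExtraSGD` is a hypothetical plain-Python stand-in, not Cooper's ExtragradientOptimizer; in particular, passing gradients as explicit arguments is a simplification, since PyTorch optimizers read gradients from the parameters themselves.

```python
class ToyExtraSGD:
    """Hypothetical stand-in for an extragradient-compatible SGD optimizer."""

    def __init__(self, params, lr, maximize=False):
        self.params = params  # list of floats, updated in place
        self.lr = lr
        self.sign = 1.0 if maximize else -1.0
        self._backup = None

    def extrapolation(self, grads):
        # Keep a copy of the current iterate: this is the extra memory cost.
        self._backup = list(self.params)
        for i, g in enumerate(grads):
            self.params[i] += self.sign * self.lr * g

    def step(self, grads):
        if self._backup is None:
            raise RuntimeError("extrapolation() must be called before step()")
        # Update from the *stored* iterate using gradients from the extrapolated point.
        for i, g in enumerate(grads):
            self.params[i] = self._backup[i] + self.sign * self.lr * g
        self._backup = None


x, y = [1.0], [1.0]
primal = ToyExtraSGD(x, lr=0.1)
dual = ToyExtraSGD(y, lr=0.1, maximize=True)
for _ in range(2000):
    grad_x, grad_y = y[0], x[0]   # gradients of x * y at (x_t, y_t)
    primal.extrapolation([grad_x])
    dual.extrapolation([grad_y])
    grad_x, grad_y = y[0], x[0]   # gradients at the extrapolated point
    primal.step([grad_x])
    dual.step([grad_y])
squared_norm = x[0] ** 2 + y[0] ** 2
```

On this problem, plain simultaneous descent-ascent with the same step size spirals away from the saddle point at the origin, whereas the extragradient iterates contract toward it.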
- class cooper.optim.ExtrapolationConstrainedOptimizer(cmp, primal_optimizers, dual_optimizers)
Optimizes a ConstrainedMinimizationProblem by performing extrapolation updates to the primal and dual variables.
Given the choice of primal and dual optimizers, an extrapolation step is performed first:
\[ \begin{aligned}
\vx_{t+\frac{1}{2}} &= \texttt{primal_optimizer_update} \left( \vx_{t}, \nabla_{\vx} \Lag_{\text{primal}}(\vx, \vlambda_t, \vmu_t)|_{\vx=\vx_t} \right)\\
\vlambda_{t+\frac{1}{2}} &= \left[ \texttt{dual_optimizer_update} \left( \vlambda_{t}, \nabla_{\vlambda} \Lag_{\text{dual}}({\vx_{t}}, \vlambda, \vmu_t) |_{\vlambda=\vlambda_t} \right) \right]_+\\
\vmu_{t+\frac{1}{2}} &= \texttt{dual_optimizer_update} \left( \vmu_{t}, \nabla_{\vmu} \Lag_{\text{dual}}({\vx_{t}}, \vlambda_{t}, \vmu) |_{\vmu=\vmu_t} \right).
\end{aligned} \]
This is followed by an update step, which modifies the primal and dual variables from step \(t\), based on the gradients computed at the extrapolated points \(t+\frac{1}{2}\):
\[ \begin{aligned}
\vx_{t+1} &= \texttt{primal_optimizer_update} \left( \vx_{t}, \nabla_{\vx} \Lag_{\text{primal}} \left(\vx, \vlambda_{\color{red} t+\frac{1}{2}}, \vmu_{\color{red} t+\frac{1}{2}} \right)|_{\vx=\vx_{\color{red} t+\frac{1}{2}}} \right)\\
\vlambda_{t+1} &= \left[ \texttt{dual_optimizer_update} \left( \vlambda_{t}, \nabla_{\vlambda} \Lag_{\text{dual}} \left({\vx_{\color{red} t+\frac{1}{2}}}, \vlambda, \vmu_{\color{red} t+\frac{1}{2}} \right) |_{\vlambda=\vlambda_{\color{red} t+\frac{1}{2}}}\right) \right]_+\\
\vmu_{t+1} &= \texttt{dual_optimizer_update} \left( \vmu_{t}, \nabla_{\vmu} \Lag_{\text{dual}}\left({\vx_{\color{red} t+\frac{1}{2}}}, \vlambda_{\color{red} t+\frac{1}{2}}, \vmu \right) |_{\vmu=\vmu_{\color{red} t+\frac{1}{2}}} \right).
\end{aligned} \]
For example, if the primal optimizer is gradient descent and the dual optimizer is gradient ascent, the extrapolation step leads to:
\[ \begin{aligned}
\vx_{t+\frac{1}{2}} &= \vx_t - \eta_{\vx} \left [ \nabla_{\vx} f(\vx_t) + \vlambda_t^\top \nabla_{\vx} \vg(\vx_t) + \vmu_t^\top \nabla_{\vx} \vh(\vx_t) \right ],\\
\vlambda_{t+\frac{1}{2}} &= \left [ \vlambda_t + \eta_{\vlambda} \vg(\vx_{t}) \right ]_+,\\
\vmu_{t+\frac{1}{2}} &= \vmu_t + \eta_{\vmu} \vh(\vx_t).
\end{aligned} \]
The update step then yields:
\[ \begin{aligned}
\vx_{t+1} &= \vx_t - \eta_{\vx} \left [ \nabla_{\vx} f \left(\vx_{\color{red} t+\frac{1}{2}}\right) + \vlambda_{\color{red} t+\frac{1}{2}}^\top \nabla_{\vx} \vg \left(\vx_{\color{red} t+\frac{1}{2}} \right) + \vmu_{\color{red} t+\frac{1}{2}}^\top \nabla_{\vx} \vh\left(\vx_{\color{red} t+\frac{1}{2}} \right) \right ],\\
\vlambda_{t+1} &= \left [ \vlambda_t + \eta_{\vlambda} \vg(\vx_{\color{red} t+\frac{1}{2}}) \right ]_+,\\
\vmu_{t+1} &= \vmu_t + \eta_{\vmu} \vh(\vx_{\color{red} t+\frac{1}{2}}).
\end{aligned} \]
The roll() method will simultaneously call the extrapolation() and step() methods of the primal and dual optimizers.
- custom_sanity_checks()
Perform custom sanity checks on the initialization of the optimizer.
- Raises:
  - RuntimeError – Tried to construct an ExtrapolationConstrainedOptimizer but some of the provided optimizers do not have an extrapolation method.
- Return type:
- primal_extrapolation_step()
Perform an extrapolation step on the parameters associated with the primal variables.
- Return type:
- dual_extrapolation_step()
Perform an extrapolation step on the parameters associated with the dual variables.
After being updated by the dual optimizer steps, the multipliers are post-processed (e.g. to ensure non-negativity for inequality constraints).
- Return type:
- roll(compute_cmp_state_kwargs=None)
Performs a full update step on the primal and dual variables.
Note that the forward and backward computations are carried out twice, as part of the extrapolation() and step() calls.
- Parameters:
  - compute_cmp_state_kwargs (Optional[dict]) – Keyword arguments to pass to the compute_cmp_state() method.
- Returns: A named tuple containing the following objects:
  - loss (Tensor): The loss value computed after the extrapolation step \(f(\vx_{t})\).
  - cmp_state (CMPState): The CMP state at \(\vx_{t}\).
  - primal_lagrangian_store (LagrangianStore): The primal Lagrangian store at \(\vx_{t}\), \(\vlambda_{t}\) and \(\vmu_{t}\).
  - dual_lagrangian_store (LagrangianStore): The dual Lagrangian store at \(\vx_{t}\), \(\vlambda_t\) and \(\vmu_t\).
- Return type: RollOut
Note
The RollOut for this scheme returns the loss and CMPState values at the original point \((\vx_t, \vlambda_t)\), before any of the updates are performed.
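A plain-Python sketch of projected extragradient on a toy constrained problem (minimize \(x^2\) subject to \(1 - x \le 0\), with solution \(x^* = 1\), \(\lambda^* = 2\)) may help relate the two phases to their cost. The problem, names, and step sizes are illustrative assumptions, not Cooper's implementation. The counter records two gradient evaluations per iteration.

```python
def lagrangian_grads(x, lam):
    """Gradients of L(x, lam) = x^2 + lam * (1 - x) with respect to x and lam."""
    return 2.0 * x - lam, 1.0 - x


def extragradient(x, lam, lr_x=0.1, lr_lam=0.1, steps=500):
    num_grad_evals = 0
    for _ in range(steps):
        # Extrapolation: a tentative step from (x_t, lam_t).
        gx, glam = lagrangian_grads(x, lam)
        num_grad_evals += 1
        x_half = x - lr_x * gx
        lam_half = max(0.0, lam + lr_lam * glam)
        # Update: step again from (x_t, lam_t), using gradients at the half point.
        gx, glam = lagrangian_grads(x_half, lam_half)
        num_grad_evals += 1
        x = x - lr_x * gx
        lam = max(0.0, lam + lr_lam * glam)
    return x, lam, num_grad_evals


x_eg, lam_eg, evals_eg = extragradient(x=0.0, lam=0.0)
```

Note that the final update starts from the iterate at step \(t\), not from the extrapolated point, which is why the original values must be kept in memory during the extrapolation.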
Unconstrained Optimizers
The UnconstrainedOptimizer class provides an interface based on the roll() method for parameter updates in unconstrained minimization problems. This class is implemented to maintain consistency with the ConstrainedOptimizer class.
The roll() method of the UnconstrainedOptimizer class performs the following steps:
- Zeroing Gradients: Calls primal_optimizer.zero_grad().
- Forward Computation:
  - Computes the problem’s CMPState by invoking compute_cmp_state().
  - Calculates the primal Lagrangian.
- Backward Propagation: Calls backward() on the Lagrangian term.
- Optimization Step: Invokes step() on the primal optimizer.
Example
To solve a ConstrainedMinimizationProblem using a QuadraticPenalty formulation, follow these steps:
import torch
import cooper

train_loader = ...  # PyTorch DataLoader
model = ...  # PyTorch model
cmp = ...  # containing `Constraint`s and their associated `PenaltyCoefficient`s

primal_optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# No dual optimizer is needed: the formulation has no dual variables.
unconstrained_optimizer = cooper.optim.UnconstrainedOptimizer(
    cmp=cmp,
    primal_optimizers=primal_optimizer,
)

for inputs, targets in train_loader:
    # kwargs used by `cmp.compute_cmp_state` method to compute the loss and constraints.
    kwargs = {"model": model, "inputs": inputs, "targets": targets}

    unconstrained_optimizer.roll(compute_cmp_state_kwargs=kwargs)
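To give a feel for what optimizing a penalized objective without dual variables looks like, the sketch below runs plain gradient descent on \(x^2 + \frac{c}{2} \max(0, 1 - x)^2\). The penalty form, the toy problem, and all names are assumptions made for illustration; this section does not specify the exact expression used by the QuadraticPenalty formulation.

```python
def penalized_grad(x, c):
    """Gradient of x^2 + (c / 2) * max(0, 1 - x)^2 with respect to x."""
    violation = max(0.0, 1.0 - x)
    return 2.0 * x + c * violation * (-1.0)


def minimize_penalized(x=0.0, c=100.0, lr=0.005, steps=200):
    # Only primal updates take place: there are no multipliers to update.
    for _ in range(steps):
        x = x - lr * penalized_grad(x, c)
    return x


x_penalty = minimize_penalized()
```

For this toy objective the minimizer is \(c / (2 + c)\), roughly 0.98 when \(c = 100\), so the constraint \(x \ge 1\) is only approximately satisfied; larger penalty coefficients shrink the gap at the price of smaller admissible step sizes.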
- class cooper.optim.UnconstrainedOptimizer(cmp, primal_optimizers, dual_optimizers=None)
Wraps a (sequence of) torch.optim.Optimizers to enable handling unconstrained minimization problems in a way that is consistent with ConstrainedOptimizers.
- Parameters:
  - cmp (ConstrainedMinimizationProblem) – The constrained minimization problem to be optimized. Providing the CMP as an argument for the constructor allows the optimizer to call the compute_cmp_state() method within the roll() method.
  - primal_optimizers (Union[Optimizer, Sequence[Optimizer]]) – Optimizer(s) for the primal variables (e.g. the weights of a model). The primal parameters can be partitioned across multiple optimizers; in this case, primal_optimizers accepts a list of torch.optim.Optimizers.
- roll(compute_cmp_state_kwargs=None)
Evaluates the objective function and performs a gradient update on the parameters.
- Parameters:
  - compute_cmp_state_kwargs (Optional[dict]) – Keyword arguments to pass to the compute_cmp_state() method. Since this is an unconstrained optimizer, the CMPState will just contain the loss.
- Return type: RollOut
Cooper Optimizer Base Class
CooperOptimizer is the base class for all Cooper optimizers, offering a unified interface for parameter updates. Both ConstrainedOptimizer and UnconstrainedOptimizer inherit from this class.
- class cooper.optim.CooperOptimizer(cmp, primal_optimizers, dual_optimizers=None)
Base class for ConstrainedOptimizer and UnconstrainedOptimizers.
- Parameters:
  - cmp (ConstrainedMinimizationProblem) – The constrained minimization problem to be optimized. Providing the CMP as an argument for the constructor allows the optimizer to call the compute_cmp_state() method within the roll() method. Additionally, in the case of a constrained optimizer, the CMP enables access to the multipliers’ post_step_() method, which must be called after the multiplier update.
  - primal_optimizers (Union[Optimizer, Sequence[Optimizer]]) – Optimizer(s) for the primal variables (e.g. the weights of a model). The primal parameters can be partitioned across multiple optimizers; in this case, primal_optimizers accepts a list of torch.optim.Optimizers.
  - dual_optimizers (Union[Optimizer, Sequence[Optimizer], None]) – Optimizer(s) for the dual variables (e.g. the Lagrange multipliers associated with the constraints). A sequence of torch.optim.Optimizers can be passed to handle the case of several Constraints.
- zero_grad()
Sets the gradients of all optimized Parameters to zero. This includes both the primal and dual variables.
- Return type:
- primal_step()
Performs a gradient step on the parameters associated with the primal variables.
- Return type:
- state_dict()
Returns the state of the optimizer as a CooperOptimizerState. This method relies on the internal state_dict() method of the corresponding primal or dual optimizers.
- Return type:
- load_state_dict(state)
Loads the optimizer state from the given state dictionary.
- Parameters:
  - state (CooperOptimizerState) – A dictionary containing the optimizer state.
- Raises:
  - ValueError – If the number of primal optimizers does not match the number of primal optimizer states.
  - ValueError – If the number of dual optimizers does not match the number of dual optimizer states.
  - ValueError – If dual_optimizer_states is present in the state dict but dual_optimizers is None.
- Return type:
Checkpointing
For convenience, if you checkpoint the state of a CooperOptimizer object, it automatically checkpoints the state of all associated primal and dual optimizers, packaged in a CooperOptimizerState object. For example, you can do the following:
# Save the state of the constrained optimizer
state_dict = constrained_optimizer.state_dict()
torch.save(state_dict, "checkpoint.pth")
# Load the state of the constrained optimizer
state_dict = torch.load("checkpoint.pth")
constrained_optimizer.load_state_dict(state_dict) # Automatically loads the state of the primal and dual optimizers
For a full working example, see this tutorial.
- class cooper.optim.CooperOptimizerState
Stores the state of a CooperOptimizer.
- Parameters:
  - primal_optimizer_states – List of primal optimizer state_dicts.
  - dual_optimizer_states – List of dual optimizer state_dicts. If the optimizer is an unconstrained optimizer, this field is set to None.
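The consistency checks documented for load_state_dict() can be pictured with a small stand-in. The sketch below is hypothetical: `ToyOptimizerState` and `check_state_matches` are illustrative names that mirror only the two documented fields and the three documented ValueError conditions, not Cooper's actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyOptimizerState:
    """Hypothetical stand-in with the two documented fields."""

    primal_optimizer_states: list
    dual_optimizer_states: Optional[list] = None


def check_state_matches(state, num_primal, num_dual):
    """Validate a state against optimizer counts; num_dual is None without dual optimizers."""
    if len(state.primal_optimizer_states) != num_primal:
        raise ValueError("Number of primal optimizers and primal states differ.")
    if state.dual_optimizer_states is not None:
        if num_dual is None:
            raise ValueError("State has dual optimizer states but there are no dual optimizers.")
        if len(state.dual_optimizer_states) != num_dual:
            raise ValueError("Number of dual optimizers and dual states differ.")
    # A state without dual entries is accepted here; the documentation does not
    # list that situation among the error conditions.


constrained_state = ToyOptimizerState([{}], [{}])
unconstrained_state = ToyOptimizerState([{}, {}])
check_state_matches(constrained_state, num_primal=1, num_dual=1)
check_state_matches(unconstrained_state, num_primal=2, num_dual=None)
```

Loading a state that carries dual optimizer states into an optimizer without dual optimizers would raise under this sketch, matching the third documented condition.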