Formulations

Once a constrained minimization problem (CMP) is defined:

\[ \min_{\vx \in \reals^d} \,\, f(\vx), \,\, \text{s.t. } \,\, \vg(\vx) \le \vzero, \,\, \vh(\vx) = \vzero, \]

various algorithmic approaches can be used to find a solution. The process involves two stages: first, selecting a formulation for the optimization problem, and second, choosing the optimization algorithm to solve that formulated problem.

This section focuses on the formulations of the CMP. A formulation is a mathematical representation of the constrained optimization problem, involving a scalarization of the objective function and constraints (see Sec. 4.7.4 in Boyd and Vandenberghe [BV04]). Examples of formulations include the Lagrangian and AugmentedLagrangian formulations.

Cooper supports min-max formulations of CMPs of the form:

\[ \min_{\vx \in \reals^d} \,\, \max_{\vlambda \ge \vzero, \vmu} \,\, \Lag(\vx,\vlambda, \vmu) \triangleq f(\vx) + P(\vg(\vx), \vlambda, \vc_{\vg}) + Q(\vh(\vx), \vmu, \vc_{\vh}), \]

where \(P\) and \(Q\) are functions aimed at enforcing the satisfaction of the constraints.

In Optim, we discuss the second stage: the choice of algorithm used to solve a formulation, including variants of simultaneous gradient descent-ascent (SimultaneousOptimizer).

Multipliers and Penalty Coefficients

Formulations may rely on the introduction of additional decision variables or hyper-parameters aimed at enforcing the constraints. The decision variables \(\vlambda \geq \vzero\) and \(\vmu\) are referred to as Lagrange multipliers or dual variables. The hyper-parameters \(\vc_{\vg} \geq \vzero\) and \(\vc_{\vh} \geq \vzero\), are known as penalty coefficients.

For more details, see Multipliers.

Cooper’s framework for formulations supports a wide range of approaches for solving constrained optimization problems, including the Lagrangian, QuadraticPenalty, and AugmentedLagrangian formulations. For other formulations that could be implemented in future versions of Cooper, see Contributing to Cooper.

Warning

Sequential Quadratic Programming (SQP) is not supported under Cooper’s formulation framework.

Example

To specify your formulation of choice, pass the corresponding class to the formulation_type argument of the Constraint class. For example, to use the Lagrangian formulation, you would first define a multiplier (see Multipliers for more details):

from cooper.multipliers import DenseMultiplier

multiplier = DenseMultiplier(num_constraints=...)

Then, pass the Lagrangian class and the multiplier to the Constraint constructor:

my_constraint = cooper.Constraint(
    constraint_type=cooper.ConstraintType.INEQUALITY,
    multiplier=multiplier,
    formulation=cooper.formulations.Lagrangian
)

If you consider a formulation that requires penalty coefficients, you can first instantiate a PenaltyCoefficient:

from cooper.penalty_coefficients import DensePenaltyCoefficient

penalty_coefficient = DensePenaltyCoefficient(init=torch.tensor(...))

And then pass the desired formulation and the penalty coefficient to the Constraint constructor:

my_constraint = cooper.Constraint(
    constraint_type=cooper.ConstraintType.INEQUALITY,
    formulation=cooper.formulations.QuadraticPenalty,
    penalty_coefficient=penalty_coefficient,
)

Some formulations may require both multipliers and penalty coefficients. In such cases, you can pass both to the Constraint constructor.

Note

Different formulations may be better suited to different constraints. Cooper enables specifying a formulation type for each constraint separately, allowing the use of multiple formulations within the same constrained optimization problem.

Primal and Dual Lagrangians

To support Non-differentiable Constraints, Cooper formulations internally calculate two Lagrangian terms, \(\Lag_{\text{primal}}\) and \(\Lag_{\text{dual}}\). The primal term, \(\Lag_{\text{primal}}\), considers differentiable constraint violations \(\tilde{\vg}(\vx)\) and \(\tilde{\vh}(\vx)\), while the dual term, \(\Lag_{\text{dual}}\), considers the (potentially non-differentiable) true constraint violations \(\vg(\vx)\) and \(\vh(\vx)\). For details on how Cooper represents these constraint violations, see ConstraintStates.

Formulations in Cooper

Lagrangian Formulations

The popular Lagrangian formulation [AHU58] corresponds to the choices:

\[P(\vg(\vx), \vlambda, \vc_{\vg}) = \vlambda^\top \vg(\vx),\]

\[Q(\vh(\vx), \vmu, \vc_{\vh}) = \vmu^\top \vh(\vx).\]

The Lagrangian function is given by:

\[ \Lag(\vx, \vlambda, \vmu) \triangleq f(\vx) + \vlambda^\top \vg(\vx) + \vmu^\top \vh(\vx), \]

which corresponds to a linear combination of the objective function and the constraints, with the Lagrange multipliers acting as trainable weights on the constraints.

The maximization of the Lagrangian over \(\vlambda\) and \(\vmu\) enforces the constraints by assigning an objective value of \(\infty\) to infeasible points:

\[\begin{split} \min_{\vx \in \reals^d} \,\, \max_{\vlambda \ge \vzero, \vmu} \,\, \Lag(\vx, \vlambda, \vmu) = \min_{\vx \in \reals^d} \,\, \begin{cases} f(\vx), & \text{if } \vg(\vx) \le \vzero, \vh(\vx) = \vzero, \\ \infty, & \text{otherwise}. \end{cases} \end{split}\]

Warning

There is no guarantee that a general nonconvex constrained optimization problem admits optimal Lagrange multipliers \(\lambdastar\) and \(\mustar\) at a solution \(\xstar\). Nevertheless, in practice, many non-convex problems are successfully solved using the Lagrangian approach [DPS+24, ENR22, GPRE+22, HCR23, SRZ+24] For conditions under which Lagrange multipliers are guaranteed to exist, see Boyd and Vandenberghe [BV04].

class cooper.formulations.Lagrangian(constraint_type)[source]

The Lagrangian formulation implements the following primal Lagrangian:

\[\Lag_{\text{primal}}(\vx, \vlambda, \vmu) = f(\vx) + \vlambda^{\top} \tilde{\vg}(\vx) + \vmu^{\top} \tilde{\vh}(\vx).\]

And the following dual Lagrangian:

\[\Lag_{\text{dual}}(\vx, \vlambda, \vmu) = \vlambda^{\top} \vg(\vx) + \vmu^{\top} \vh(\vx).\]

compute_contribution_to_primal_lagrangian(constraint_state, multiplier)[source]

Computes the contribution of a given constraint violation to the primal Lagrangian.

Returns None if the constraint does not contribute to the primal update (i.e., when ConstraintState.contributes_to_primal_update=False).

Return type:: Optional[ContributionStore]

compute_contribution_to_dual_lagrangian(constraint_state, multiplier)[source]

Computes the contribution of a given constraint violation to the dual Lagrangian.

Returns None if the constraint does not contribute to the dual update (i.e., when ConstraintState.contributes_to_dual_update=False).

Return type:: Optional[ContributionStore]

Quadratic Penalty Formulations

The Quadratic Penalty formulation (see Sec. 17.1 in [NW06]) penalizes constraint violations via quadratic terms:

\[P(\vg(\vx), \vlambda, \vc_{\vg}) = \frac{1}{2} \vc_{\vg}^\top \, \texttt{relu}(\vg(\vx))^2,\]

\[Q(\vh(\vx), \vmu, \vc_{\vh}) = \frac{1}{2} \vc_{\vh}^\top \, \vh(\vx)^2,\]

where \(\texttt{relu}(\cdot)\) and \((\cdot)^2\) are element-wise operations. This results in the Quadratic Penalty function:

\[\Lag^{\text{QP}}_{\vc_g, \vc_h}(\vx) = f(\vx) + \frac{1}{2} \vc_{\vg}^\top \, \texttt{relu}(\vg(\vx))^2 + \frac{1}{2} \vc_{\vh}^\top \, \vh(\vx)^2.\]

The penalty coefficients are used to penalize constraint violations. As \(\vc_{\vg}, \vc_{\vh} \rightarrow \infty\), a solution to the quadratic penalty problem approaches a solution to the original constrained optimization problem:

\[\begin{split} \lim_{\vc_{\vg}, \vc_{\vh} \rightarrow \infty} \,\, \Lag^{\text{QP}}_{\vc_g, \vc_h}(\vx) = \,\, \begin{cases} f(\vx), & \text{if } \vg(\vx) \le \vzero, \vh(\vx) = \vzero, \\ \infty, & \text{otherwise}. \end{cases} \end{split}\]

The Quadratic Penalty formulation and Proxy Constraints

Cooper enables the use of Proxy Constraints together with the QuadraticPenalty formulation. This approach is similar to the Lagrangian formulation with proxy constraints, where a differentiable surrogate is used for primal optimization, while the true non-differentiable constraint guides the update of dual variables. In this case, the surrogate Quadratic Penalty function incorporates the surrogate, while the true constraint determines whether to update the penalty coefficient (see Penalty Coefficient Updaters).

This approach carries some risk: if the surrogate and true constraint are misaligned, feasibility with respect to the true non-differentiable constraint may not be achievable, as optimization is performed on the surrogate. In this case, since the penalty coefficient is updated based on the unsatisfied true constraint, it may grow indefinitely rather than converging to a finite value, leading to numerical issues. To mitigate this risk, consider adding extra tolerance to the penalty coefficient updates.

class cooper.formulations.QuadraticPenalty(constraint_type)[source]

The Quadratic Penalty formulation implements the following primal Lagrangian:

\[\Lag_{\text{primal}}(\vx) = f(\vx) + \frac{1}{2} \vc_{\vg}^\top \, \texttt{relu}(\tilde{\vg}(\vx))^2 + \frac{1}{2} \vc_{\vh}^\top \, \tilde{\vh}(\vx)^2.\]

It does not implement a dual Lagrangian since it does not consider dual variables.

compute_contribution_to_primal_lagrangian(constraint_state, penalty_coefficient)[source]

Computes the contribution of a given constraint violation to the primal Lagrangian.

Returns None if the constraint does not contribute to the primal update (i.e., when ConstraintState.contributes_to_primal_update=False).

Return type:: Optional[ContributionStore]

compute_contribution_to_dual_lagrangian(constraint_state, penalty_coefficient)[source]

The Quadratic Penalty formulation does not involve dual variables and therefore does not implement a dual Lagrangian (returns None).

Return type:: None

Augmented Lagrangian Formulations

The Augmented Lagrangian function [Ber75] is a generalization of the Lagrangian function that includes a quadratic penalty term on the constraint violations:

\[ \Lag_{\vc_g, \vc_h}(\vx, \vlambda, \vmu) = \Lag(\vx, \vlambda, \vmu) + \frac{1}{2} \vc_{\vg}^\top \, \texttt{relu}(\vg(\vx))^2 + \frac{1}{2} \vc_{\vh}^\top \, \vh(\vx)^2. \]

The main advantage of the Augmented Lagrangian Method (ALM) over the quadratic penalty method (see Theorem 17.5 in Nocedal and Wright [NW06]) is that, under certain assumptions, there exists a finite \(\bar{c}\) such that for all \(c \geq \bar{c}\), the minimizer \(\xstar\) of \(\Lag_{c \, \odot \, \mathbf{1}, c\, \odot \, \mathbf{1}}(\vx, \vlambda, \vmu)\) with respect to \(\vx\) corresponds to the solution of the original constrained optimization problem. This allows the algorithm to succeed without requiring \(c \rightarrow \infty\), as is often the case with the quadratic penalty method.

Exactly solving the intermediate optimization problems considered by the AugmentedLagrangian formulation can be challenging. This is particularly true when \(\Lag_{\vc_g, \vc_h}(\vx, \vlambda, \vmu)\) is non-convex in \(\vx\).

Rather than aiming to solve the intermediate optimization problem to high precision, Cooper implements a version of the Augmented Lagrangian method where the primal variables are updated through one or more gradient-based steps.

The Augmented Lagrangian formulation and Proxy Constraints

Cooper supports the use of Proxy Constraints with the AugmentedLagrangian formulation. Like the Lagrangian formulation, a differentiable surrogate is used for primal optimization, while the true non-differentiable constraint is used for updating the dual variables. Moreover, similar to the QuadraticPenalty formulation, penalty coefficient updates are based on the value of the non-differentiable constraint when using a penalty coefficient updater.

Warning

We make a distinction between the Augmented Lagrangian formulation implemented in Cooper (AugmentedLagrangian) and the Augmented Lagrangian method (\(\S\)4.2.1 in Bertsekas [Ber99]). The Augmented Lagrangian method is a specific optimization algorithm over the Augmented Lagrangian function \(\Lag_{\vc_g, \vc_h}(\vx, \vlambda, \vmu)\), with the following updates:

\[\begin{split} \vx_{t+1} &\in \argmin{\vx \in \reals^d} \,\, \Lag_{c_t}(\vx, \vlambda_t, \vmu_t) \\ \vlambda_{t+1} &= \left[\vlambda_t + {\color{red} c_t} \, \vg(\vx_{\color{red} t+1})\right]_+ \\ \vmu_{t+1} &= \vmu_t + {\color{red} c_t} \, \vh(\vx_{\color{red} t+1}) \\ c_{t+1} &\ge c_t \end{split}\]

The Augmented Lagrangian method has the following distinguishing features:

The minimization with respect to the primal variables \(\vx\) is (usually) solved completely or approximately (in contrast to taking one gradient step).
It uses alternating updates (the updated primal iterate \(\vx_{t+1}\) is used to obtain the updated Lagrange multipliers \(\vlambda_{t+1}\)).
The dual learning rates match the current value of the penalty coefficient \(\eta_{\vlambda} = \eta_{\vmu} = c_t\).

If you want to use the Augmented Lagrangian formulation in Cooper, use the AugmentedLagrangian formulation. If you intend to use the Augmented Lagrangian Method, follow these steps:

Use an AugmentedLagrangian formulation. This formulation automatically ensures that the dual learning rate is multiplied by the current value of the penalty coefficient.
Use torch.optim.SGD as the dual optimizer, and ensure that its learning rate corresponds to the penalty coefficient on each step c_t.
Use AlternatingPrimalDualOptimizer as the constrained optimizer to obtain alternating updates which first update the primal variables and then the dual variables.
[Optional] Instead of carrying out a single step of primal optimization for every step of dual optimization, you can carry out multiple primal steps for every dual step, more closely approximating the full minimization of the Augmented Lagrangian function. See Optim for details on how to implement this.

class cooper.formulations.AugmentedLagrangian(constraint_type)[source]

The Augmented Lagrangian formulation implements the following primal Lagrangian:

\[\Lag_{\text{primal}}(\vx, \vlambda, \vmu) = f(\vx) + \vlambda^{\top} \tilde{\vg}(\vx) + \vmu^{\top} \tilde{\vh}(\vx) + \frac{1}{2} \vc_{\vg}^\top \, \texttt{relu}(\tilde{\vg}(\vx))^2 + \frac{1}{2} \vc_{\vh}^\top \, \tilde{\vh}(\vx)^2.\]

And the following dual Lagrangian:

\[\Lag_{\text{dual}}(\vx, \vlambda, \vmu) = \vlambda^{\top} \vg(\vx) + \vmu^{\top} \vh(\vx).\]

compute_contribution_to_primal_lagrangian(constraint_state, multiplier, penalty_coefficient)[source]

Computes the contribution of a given constraint violation to the primal Lagrangian.

Returns None if the constraint does not contribute to the primal update (i.e., when ConstraintState.contributes_to_primal_update=False).

Return type:: Optional[ContributionStore]

compute_contribution_to_dual_lagrangian(constraint_state, multiplier, penalty_coefficient)[source]

Computes the contribution of a given constraint violation to the dual Lagrangian.

Returns None if the constraint does not contribute to the dual update (i.e., when ConstraintState.contributes_to_dual_update=False).

Return type:: Optional[ContributionStore]

Custom Formulations

If you are interested in implementing your own formulation, you can inherit from the Formulation abstract base class.

Base Formulation Class

class cooper.formulations.Formulation(constraint_type)[source]

Formulations prescribe how the different constraints contribute to the primal- and dual-differentiable Lagrangians. In other words, they prescribe how the constraints affect the gradients of the Lagrangian with respect to the primal and dual variables.

expects_multiplier

Used to determine whether the formulation requires a multiplier.

Type:: bool

expects_penalty_coefficient

Used to determine whether the formulation requires a penalty coefficient.

Type:: bool

Raises:: ValueError – If the constraint type is not equality or inequality.

abstract compute_contribution_to_primal_lagrangian(*args, **kwargs)[source]

Computes the contribution of a given constraint violation to the primal Lagrangian.

Returns None if the constraint does not contribute to the primal update (i.e., when ConstraintState.contributes_to_primal_update=False).

Return type:: Optional[ContributionStore]

abstract compute_contribution_to_dual_lagrangian(*args, **kwargs)[source]

Computes the contribution of a given constraint violation to the dual Lagrangian.

Returns None if the constraint does not contribute to the dual update (i.e., when ConstraintState.contributes_to_dual_update=False).

Return type:: Optional[ContributionStore]

Utils

Cooper provides utility functions to simplify the core computations of the primal and dual Lagrangians. These functions are used internally by the formulations and can be used to implement custom formulations.

cooper.formulations.utils.evaluate_constraint_factor(module, constraint_features, expand_shape)[source]

Evaluate a Lagrange multiplier or penalty coefficient.

If the module expects constraint features, it is called with the constraint features as an argument. Otherwise, it is called without arguments.

Parameters:

module (Union[Multiplier, PenaltyCoefficient]) – Multiplier or penalty coefficient module.
constraint_features (Optional[Tensor]) – The observed features of the constraint.
expand_shape (tuple[int, ...]) – Shape of the constraint violation tensor.

Return type:

Tensor

cooper.formulations.utils.compute_primal_weighted_violation(constraint_factor_value, violation)[source]

A weighted sum of constraint violations using their associated multipliers, preserving only the gradient for the primal variables \(\vx\). This corresponds to \(\vlambda .\texttt{detach}()^{\top} \vg(\vx)\) for inequality constraints or \(\vmu .\texttt{detach}()^{\top} \vh(\vx)\) for equality constraints.

Parameters:

constraint_factor_value (Tensor) – Tensor of constraint factor values.
violation (Tensor) – Tensor of constraint violations.

Return type:

Optional[Tensor]

cooper.formulations.utils.compute_dual_weighted_violation(multiplier_value, violation)[source]

Computes the sum of weighted constraint violations while preserving the gradient for the dual variables \(\vlambda\) and \(\vmu\) only. That is:

\[\vlambda^{\top} \vg(\vx).\texttt{detach}() \text{ or } \vmu^{\top} \vh(\vx).\texttt{detach}()\]

If a penalty coefficient is provided, the contribution of each violation is further multiplied by its associated penalty coefficient, ensuring that the gradient with respect to the multiplier is the constraint violation times the penalty coefficient. This results in:

\[(\vlambda \odot \vc_{\vg})^{\top} \vg(\vx) \text{ or } (\vmu \odot \vc_{\vh})^{\top} \vh(\vx)\]

Parameters:

multiplier_value (Tensor) – Tensor of multiplier values.
violation (Tensor) – Tensor of constraint violations.

Return type:

Tensor

cooper.formulations.utils.compute_quadratic_penalty(penalty_coefficient_value, violation, constraint_type)[source]

A weighted sum of squared constraint violations using their associated penalty coefficients.

We clamp the violations for inequality constraints as done in Eq 17.7 in Numerical Optimization by Nocedal and Wright [NW06]. This corresponds to:

\[\frac{1}{2} \, \vc_{\vg}^{\top} \texttt{relu}(\vg(\vx))^2 \text{ or } \frac{1}{2} \, \vc_{\vh}^{\top} \vh(\vx)^2\]

Parameters:

penalty_coefficient_value (Tensor) – Tensor of penalty coefficient values.
violation (Tensor) – Tensor of constraint violations.
constraint_type (ConstraintType) – Type of constraint. One of ConstraintType.INEQUALITY or ConstraintType.EQUALITY.

Return type:

Optional[Tensor]

cooper.formulations.utils.compute_primal_quadratic_augmented_contribution(multiplier_value, penalty_coefficient_value, violation, constraint_type)[source]

Computes the quadratic-augmented contribution of a constraint to the Lagrangian.

When the constraint is an inequality constraint, the quadratic penalty is computed following Eqs 17.64 and 17.65 in Numerical Optimization by Nocedal and Wright [NW06]. Note that Nocedal and Wright use a “greater-than-or-equal to zero” convention for their constraints, which reverses some of the signs below. Denoting the current multiplier by \(\lambda\) and the penalty coefficient by \(\rho\), we obtain the contribution of an inequality constraint to the augmented Lagrangian:

\[\lambda_{*}^{\top} \text{violation} - \frac{1}{2 \rho} ||\lambda_{*} - \lambda||_2^2,\]

where \(\lambda_{*}= \texttt{relu}(\lambda + \rho \text{violation})\). Note that this corresponds to the multiplier update after a step of projected gradient ascent.

In the case of equality constraints, the quadratic-augmented contribution is computed following Eq 17.36 in Nocedal and Wright [NW06]:

\[\lambda^{\top} \text{violation}+ \frac{rho}{2} ||violation||_2^2\]

Parameters:

multiplier_value (Tensor) – Tensor of multiplier values.
penalty_coefficient_value (Tensor) – Tensor of penalty coefficient values.
violation (Tensor) – Tensor of constraint violations.
constraint_type (ConstraintType) – Type of constraint. One of ConstraintType.INEQUALITY or ConstraintType.EQUALITY.

Return type:

Optional[Tensor]