Lagrangian Formulations
Once equipped with a ConstrainedMinimizationProblem, several algorithmic approaches can be adopted for finding an approximate solution to the constrained problem.
Recall that we consider constrained minimization problems (CMPs) expressed as:

\[
\min_{x \in \Omega} \; f(x) \quad \text{s.t.} \quad g(x) \le 0, \quad h(x) = 0
\]
Lagrangian Formulation
The Lagrangian problem associated with the CMP above is given by:

\[
\min_{x \in \Omega} \, \max_{\lambda^g \ge 0, \, \lambda^h} \; \mathcal{L}(x, \lambda) \triangleq f(x) + {\lambda^g}^{\top} g(x) + {\lambda^h}^{\top} h(x)
\]
The vectors \({\lambda^g}\) and \({\lambda^h}\) are called the Lagrange multipliers or dual variables associated with the CMP. Observe that \(\mathcal{L}(x,\lambda)\) is a concave function of \(\lambda\) regardless of the convexity properties of \(f, g\) and \(h\).
A pair \((x^*,\lambda^*)\) is called a saddle-point of \(\mathcal{L}(x,\lambda)\) if for all \((x,\lambda)\),

\[
\mathcal{L}(x^*, \lambda) \le \mathcal{L}(x^*, \lambda^*) \le \mathcal{L}(x, \lambda^*).
\]
This approach can be interpreted as a zero-sum two-player game, where the “primal” player \(x\) aims to minimize \(\mathcal{L}(x,\lambda)\) and the goal of the “dual” player \(\lambda\) is to maximize \(\mathcal{L}(x,\lambda)\) (or equiv. minimize \(-\mathcal{L}(x,\lambda)\)).
Note that the notion of a saddle-point of the Lagrangian is in fact equivalent to that of a (pure) Nash equilibrium of the zero-sum game. If \((x^*,\lambda^*)\) is a saddle-point of \(\mathcal{L}(x,\lambda)\), then, by definition, neither of the two players can improve their payoffs by unilaterally deviating from \((x^*,\lambda^*)\).
In the context of a convex CMP (convex objectives, constraints and \(\Omega\)), given certain technical conditions (e.g. Slater’s condition (see \(\S\) 5.2.3 in [Boyd and Vandenberghe, 2004]), or compactness of the domains), the existence of a pure Nash equilibrium is guaranteed [von Neumann, 1928].
Warning
A constrained non-convex problem might have an optimal feasible solution, and yet its Lagrangian might not have a pure Nash equilibrium. See example in Fig 1. of Cotter et al. [2019].
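As a toy illustration of this saddle-point view (plain PyTorch, not Cooper's API; the problem, step sizes, and iteration count below are illustrative choices), consider \(\min_x (x-2)^2\) subject to \(x \le 1\). Its Lagrangian \(\mathcal{L}(x,\lambda) = (x-2)^2 + \lambda(x-1)\) has the saddle-point \((x^*, \lambda^*) = (1, 2)\), which projected gradient descent-ascent recovers:

```python
import torch

# Toy CMP: minimize (x - 2)^2 subject to x <= 1, i.e. g(x) = x - 1 <= 0.
# The Lagrangian L(x, lam) = (x - 2)^2 + lam * (x - 1) has the
# saddle-point x* = 1, lam* = 2 (from 2(x* - 2) + lam* = 0).
x = torch.tensor(0.0, requires_grad=True)
lam = torch.tensor(0.0, requires_grad=True)

primal_opt = torch.optim.SGD([x], lr=0.05)
dual_opt = torch.optim.SGD([lam], lr=0.05)

for _ in range(2000):
    lagrangian = (x - 2) ** 2 + lam * (x - 1)
    primal_opt.zero_grad()
    dual_opt.zero_grad()
    lagrangian.backward()
    lam.grad.neg_()        # the dual player ascends: maximize L over lam
    primal_opt.step()
    dual_opt.step()
    with torch.no_grad():  # project the multiplier onto lam >= 0
        lam.clamp_(min=0.0)

# After the loop, (x, lam) is close to the saddle-point (1.0, 2.0).
```

The projection step reflects that multipliers for inequality constraints are restricted to be nonnegative, so the dual player plays over \(\lambda^g \ge 0\).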
- class cooper.formulation.lagrangian.LagrangianFormulation(cmp=None, ineq_init=None, eq_init=None)[source]
Provides utilities for computing the Lagrangian associated with a ConstrainedMinimizationProblem and for populating the gradients for the primal and dual parameters.
- Parameters
cmp (Optional[ConstrainedMinimizationProblem]) – The ConstrainedMinimizationProblem we aim to solve and which gives rise to the Lagrangian.
ineq_init (Optional[Tensor]) – Initialization values for the inequality multipliers.
eq_init (Optional[Tensor]) – Initialization values for the equality multipliers.
- backward(lagrangian, ignore_primal=False, ignore_dual=False)[source]
Performs the actual backward computation which populates the gradients for the primal and dual variables.
- Parameters
lagrangian – Value of the computed Lagrangian based on which the gradients for the primal and dual variables are populated.
ignore_primal – If True, only the gradients with respect to the dual variables are populated (these correspond to the constraint violations). This feature is mainly used in conjunction with alternating updates, which require updating the multipliers based on the constraint violations after having updated the primal parameters. Defaults to False.
ignore_dual – If True, the gradients with respect to the dual variables are not populated.
- compute_lagrangian(closure=None, *closure_args, pre_computed_state=None, write_state=True, **closure_kwargs)[source]
Computes the Lagrangian based on a new evaluation of the CMPState via the closure function.
If no explicit proxy-constraints are provided, we use the given inequality/equality constraints to compute the Lagrangian and to populate the primal and dual gradients. Note that gradients are _not_ populated by this function, but rather by backward().
In case proxy constraints are provided in the CMPState, the non-proxy constraints (potentially non-differentiable) are used for computing the value of the Lagrangian. The accumulated proxy-constraints are used in the backward computation triggered by backward() (and thus must be differentiable).
- Parameters
closure – Callable returning a cooper.problem.CMPState.
pre_computed_state – Pre-computed CMP state to avoid wasteful computation when only dual gradients are required.
write_state – If True, the state of the formulation's cooper.problem.ConstrainedMinimizationProblem attribute is replaced by that returned by the closure argument. This flag can be used (when set to False) to evaluate the Lagrangian, e.g. for logging validation metrics, without overwriting the information stored in the formulation's cooper.problem.ConstrainedMinimizationProblem.
- weighted_violation(cmp_state, constraint_type)[source]
Computes the dot product between the current multipliers and the constraint violations of type constraint_type. If proxy-constraints are provided in the CMPState, the non-proxy (usually non-differentiable) constraints are used for computing the dot product, while the "proxy-constraint" dot products are accumulated under self.accumulated_violation_dot_prod.
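The proxy-constraint mechanism can be sketched outside of Cooper's API with a hypothetical toy problem (all names, step sizes, and the quantized constraint below are illustrative assumptions, not Cooper internals). The true constraint is piecewise constant, hence carries no useful gradient, so a differentiable surrogate drives the primal step while the multiplier is updated with the true violation, in the spirit of Cotter et al. [2019]:

```python
import torch

# Toy CMP: minimize f(x) = (x - 2)^2 subject to g(x) <= 0, where the "true"
# constraint g(x) = round(10 * (x - 1)) / 10 is piecewise constant (zero
# gradient almost everywhere). The differentiable surrogate
# g_proxy(x) = x - 1 supplies the primal gradient; the multiplier update
# uses the true (non-proxy) violation.
x = torch.tensor(0.0, requires_grad=True)
lam = torch.tensor(0.0)  # multiplier for the single inequality constraint
lr_primal, lr_dual = 0.05, 0.05

for _ in range(4000):
    proxy_lagrangian = (x - 2) ** 2 + lam * (x - 1)  # f + lam * g_proxy
    grad_x, = torch.autograd.grad(proxy_lagrangian, x)
    with torch.no_grad():
        true_violation = torch.round(10 * (x - 1)) / 10
        x -= lr_primal * grad_x
        # Projected ascent on the multiplier with the *non-proxy* violation.
        lam = torch.clamp(lam + lr_dual * true_violation, min=0.0)

# x hovers near the constrained optimum x* = 1, with lam near lam* = 2.
```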
Augmented Lagrangian Formulation
The Augmented Lagrangian Method (ALM) considers a sequence of unconstrained minimization problems on the primal variables:

\[
\mathcal{L}_{c^t}(x, \lambda^t) = f(x) + {\lambda^{h,t}}^{\top} h(x) + \frac{c^t}{2} \lVert h(x) \rVert^2 + \frac{1}{2 c^t} \sum_{i} \left[ \max\!\left(0, \, \lambda^{g,t}_i + c^t g_i(x)\right)^2 - \left(\lambda^{g,t}_i\right)^2 \right]
\]
This problem is (approximately) minimized over the primal variables to obtain:

\[
x^{t+1} \in \underset{x \in \Omega}{\operatorname{argmin}} \; \mathcal{L}_{c^t}(x, \lambda^t)
\]
The found \(x^{t+1}\) is used to update the estimates for the Lagrange multipliers:

\[
\lambda^{g,t+1} = \max\!\left(0, \, \lambda^{g,t} + c^t \, g(x^{t+1})\right), \qquad \lambda^{h,t+1} = \lambda^{h,t} + c^t \, h(x^{t+1})
\]
The main advantage of the ALM compared to the quadratic penalty method (see \(\S\) 4.2.1 in [Bertsekas, 1999]) is that (under some reasonable assumptions) the algorithm can succeed without requiring an unbounded increase of the penalty parameter sequence \(c^t\). The use of explicit estimates for the Lagrange multipliers contributes to avoiding the ill-conditioning that is inherent in the quadratic penalty method.
See \(\S\) 4.2.1 in [Bertsekas, 1999] and \(\S\) 17 in [Nocedal and Wright, 2006] for a comprehensive treatment of the Augmented Lagrangian method.
Important
Please visit this section for practical considerations on using the Augmented Lagrangian method in Cooper.
In particular, the sequence of penalty coefficients \(c_t\) is handled in Cooper as a scheduler on the dual learning rate.
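To make these updates concrete, here is a minimal self-contained sketch of the ALM iteration on an equality-constrained toy problem (plain Python, not Cooper's API; the problem and the penalty schedule are illustrative), where the inner minimization admits a closed form:

```python
# ALM sketch on a toy problem: minimize f(x) = x^2 subject to h(x) = x - 1 = 0.
# The constrained optimum is x* = 1 with multiplier lam* = -2
# (stationarity: 2 x* + lam* = 0).
c, lam, x = 1.0, 0.0, 0.0
for t in range(50):
    # Inner problem: min_x  x^2 + lam * (x - 1) + (c / 2) * (x - 1)^2,
    # solved here in closed form (set 2x + lam + c * (x - 1) = 0).
    x = (c - lam) / (2.0 + c)
    # Multiplier update from the constraint violation of the new iterate:
    lam = lam + c * (x - 1.0)
    # The penalty coefficient may grow, but is capped rather than unbounded:
    c = min(10.0, 1.5 * c)

# x -> 1.0 and lam -> -2.0
```

Note that the iterates converge even though the penalty coefficient is capped at 10, in line with the remark that \(c^t\) need not increase without bound.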
- class cooper.formulation.augmented_lagrangian.AugmentedLagrangianFormulation(cmp=None, ineq_init=None, eq_init=None)[source]
Provides utilities for computing the Augmented Lagrangian associated with a ConstrainedMinimizationProblem and for populating the gradients for the primal and dual parameters accordingly.
- Parameters
cmp (Optional[ConstrainedMinimizationProblem]) – The ConstrainedMinimizationProblem we aim to solve and which gives rise to the Lagrangian.
ineq_init (Optional[Tensor]) – Initialization values for the inequality multipliers.
eq_init (Optional[Tensor]) – Initialization values for the equality multipliers.
- compute_lagrangian(aug_lag_coeff_scheduler, closure=None, *closure_args, pre_computed_state=None, write_state=True, **closure_kwargs)[source]
Computes the Lagrangian based on a new evaluation of the CMPState via the closure function.
If no explicit proxy-constraints are provided, we use the given inequality/equality constraints to compute the Augmented Lagrangian and to populate the primal and dual gradients. Note that gradients are _not_ populated by this function, but rather by backward().
In case proxy constraints are provided in the CMPState, the non-proxy constraints (potentially non-differentiable) are used for computing the value of the Augmented Lagrangian. The accumulated proxy-constraints are used in the backward computation triggered by backward() (and thus must be differentiable).
- Parameters
closure – Callable returning a cooper.problem.CMPState.
pre_computed_state – Pre-computed CMP state to avoid wasteful computation when only dual gradients are required.
write_state – If True, the state of the formulation's cooper.problem.ConstrainedMinimizationProblem attribute is replaced by that returned by the closure argument. This flag can be used (when set to False) to evaluate the Augmented Lagrangian, e.g. for logging validation metrics, without overwriting the information stored in the formulation's cooper.problem.ConstrainedMinimizationProblem.
- weighted_violation(cmp_state, constraint_type)[source]
Computes the dot product between the current multipliers and the constraint violations of type constraint_type. If proxy-constraints are provided in the CMPState, the non-proxy (usually non-differentiable) constraints are used for computing the dot product, while the "proxy-constraint" dot products are stored under self.accumulated_violation_dot_prod.
If the CMPState contains proxy _inequality_ constraints, the filtering on whether the constraint is active for the calculation of the Augmented Lagrangian is done based on the value of the non-proxy constraints.
Proxy-Lagrangian Formulation
Base Lagrangian Formulation
- class cooper.formulation.lagrangian.BaseLagrangianFormulation(cmp=None, ineq_init=None, eq_init=None)[source]
Base class for Lagrangian Formulations.
- cmp
The ConstrainedMinimizationProblem we aim to solve and which gives rise to the Lagrangian.
- ineq_multipliers
Trainable cooper.multipliers.DenseMultiplier objects associated with the inequality constraints.
- eq_multipliers
Trainable cooper.multipliers.DenseMultiplier objects associated with the equality constraints.
- create_state(cmp_state)[source]
Initialize multipliers and optimizers given the lists of equality and inequality defects.
- Parameters
eq_defect – Defects for the equality constraints.
ineq_defect – Defects for the inequality constraints.
- Return type
- create_state_from_metadata(dtype, device, ineq_size=None, eq_size=None)[source]
Initialize the formulation state based on the shape of the constraints.
- Return type
- property is_state_created
Returns True if any Lagrange multipliers have been initialized.
- state()[source]
Collects all dual variables and returns a tuple containing their torch.Tensor values. Note that the values are of a different type from the cooper.multipliers.DenseMultiplier objects.