Lagrangian Formulations

Once a ConstrainedMinimizationProblem has been specified, several algorithmic approaches can be adopted to find an approximate solution to the constrained problem.

Recall that we consider constrained minimization problems (CMPs) expressed as:

\[\begin{split}\min_{x \in \Omega} & \,\, f(x) \\ \text{s.t. } & \,\, g(x) \le \mathbf{0} \\ & \,\, h(x) = \mathbf{0}\end{split}\]

Lagrangian Formulation

The Lagrangian problem associated with the CMP above is given by:

\[\min_{x \in \Omega} \max_{{\lambda^g} \ge 0, \, {\lambda^h}} \mathcal{L}(x,\lambda) \triangleq f(x) + {\lambda^g}^{\top} g(x) + {\lambda^h}^{\top} h(x)\]

The vectors \({\lambda^g}\) and \({\lambda^h}\) are called the Lagrange multipliers or dual variables associated with the CMP. Observe that \(\mathcal{L}(x,\lambda)\) is a concave function of \(\lambda\) regardless of the convexity properties of \(f, g\) and \(h\).

A pair \((x^*,\lambda^*)\) is called a saddle-point of \(\mathcal{L}(x,\lambda)\) if for all \((x,\lambda)\),

\[\mathcal{L}(x^*,\lambda) \le \mathcal{L}(x^*,\lambda^*) \le \mathcal{L}(x,\lambda^*).\]

This approach can be interpreted as a zero-sum two-player game, where the “primal” player \(x\) aims to minimize \(\mathcal{L}(x,\lambda)\) and the goal of the “dual” player \(\lambda\) is to maximize \(\mathcal{L}(x,\lambda)\) (or equiv. minimize \(-\mathcal{L}(x,\lambda)\)).

Note that the notion of a saddle-point of the Lagrangian is in fact equivalent to that of a (pure) Nash equilibrium of the zero-sum game. If \((x^*,\lambda^*)\) is a saddle-point of \(\mathcal{L}(x,\lambda)\), then, by definition, neither of the two players can improve their payoffs by unilaterally deviating from \((x^*,\lambda^*)\).
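To make the game interpretation concrete, the following self-contained sketch (plain Python, independent of Cooper's API) runs simultaneous gradient descent-ascent on the toy CMP \(\min x^2\) s.t. \(x \ge 1\), whose Lagrangian \(\mathcal{L}(x, \lambda) = x^2 + \lambda(1 - x)\) has a saddle point at \((x^*, \lambda^*) = (1, 2)\):

```python
# Toy CMP: min x^2  s.t.  1 - x <= 0  (i.e. x >= 1).
# Lagrangian: L(x, lam) = x^2 + lam * (1 - x); saddle point at (1, 2).

def grad_x(x, lam):
    return 2.0 * x - lam              # dL/dx

def grad_lam(x, lam):
    return 1.0 - x                    # dL/dlam = g(x), the constraint violation

x, lam, lr = 0.0, 0.0, 0.05
for _ in range(5000):
    new_x = x - lr * grad_x(x, lam)                   # primal player: descent
    new_lam = max(0.0, lam + lr * grad_lam(x, lam))   # dual player: projected ascent
    x, lam = new_x, new_lam

print(round(x, 3), round(lam, 3))  # -> 1.0 2.0
```

The projection `max(0.0, ...)` keeps the inequality multiplier non-negative, matching the constraint \(\lambda^g \ge 0\) in the formulation above.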

In the context of a convex CMP (convex objectives, constraints and \(\Omega\)), given certain technical conditions (e.g. Slater’s condition (see \(\S\) 5.2.3 in [Boyd and Vandenberghe, 2004]), or compactness of the domains), the existence of a pure Nash equilibrium is guaranteed [von Neumann, 1928].

Warning

A constrained non-convex problem might have an optimal feasible solution even though its Lagrangian has no pure Nash equilibrium. See the example in Fig. 1 of Cotter et al. [2019].

class cooper.formulation.lagrangian.LagrangianFormulation(cmp=None, ineq_init=None, eq_init=None)[source]

Provides utilities for computing the Lagrangian associated with a ConstrainedMinimizationProblem and for populating the gradients for the primal and dual parameters.

Parameters
backward(lagrangian, ignore_primal=False, ignore_dual=False)[source]

Performs the actual backward computation which populates the gradients for the primal and dual variables.

Parameters
  • lagrangian – Value of the computed Lagrangian based on which the gradients for the primal and dual variables are populated.

  • ignore_primal – If True, only the gradients with respect to the dual variables are populated (these correspond to the constraint violations). This feature is mainly used in conjunction with alternating updates, which require updating the multipliers based on the constraints violation after having updated the primal parameters. Defaults to False.

  • ignore_dual – If True, the gradients with respect to the dual variables are not populated. Defaults to False.
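The effect of the two flags can be sketched with hand-computed gradients for a toy Lagrangian. Note that `populate_gradients` below is a hypothetical stand-in for illustration only; Cooper's backward() relies on PyTorch autograd rather than analytic derivatives:

```python
def populate_gradients(x, lam, ignore_primal=False, ignore_dual=False):
    """Hypothetical stand-in for backward(): gradients of
    L(x, lam) = x^2 + lam * (1 - x), computed analytically."""
    grads = {}
    if not ignore_primal:
        grads["x"] = 2.0 * x - lam    # dL/dx
    if not ignore_dual:
        grads["lam"] = 1.0 - x        # dL/dlam equals the constraint violation
    return grads

print(populate_gradients(0.5, 1.0))                      # {'x': 0.0, 'lam': 0.5}
print(populate_gradients(0.5, 1.0, ignore_primal=True))  # {'lam': 0.5}
```

With `ignore_primal=True`, only the dual gradients remain, and these are exactly the constraint violations, which is why this flag is useful for alternating updates.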

compute_lagrangian(closure=None, *closure_args, pre_computed_state=None, write_state=True, **closure_kwargs)[source]

Computes the Lagrangian based on a new evaluation of the CMPState via the closure function.

If no explicit proxy-constraints are provided, the given inequality/equality constraints are used to compute the Lagrangian and, subsequently, to populate the primal and dual gradients. Note that the gradients are _not_ populated by this function, but by backward().

In case proxy-constraints are provided in the CMPState, the non-proxy constraints (which are potentially non-differentiable) are used for computing the value of the Lagrangian. The accumulated proxy-constraints are used in the backward computation triggered by backward() (and thus must be differentiable).

Parameters
  • closure – Callable returning a cooper.problem.CMPState.

  • pre_computed_state – Pre-computed CMP state to avoid wasteful computation when only dual gradients are required.

  • write_state – If True, the state of the formulation’s cooper.problem.ConstrainedMinimizationProblem attribute is replaced by that returned by the closure argument. This flag can be used (when set to False) to evaluate the Lagrangian, e.g. for logging validation metrics, without overwriting the information stored in the formulation’s cooper.problem.ConstrainedMinimizationProblem.
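As an illustration of the proxy mechanism, a non-differentiable constraint (here an error-rate constraint) can be paired with a differentiable surrogate (a hinge loss). This is a hedged sketch in plain Python; the scores, labels, tolerance, and function names below are made up for the example:

```python
# Toy setup: all values below are made-up example data.
scores = [0.3, -1.2, 0.8, -0.1]   # model scores
labels = [1, -1, -1, 1]           # ground-truth labels
loss = 0.9                        # objective value f(x) (placeholder)
lam = 0.5                         # current inequality multiplier

def nonproxy_violation():
    # 0/1 error rate minus tolerance: non-differentiable, used for the
    # *value* of the Lagrangian
    errors = sum(1 for s, y in zip(scores, labels) if s * y <= 0)
    return errors / len(scores) - 0.25

def proxy_violation():
    # hinge surrogate: differentiable, used for the *gradients* in backward()
    hinge = sum(max(0.0, 1.0 - s * y) for s, y in zip(scores, labels))
    return hinge / len(scores) - 0.25

# The reported Lagrangian value uses the non-proxy constraint...
lagrangian_value = loss + lam * nonproxy_violation()
# ...while backward() would differentiate through lam * proxy_violation().
```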
weighted_violation(cmp_state, constraint_type)[source]

Computes the dot product between the current multipliers and the constraint violations of type constraint_type. If proxy-constraints are provided in the CMPState, the non-proxy (usually non-differentiable) constraints are used for computing the dot product, while the “proxy-constraint” dot products are accumulated under self.accumulated_violation_dot_prod.

Parameters
  • cmp_state (CMPState) – current CMPState

  • constraint_type (str) – type of constraint to be considered, e.g. “eq” or “ineq”.

Return type

Tensor
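In essence, the computation reduces to a dot product between the multiplier vector and the violation vector, as in this plain-Python sketch with toy numbers:

```python
multipliers = [0.5, 2.0]   # current dual variables for two "ineq" constraints
violations = [0.3, -0.1]   # g(x); negative entries are strictly satisfied
weighted = sum(m * v for m, v in zip(multipliers, violations))
# weighted = 0.5 * 0.3 + 2.0 * (-0.1) = -0.05
```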

Augmented Lagrangian Formulation

The Augmented Lagrangian Method (ALM) considers a sequence of unconstrained minimization problems on the primal variables:

\[L_{c_t}(x, \lambda_t) \triangleq f(x) + \lambda_{g, t}^{\top} \, g(x) + \lambda_{h, t}^{\top} \, h(x) + \frac{c_t}{2} ||g(x) \odot \mathbf{1}_{g(x) \ge 0 \vee \lambda_{g, t} > 0}||^2 + \frac{c_t}{2} ||h(x)||^2\]

This problem is (approximately) minimized over the primal variables to obtain:

\[x_{t+1} = \arg \min_{x \in \Omega} L_{c_t}(x, \lambda_t)\]

The resulting \(x_{t+1}\) is used to update the estimates for the Lagrange multipliers:

\[\begin{split}\begin{align} \lambda_{g, t+1} &= \left[\lambda_{g, t} + c_t \, g(x_{t+1}) \right]^+ \\ \lambda_{h, t+1} &= \lambda_{h, t} + c_t \, h(x_{t+1}) \end{align}\end{split}\]

The main advantage of the ALM over the quadratic penalty method (see \(\S\) 4.2.1 in [Bertsekas, 1999]) is that, under reasonable assumptions, the algorithm can succeed without requiring an unbounded increase of the penalty parameter sequence \(c_t\). The use of explicit estimates for the Lagrange multipliers contributes to avoiding the ill-conditioning that is inherent in the quadratic penalty method.
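As a minimal illustration (plain Python, not Cooper's API), the iteration above can be run on the toy problem \(\min x^2\) s.t. \(x - 1 = 0\), whose solution is \(x^* = 1\) with multiplier \(\lambda^* = -2\). Since the objective is quadratic, the inner minimization admits a closed form, so each outer step is exact:

```python
# min x^2  s.t.  h(x) = x - 1 = 0;  solution x* = 1, lam* = -2.
# Inner minimization of L_c(x, lam) = x^2 + lam*(x - 1) + (c/2)*(x - 1)^2
# in closed form: dL/dx = 2x + lam + c*(x - 1) = 0  =>  x = (c - lam)/(c + 2).
lam, c = 0.0, 10.0             # fixed penalty coefficient: no unbounded increase
for _ in range(30):
    x = (c - lam) / (c + 2.0)  # (approximate) primal minimization
    lam = lam + c * (x - 1.0)  # multiplier update on the equality constraint
print(round(x, 6), round(lam, 6))  # -> 1.0 -2.0
```

Note that \(c\) stays fixed at 10 throughout; the multiplier estimate alone drives the iterates to the constrained optimum, which is precisely the advantage over the pure quadratic penalty method.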

See \(\S\) 4.2.1 in [Bertsekas, 1999] and \(\S\) 17 in [Nocedal and Wright, 2006] for a comprehensive treatment of the Augmented Lagrangian method.

Important

Please visit this section for practical considerations on using the Augmented Lagrangian method in Cooper.

In particular, the sequence of penalty coefficients \(c_t\) is handled in Cooper as a scheduler on the dual learning rate.

class cooper.formulation.augmented_lagrangian.AugmentedLagrangianFormulation(cmp=None, ineq_init=None, eq_init=None)[source]

Provides utilities for computing the Augmented Lagrangian associated with a ConstrainedMinimizationProblem and for populating the gradients for the primal and dual parameters accordingly.

Parameters
compute_lagrangian(aug_lag_coeff_scheduler, closure=None, *closure_args, pre_computed_state=None, write_state=True, **closure_kwargs)[source]

Computes the Lagrangian based on a new evaluation of the CMPState via the closure function.

If no explicit proxy-constraints are provided, the given inequality/equality constraints are used to compute the Augmented Lagrangian and, subsequently, to populate the primal and dual gradients. Note that the gradients are _not_ populated by this function, but by backward().

In case proxy-constraints are provided in the CMPState, the non-proxy constraints (which are potentially non-differentiable) are used for computing the value of the Augmented Lagrangian. The accumulated proxy-constraints are used in the backward computation triggered by backward() (and thus must be differentiable).

Parameters
  • closure – Callable returning a cooper.problem.CMPState

  • pre_computed_state – Pre-computed CMP state to avoid wasteful computation when only dual gradients are required.

  • write_state – If True, the state of the formulation’s cooper.problem.ConstrainedMinimizationProblem attribute is replaced by that returned by the closure argument. This flag can be used (when set to False) to evaluate the Augmented Lagrangian, e.g. for logging validation metrics, without overwriting the information stored in the formulation’s cooper.problem.ConstrainedMinimizationProblem.

weighted_violation(cmp_state, constraint_type)[source]

Computes the dot product between the current multipliers and the constraint violations of type constraint_type. If proxy-constraints are provided in the CMPState, the non-proxy (usually non-differentiable) constraints are used for computing the dot product, while the “proxy-constraint” dot products are stored under self.accumulated_violation_dot_prod.

If the CMPState contains proxy _inequality_ constraints, the filtering on whether the constraint is active for the calculation of the Augmented Lagrangian is done based on the value of the non-proxy constraints.

Parameters
  • cmp_state (CMPState) – current CMPState.

  • constraint_type (str) – type of constraint to be considered, e.g. “eq” or “ineq”.

Return type

Tensor
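The filtering rule can be sketched as follows (plain Python, toy values): the non-proxy values decide which constraints contribute to the quadratic penalty, while the proxy values enter the penalty term itself:

```python
nonproxy_g = [0.2, -0.5, -0.3]  # non-proxy values: decide which constraints are "active"
proxy_g    = [0.1, -0.4, -0.2]  # differentiable proxy values: enter the penalty term
lam        = [0.0,  1.5,  0.0]  # current multipliers
c = 10.0                        # penalty coefficient

# A constraint enters the quadratic penalty iff it is violated (non-proxy value
# >= 0) or its multiplier is strictly positive: 1_{g(x) >= 0  or  lam > 0}.
active = [(g >= 0.0) or (l > 0.0) for g, l in zip(nonproxy_g, lam)]
penalty = (c / 2.0) * sum(p * p for p, a in zip(proxy_g, active) if a)
print(active)   # [True, True, False]
# penalty = 5.0 * (0.1**2 + (-0.4)**2) = 0.85
```

The first constraint is active because it is violated, the second because its multiplier is positive (even though it is satisfied), and the third is filtered out entirely.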

Proxy-Lagrangian Formulation

class cooper.formulation.lagrangian.ProxyLagrangianFormulation(cmp=None, ineq_init=None, eq_init=None)[source]

Placeholder class for the proxy-Lagrangian formulation proposed by Cotter et al. [2019].

Todo

Implement Proxy-Lagrangian formulation as described in Cotter et al. [2019]

Base Lagrangian Formulation

class cooper.formulation.lagrangian.BaseLagrangianFormulation(cmp=None, ineq_init=None, eq_init=None)[source]

Base class for Lagrangian Formulations.

cmp

ConstrainedMinimizationProblem we aim to solve and which gives rise to the Lagrangian.

ineq_multipliers

Trainable cooper.multipliers.DenseMultipliers associated with the inequality constraints.

eq_multipliers

Trainable cooper.multipliers.DenseMultipliers associated with the equality constraints.

create_state(cmp_state)[source]

Initialize the multipliers based on the equality and inequality defects contained in the given CMPState.

Parameters
  • cmp_state (CMPState) – CMPState whose equality and inequality defects determine the shapes of the multipliers.

Return type

None

create_state_from_metadata(dtype, device, ineq_size=None, eq_size=None)[source]

Initialize the formulation state based on the shape of the constraints.

Return type

None

property dual_parameters: List[Tensor]

Returns a list gathering all dual parameters.

Return type

List[Tensor]

property is_state_created

Returns True if any Lagrange multipliers have been initialized.

load_state_dict(state_dict)[source]

Loads the state dictionary of a Lagrangian formulation.

Parameters

state_dict (Dict[str, Any]) – state dictionary to be loaded.

state()[source]

Collects all dual variables and returns a tuple containing their torch.Tensor values. Note that the returned values are the multiplier tensors, not the cooper.multipliers.DenseMultiplier objects themselves.

Return type

Tuple[Optional[Tensor]]

state_dict()[source]

Generates the state dictionary for a Lagrangian formulation.

Return type

Dict[str, Any]

update_accumulated_violation(update=None)[source]

Update the cumulative dot product between the constraint violations (aka defects) and the current multipliers.

Parameters

update (Optional[Tensor]) – Dot product between multipliers and constraints. If value is None, then self.accumulated_violation_dot_prod is set to None.

abstract weighted_violation(cmp_state, constraint_type)[source]

Abstract method for computing the weighted violation of a constraint.

Parameters
  • cmp_state (CMPState) – current CMPState.

  • constraint_type (str) – type of constraint to be considered, e.g. “eq” or “ineq”.

Return type

Tensor