Constrained Minimization Problem

We consider constrained minimization problems (CMPs) expressed as:

\[\begin{split}\min_{x \in \Omega} & \,\, f(x) \\ \text{s.t. } & \,\, g(x) \le \mathbf{0} \\ & \,\, h(x) = \mathbf{0}\end{split}\]

Here \(\Omega\) represents the domain of definition of the functions \(f, g\) and \(h\). Note that \(f\) is a scalar-valued function. We group together all the inequality and equality constraints into the vector-valued mappings \(g\) and \(h\). In other words, a component function \(g_i(x)\) corresponds to the scalar inequality constraint \(g_i(x) \le 0\), and a component function \(h_j(x)\) corresponds to the scalar equality constraint \(h_j(x) = 0\).

Brief notes on conventions and terminology

  • We refer to \(f\) as the loss or main objective to be minimized.

  • Many authors prefer making the constraint levels explicit (e.g. \(g(x) \le \mathbf{\epsilon}\)). To improve the readability of the code, we adopt the convention that the constraint levels have been “absorbed” into the definition of the functions \(g\) and \(h\) (see the short sketch after this list).

  • Based on this convention, we use the terms defect and constraint violation interchangeably to denote the quantities \(g(x)\) or \(h(x)\). Note that equality constraints \(h(x)\) are satisfied only when their defect is zero. On the other hand, a negative defect for an inequality constraint \(g(x)\) means that the constraint is strictly satisfied, while a positive defect means that the inequality constraint is being violated.
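
As a small illustration of the “absorbed” convention, consider a hypothetical constraint \(\|x\|_1 \le \epsilon\). Its defect is defined directly as \(g(x) = \|x\|_1 - \epsilon\), so that \(g(x) \le 0\) recovers the original constraint. In code (the tensor and level below are placeholders):

import torch

# Hypothetical constraint ||x||_1 <= eps, with the level eps "absorbed" into g
# so that the convention g(x) <= 0 holds.
eps = 1.0
x = torch.randn(10, requires_grad=True)

ineq_defect = x.abs().sum() - eps  # g(x) = ||x||_1 - eps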

CMP State

We represent the “state” of a CMP computationally using a CMPState object. A CMPState is a dataclasses.dataclass which contains information about the loss and equality/inequality violations at a given point \(x\). If a problem has no equality or inequality constraints, the corresponding arguments can be omitted when creating the CMPState.
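
For instance, a minimal sketch of building a CMPState for a problem with two inequality constraints and no equality constraints (the tensor values are placeholders) could look as follows:

import torch
import cooper

# Placeholder values for illustration: a loss and two inequality defects.
loss = torch.tensor(0.5)
ineq_defect = torch.tensor([0.1, -0.3])  # first constraint violated, second strictly satisfied

# The equality-constraint argument is omitted since this problem has none.
cmp_state = cooper.CMPState(loss=loss, ineq_defect=ineq_defect)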

Stochastic estimates in CMPState

In problems for which computing the loss or constraints exactly is prohibitively expensive, the CMPState may contain stochastic estimates of the loss/constraints. This is the case, for example, when the loss corresponds to a sum over a large number of terms, such as training examples. The loss and constraints may then be estimated using a mini-batch of data.

Note that, just as in the unconstrained case, these approximations can compromise the stability of the optimization process.
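
The sketch below illustrates the idea for a hypothetical regression loss defined as an average over a large dataset and estimated on a random mini-batch; all names and sizes are placeholders.

import torch

# Hypothetical setup: the exact loss averages over a large dataset, so we
# estimate it (and, analogously, any constraint defects) on a mini-batch.
model = torch.nn.Linear(5, 1)
criterion = torch.nn.MSELoss()
full_inputs, full_targets = torch.randn(10_000, 5), torch.randn(10_000, 1)

batch_idx = torch.randint(0, full_inputs.shape[0], (128,))
loss_estimate = criterion(model(full_inputs[batch_idx]), full_targets[batch_idx])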

class cooper.problem.CMPState(loss=None, ineq_defect=None, eq_defect=None, proxy_ineq_defect=None, proxy_eq_defect=None, misc=None)[source]

Represents the “state” of a Constrained Minimization Problem in terms of the value of its loss and constraint violations/defects.

Parameters
  • loss (Optional[Tensor]) – Value of the loss or main objective to be minimized \(f(x)\)

  • ineq_defect (Optional[Tensor]) – Violation of the inequality constraints \(g(x)\)

  • eq_defect (Optional[Tensor]) – Violation of the equality constraints \(h(x)\)

  • proxy_ineq_defect (Optional[Tensor]) – Differentiable surrogate for the inequality constraints as proposed by Cotter et al. [2019].

  • proxy_eq_defect (Optional[Tensor]) – Differentiable surrogate for the equality constraints as proposed by Cotter et al. [2019].

  • misc (Optional[dict]) – Optional additional information to be stored along with the state of the CMP

For details on the use of proxy-constraints and the proxy_ineq_defect and proxy_eq_defect attributes, please see Lagrangian Formulations.
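
As a rough sketch of the intended use (the constraint level and surrogate below are illustrative choices, not prescribed by Cooper), a non-differentiable constraint such as a bound on the misclassification rate could populate ineq_defect with the true violation and proxy_ineq_defect with a differentiable surrogate; roughly speaking, the original defect informs the multiplier updates while the proxy defect provides the primal gradients.

import torch
import cooper

# Hypothetical constraint: error rate <= 0.2. The 0/1 error is not differentiable,
# so we also provide a differentiable surrogate based on the softmax probabilities.
logits = torch.randn(32, 10, requires_grad=True)
targets = torch.randint(0, 10, (32,))

error_rate = (logits.argmax(dim=1) != targets).float().mean()
soft_error = 1.0 - torch.softmax(logits, dim=1)[torch.arange(32), targets].mean()

state = cooper.CMPState(
    loss=torch.tensor(0.0),              # placeholder loss
    ineq_defect=error_rate - 0.2,        # true (non-differentiable) violation
    proxy_ineq_defect=soft_error - 0.2,  # differentiable surrogate
)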

Constrained Minimization Problem

class cooper.problem.ConstrainedMinimizationProblem(is_constrained=False)[source]

Base class for constrained minimization problems.

Parameters

is_constrained (bool) – We request the problem to be explicitly declared as constrained or unconstrained to perform sanity checks when initializing the ConstrainedOptimizer. Defaults to False.

abstract closure()[source]

Computes the state of the CMP based on the current value of the primal parameters.

The signature of this abstract function may be changed to accommodate situations that require a model, (mini-batched) inputs/targets, or other arguments to be passed.

Structuring the CMP class around this closure method enables the re-use of shared sections of a computational graph. For example, consider a case where we want to minimize a model’s cross entropy loss subject to a constraint on the entropy of its predictions. Both of these quantities depend on the predicted logits (on a mini-batch). This closure-centric design allows flexible problem specifications while avoiding re-computation.

Return type

CMPState
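
As a sketch of the cross-entropy/entropy scenario described above (the constraint level, class name, and entropy surrogate are illustrative), the logits can be computed once inside the closure and re-used by both the loss and the constraint:

import torch
import cooper

class EntropyConstrainedCMP(cooper.ConstrainedMinimizationProblem):
    """Illustrative sketch: minimize cross-entropy s.t. prediction entropy >= level."""

    def __init__(self, entropy_level):
        self.entropy_level = entropy_level
        super().__init__(is_constrained=True)

    def closure(self, model, inputs, targets):
        # The logits are computed once and shared by the loss and the constraint.
        logits = model(inputs)
        loss = torch.nn.functional.cross_entropy(logits, targets)

        probs = torch.softmax(logits, dim=1)
        entropy = -(probs * torch.log(probs + 1e-8)).sum(dim=1).mean()

        # entropy >= entropy_level  -->  entropy_level - entropy <= 0
        ineq_defect = self.entropy_level - entropy

        return cooper.CMPState(loss=loss, ineq_defect=ineq_defect)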

Example

The example below illustrates the main steps that need to be carried out to define a ConstrainedMinimizationProblem in Cooper.

  1. [Line 4] Define a custom class which inherits from ConstrainedMinimizationProblem.

  2. [Line 10] Write a closure function that computes the loss and constraints.

  3. [Line 31] Return the information about the loss and constraints packaged into a CMPState.

 1import torch
 2import cooper
 3
 4class MyCustomCMP(cooper.ConstrainedMinimizationProblem):
 5    def __init__(self, problem_attributes, criterion):
 6        self.problem_attributes = problem_attributes
 7        self.criterion = criterion
 8        super().__init__(is_constrained=True)
 9
10    def closure(self, model, inputs, targets):
11
12        const_level1, const_level2 = self.problem_attributes
13
14        logits = model(inputs)
15        loss = self.criterion(logits, targets)
16
17        # Remember to write the constraints using the convention "g <= 0"!
18
19        # (Greater than) Inequality that only depends on the model properties or parameters
20        # g_1 >= const_level1 --> const_level1 - g_1 <= 0
21        defect1 = const_level1 - ineq_const1(model)
22
23        # (Less than) Inequality that depends on the model's predictions
24        # g_2 <= const_level2 --> g_2  - const_level2 <= 0
25        defect2 = ineq_const2(logits) - const_level2
26
27        # We recommend using torch.stack to ensure the dependencies in the computational
28        # graph are properly preserved.
29        ineq_defect = torch.stack([defect1, defect2])
30
31        return cooper.CMPState(loss=loss, ineq_defect=ineq_defect, eq_defect=None)
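
Once the CMP is defined, its closure is typically handed to a formulation and a ConstrainedOptimizer. The outline below is a hedged sketch of that pattern: it assumes that the helper functions ineq_const1 and ineq_const2 used by MyCustomCMP are defined, and the names used for the optimizer construction (cooper.optim.partial_optimizer, formulation.composite_objective, the ConstrainedOptimizer arguments) are assumptions here; consult the optimizer and Lagrangian Formulations documentation for the authoritative API.

import torch
import cooper

# Placeholder model and data; MyCustomCMP is the class from the example above.
model = torch.nn.Linear(10, 2)
cmp = MyCustomCMP(problem_attributes=(1.0, 1.0), criterion=torch.nn.CrossEntropyLoss())

formulation = cooper.LagrangianFormulation(cmp)
primal_optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
dual_optimizer = cooper.optim.partial_optimizer(torch.optim.SGD, lr=1e-2)  # assumed helper
constrained_optimizer = cooper.ConstrainedOptimizer(formulation, primal_optimizer, dual_optimizer)

inputs, targets = torch.randn(16, 10), torch.randint(0, 2, (16,))

constrained_optimizer.zero_grad()
lagrangian = formulation.composite_objective(cmp.closure, model, inputs, targets)  # assumed name
formulation.custom_backward(lagrangian)
constrained_optimizer.step()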

Warning

Cooper is primarily oriented towards non-convex CMPs that arise in many machine/deep learning settings. That is, problems for which one of the functions \(f, g, h\) or the set \(\Omega\) is non-convex.

Whenever possible, we provide references to appropriate literature describing convergence results for our implemented methods (under suitable assumptions). In general, however, the use of Lagrangian-based approaches for solving non-convex CMPs does not come with guarantees regarding optimality or feasibility.

Some theoretical results can be obtained when considering mixed strategies (distributions over actions for the primal and dual players), or by relaxing the game-theoretic solution concept (i.e. aiming for approximate/correlated equilibria), even for problems which are non-convex in the primal (model) parameters. For more details, see the work of Cotter et al. [2019] and Lin et al. [2020] and references therein. We plan to include some of these techniques in future versions of Cooper.

If you are dealing with optimization problems under “nicely behaved” convex constraints (e.g. cones or \(L_p\)-balls) we encourage you to check out CHOP. If your problem involves “manifold” constraints (e.g. orthogonal or PSD matrices), you might consider using GeoTorch.

Formulation

Formulations denote mathematical or algorithmic techniques aimed at solving a specific CMP (or family of CMPs). Cooper is heavily (but not exclusively!) designed for the easy integration of Lagrangian-based formulations. You can find more details in Lagrangian Formulations.

class cooper.problem.Formulation[source]

Base class for Lagrangian and proxy-Lagrangian formulations.

abstract create_state()[source]

Initializes the internal state of the formulation.

custom_backward(*args, **kwargs)[source]

Alias for _populate_gradients() to keep the backward naming convention used in PyTorch. We avoid naming this method backward since it is a method of the LagrangianFormulation object, not of a torch.Tensor as is usual in PyTorch.

Parameters

lagrangian – Value of the computed Lagrangian based on which the gradients for the primal and dual variables are populated.

abstract property dual_parameters

Returns the trainable parameters for the dual variables. Depending on the formulation, these dual parameters can represent the multipliers themselves, or a model which “learns” multiplier values.

abstract property is_state_created

Returns True if the internal state has been created.

abstract state()[source]

Returns the internal state of the formulation (e.g. multipliers).
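
To make the abstract interface above concrete, here is a minimal illustrative skeleton of a custom formulation. It is not a working formulation: the multiplier handling is purely hypothetical, and additional members (e.g. _populate_gradients, for which custom_backward is an alias) may also need to be implemented. See Lagrangian Formulations for the formulations provided by Cooper.

import torch
import cooper

class SketchFormulation(cooper.problem.Formulation):
    """Illustrative skeleton only: lists the abstract members documented above."""

    def __init__(self):
        super().__init__()
        self.multipliers = None  # hypothetical dual variables, created lazily

    def create_state(self, num_constraints):
        # Hypothetical: one multiplier per inequality constraint.
        self.multipliers = torch.zeros(num_constraints, requires_grad=True)

    @property
    def is_state_created(self):
        return self.multipliers is not None

    @property
    def dual_parameters(self):
        return [self.multipliers]

    def state(self):
        return self.multipliers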