Bibliography

[AHU58]

K. Arrow, L. Hurwicz, and H. Uzawa. Studies in Linear and Non-linear Programming. Stanford University Press, 1958. URL: https://academic.oup.com/jrsssa/article/122/3/381/7101919.

[AH95]

K. J. Åström and T. Hägglund. PID Controllers: Theory, Design, and Tuning. International Society for Measurement and Control, 1995. URL: https://books.google.com/books?id=FsyhngEACAAJ.

[Ber75]

Dimitri P. Bertsekas. On the method of multipliers for convex programming. IEEE Transactions on Automatic Control, 1975. URL: https://ieeexplore.ieee.org/document/1100976.

[Ber99]

Dimitri P. Bertsekas. Nonlinear Programming. Athena Scientific, Belmont, Mass., 2nd edition, 1999. URL: https://books.google.com/books?id=TgMpAQAAMAAJ.

[BV04]

Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004. URL: https://web.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf.

[CJG+19]

Andrew Cotter, Heinrich Jiang, Maya Gupta, Serena Wang, Taman Narayan, Seungil You, and Karthik Sridharan. Optimization with Non-Differentiable Constraints with Applications to Fairness, Recall, Churn, and Other Goals. JMLR, 2019. URL: http://jmlr.org/papers/v20/18-616.html.

[DPS+24]

Josef Dai, Xuehai Pan, Ruiyang Sun, Jiaming Ji, Xinbo Xu, Mickel Liu, Yizhou Wang, and Yaodong Yang. Safe RLHF: Safe Reinforcement Learning from Human Feedback. In ICLR. 2024. URL: https://arxiv.org/abs/2310.12773.

[ENR22]

Juan Elenter, Navid NaderiAlizadeh, and Alejandro Ribeiro. A Lagrangian Duality Approach to Active Learning. In NeurIPS. 2022. URL: https://arxiv.org/abs/2202.04108.

[GPRE+22]

Jose Gallego-Posada, Juan Ramirez, Akram Erraqabi, Yoshua Bengio, and Simon Lacoste-Julien. Controlled Sparsity via Constrained Optimization or: How I Learned to Stop Tuning Penalties and Love Constraints. In NeurIPS. 2022. URL: https://arxiv.org/abs/2208.04425.

[GBV+19]

Gauthier Gidel, Hugo Berard, Gaëtan Vignoud, Pascal Vincent, and Simon Lacoste-Julien. A Variational Inequality Perspective on Generative Adversarial Networks. In ICLR. 2019. URL: https://arxiv.org/abs/1802.10551.

[HCR23]

Ignacio Hounie, Luiz F. O. Chamon, and Alejandro Ribeiro. Automatic Data Augmentation via Invariance-Constrained Learning. In ICML. 2023. URL: https://arxiv.org/abs/2209.15031.

[Kor76]

Galina M. Korpelevich. The extragradient method for finding saddle points and other problems. Matecon, 1976. URL: https://cs.uwaterloo.ca/~y328yu/classics/extragrad.pdf.

[NCZ+20]

Harikrishna Narasimhan, Andrew Cotter, Yichen Zhou, Serena Wang, and Wenshuo Guo. Approximate Heavily-Constrained Learning with Lagrange Multiplier Models. In NeurIPS. 2020. URL: https://proceedings.neurips.cc/paper/2020/hash/62db9e3397c76207a687c360e0243317-Abstract.html.

[Nes83]

Yurii Evgen'evich Nesterov. A method of solving a convex programming problem with convergence rate O(1/k^2). Russian Academy of Sciences, 1983. URL: https://hengshuaiyao.github.io/papers/nesterov83.pdf.

[NW06]

Jorge Nocedal and Stephen J. Wright. Numerical Optimization. Springer Series in Operations Research. Springer, New York, 2nd edition, 2006. URL: https://books.google.com/books?id=VbHYoSyelFcC.

[Pol64]

Boris T. Polyak. Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics, 1964. URL: https://papers.baulab.info/papers/also/Polyak-1964.pdf.

[Pop80]

Leonid Denisovich Popov. A modification of the Arrow-Hurwicz method for search of saddle points. Mathematical notes of the Academy of Sciences of the USSR, 1980. URL: https://link.springer.com/article/10.1007/BF01141092.

[RKK18]

Sashank J. Reddi, Satyen Kale, and Sanjiv Kumar. On the Convergence of Adam and Beyond. In ICLR. 2018. URL: https://arxiv.org/abs/1904.09237.

[SRZ+24]

Motahareh Sohrabi, Juan Ramirez, Tianyue H. Zhang, Simon Lacoste-Julien, and Jose Gallego-Posada. On PI Controllers for Updating Lagrange Multipliers in Constrained Optimization. In ICML. 2024. URL: https://arxiv.org/abs/2406.04558.

[SMDH13]

Ilya Sutskever, James Martens, George Dahl, and Geoffrey Hinton. On the importance of initialization and momentum in deep learning. In ICML. 2013. URL: https://proceedings.mlr.press/v28/sutskever13.html.

[ZWLG22]

Guodong Zhang, Yuanhao Wang, Laurent Lessard, and Roger B. Grosse. Near-optimal Local Convergence of Alternating Gradient Descent-Ascent for Minimax Optimization. In AISTATS. 2022. URL: https://arxiv.org/abs/2102.09468.

[ZARX18]

Xun Zheng, Bryon Aragam, Pradeep K. Ravikumar, and Eric P. Xing. DAGs with NO TEARS: Continuous Optimization for Structure Learning. In NeurIPS. 2018. URL: https://arxiv.org/abs/1803.01422.