# Notes: IEOR 160 / Nonlinear & Discrete Optimization


I took these notes for IEOR 160: Nonlinear and Discrete Optimization taught by Javad Lavaei.

# Midterm

## Optimality Conditions

### Univariate

For a function $f$, a stationary point $x^*$ has $f'(x^*) = 0$. If

- $f''(x^*) < 0$, then $x^*$ is a strict local **maximizer**.
- $f''(x^*) > 0$, then $x^*$ is a strict local **minimizer**.
- $f''(x^*) = 0$, then consider further derivatives. If the *first nonzero* derivative is of *odd* order, then $x^*$ is a **saddle point**.

### Feasibility

Suppose $f$ is only defined on $[a, b]$. A stationary point $x^*$ is only feasible if $a \leq x^* \leq b$. To find the global optimum, compare the local optima with the boundaries $a$ and $b$.
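As a small worked sketch of this recipe (the function and interval are my own example, not from the course): minimize $f(x) = x^3 - 3x$ on $[0, 3]$. The stationary points are $x = \pm 1$, but only $x = 1$ is feasible, so we compare it against the boundaries.

```python
# Global minimum of f(x) = x^3 - 3x on [0, 3].
# f'(x) = 3x^2 - 3 = 0 gives x = ±1; only x = 1 is feasible.
f = lambda x: x ** 3 - 3 * x

candidates = [1.0, 0.0, 3.0]   # feasible stationary point + both boundaries
best = min(candidates, key=f)
print(best, f(best))  # 1.0 -2.0
```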

### Multivariate

For a function $f(x)$ with $x \in \mathbb{R}^n$, stationary points $x^*$ have $\nabla f(x^*) = 0$. Compute the **Hessian** $H_f = \nabla^2 f$, where $(H_f)_{i, j} = \frac{\partial^2 f}{\partial x_i \partial x_j}$. If the Hessian is

- **negative definite** (ND), then $x^*$ is a strict local **maximizer**.
- **positive definite** (PD), then $x^*$ is a strict local **minimizer**.
- **indefinite** (ID), then $x^*$ is a saddle point.

(If the Hessian is only positive or negative *semi*-definite, the test is inconclusive.)

### Definiteness

There are several tests for the definiteness of the Hessian:

- **Leading Principal Minors**. Compute the determinants of the upper-left $k \times k$ submatrices. If all are positive, the Hessian is PD; if they alternate in sign starting negative (sign $(-1)^k$), it is ND; otherwise (all nonzero) it is ID.
- **Eigenvalues**. If all eigenvalues are positive, the Hessian is PD; if all are negative, ND; if both signs occur, ID.
- **Pivots**. Reduce the matrix to row-echelon form. **Pivots** are the first nonzero element in each row. All pivots should be positive/negative for PD/ND, respectively.
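The eigenvalue test can be sketched in a few lines of Python for the $2 \times 2$ case, where the eigenvalues follow from the quadratic formula applied to the characteristic polynomial (the example Hessians below are my own):

```python
import math

def classify_2x2(a, b, c, d):
    """Classify a symmetric 2x2 matrix [[a, b], [c, d]] (with b == c)
    via its eigenvalues, computed with the quadratic formula."""
    tr, det = a + d, a * d - b * c
    disc = math.sqrt(tr * tr - 4 * det)   # real for symmetric matrices
    lam1, lam2 = (tr + disc) / 2, (tr - disc) / 2
    if lam1 > 0 and lam2 > 0:
        return "PD"            # strict local minimizer
    if lam1 < 0 and lam2 < 0:
        return "ND"            # strict local maximizer
    if lam1 * lam2 < 0:
        return "indefinite"    # saddle point
    return "semi-definite"     # inconclusive

# Hessian of f(x, y) = x^2 + 3y^2 is [[2, 0], [0, 6]]:
print(classify_2x2(2, 0, 0, 6))   # PD
# Hessian of f(x, y) = x^2 - y^2 is [[2, 0], [0, -2]]:
print(classify_2x2(2, 0, 0, -2))  # indefinite
```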

### Approximation

The **first-order approximation** of $f(x + \delta)$ is:

$f(x + \delta) \approx f(x) + \nabla f(x)^T \delta$

A **second order approximation** adds the Hessian:

$f(x + \delta) \approx f(x) + \nabla f(x)^T \delta + \frac{1}{2} \delta^T \nabla^2 f(x) \delta$
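A quick numerical check of these approximations (using $f(x) = e^x$, chosen because all its derivatives equal $e^x$; the point and step are my own):

```python
import math

# First- and second-order Taylor approximations of f(x) = exp(x) around x = 1.
f = math.exp          # f' and f'' are also exp
x, delta = 1.0, 0.1

first_order = f(x) + f(x) * delta
second_order = first_order + 0.5 * f(x) * delta ** 2

exact = f(x + delta)
print(abs(exact - first_order))   # O(delta^2) error
print(abs(exact - second_order))  # O(delta^3) error, noticeably smaller
```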

### Descent Directions

To identify a **descent direction** for a function $f$ at some point $\bar{x}$, compute the Taylor expansion about $\bar{x}$ and identify the dominant (first non-trivial) term. Any vector $\delta x$ that makes the term negative is a valid descent direction. An alternative approach is to compute the eigenvalues of the Hessian; an eigenvector with a negative eigenvalue is a descent direction. Vice versa for ascent directions.
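At a non-stationary point the dominant Taylor term is the first-order one, so checking $\nabla f(\bar{x})^T \delta x < 0$ suffices. A minimal sketch (the function and directions are my own examples):

```python
# Descent-direction check for f(x, y) = x^2 + 2y^2 using the first-order
# Taylor term: d is a descent direction iff grad f(x)^T d < 0.

def grad(x, y):
    return (2 * x, 4 * y)

def is_descent(x, y, d):
    gx, gy = grad(x, y)
    return gx * d[0] + gy * d[1] < 0

print(is_descent(1.0, 1.0, (-1.0, -1.0)))  # True: points "downhill"
print(is_descent(1.0, 1.0, (1.0, 0.0)))    # False
```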

### Coercive

A function $f$ is **coercive** if $f(x) \to + \infty$ as $\| x \| \to \infty$. In other words, $f$ must approach $+\infty$ along *all* directions.

## Numerical Algorithms

### Golden Section

Consider a unimodal function $f(x)$ and an interval $[x_1, x_3]$. Select a point $x_2$ such that the two subintervals $a$ and $b$ have ratio $\phi - 1 : 2 - \phi$, where $\phi = \frac{1 + \sqrt{5}}{2}$. Choose $x_4 = x_1 + (x_3 - x_2)$ to guarantee $a + c = b$ and a reduction of the interval width by $\frac{1}{\phi} \approx 0.618$.
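The procedure can be sketched as follows (the test function and tolerance are my own; the bracket shrinks by $1/\phi$ each iteration):

```python
import math

PHI = (1 + math.sqrt(5)) / 2

def golden_section_min(f, a, b, tol=1e-6):
    """Minimize a unimodal f on [a, b] by golden-section search."""
    inv = 1 / PHI
    x1 = b - inv * (b - a)
    x2 = a + inv * (b - a)
    while b - a > tol:
        if f(x1) < f(x2):       # minimizer lies in [a, x2]
            b, x2 = x2, x1
            x1 = b - inv * (b - a)
        else:                   # minimizer lies in [x1, b]
            a, x1 = x1, x2
            x2 = a + inv * (b - a)
    return (a + b) / 2

xmin = golden_section_min(lambda x: (x - 2) ** 2, 0, 5)
print(round(xmin, 4))  # 2.0
```

The golden-ratio split is what makes one of the two interior points reusable after each shrink; this sketch re-evaluates $f$ for simplicity rather than caching that point.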

In general, for an approximation of $f$ over $[a, b]$ to satisfy tolerance $\epsilon$, we need $k$ iterations where

\[k \geq \frac{\log \frac{\epsilon}{b-a}}{\log r}\]

and $r = \frac{1}{\phi}$ is the per-iteration reduction ratio.

### Gradient Method

To minimize $f$, an update needs a direction and a step size. The **gradient method** chooses descent direction $\delta = -\nabla f(x)$. For the step size, there are two approaches:

- **Exact line search**, which picks step size $\alpha$ via $\min_\alpha f(x + \alpha \delta)$.
- **Backtracking line search**, an inexact algorithm, which starts with parameters $\alpha, \beta$ and picks the smallest $m$ such that $f(x + \alpha \beta^m \delta) < f(x)$.
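A minimal sketch of the gradient method with backtracking (the test function, starting point, and parameters $\alpha = 1$, $\beta = 0.5$ are my own choices):

```python
def grad_descent(grad_f, f, x, alpha=1.0, beta=0.5, iters=50):
    """Gradient method: direction delta = -grad f(x), step alpha * beta^m
    with m the smallest integer that yields a decrease in f."""
    for _ in range(iters):
        g = grad_f(x)
        delta = [-gi for gi in g]
        step = alpha
        # backtracking line search: shrink the step until f decreases
        while f([xi + step * di for xi, di in zip(x, delta)]) >= f(x):
            step *= beta
            if step < 1e-12:
                return x  # no decrease possible: near-stationary point
        x = [xi + step * di for xi, di in zip(x, delta)]
    return x

# Minimize f(x, y) = (x - 1)^2 + 2y^2, optimum at (1, 0).
f = lambda v: (v[0] - 1) ** 2 + 2 * v[1] ** 2
grad_f = lambda v: [2 * (v[0] - 1), 4 * v[1]]
x = grad_descent(grad_f, f, [5.0, 3.0])
print([round(c, 3) for c in x])  # [1.0, 0.0]
```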

### Newton’s Method

**Newton’s method** is a second-order version of the gradient method. The descent direction is:

$\delta = -\left( \nabla^2 f(x) \right)^{-1} \nabla f(x)$

The step size follows the same approach.
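In one dimension the direction reduces to $\delta = -f'(x)/f''(x)$, which with a full step gives the familiar iteration $x_{k+1} = x_k - f'(x_k)/f''(x_k)$. A sketch on a function of my own choosing ($f(x) = x - \ln x$, minimized at $x = 1$):

```python
def newton_min(df, d2f, x, iters=20):
    """Univariate Newton's method with a full step each iteration."""
    for _ in range(iters):
        x = x - df(x) / d2f(x)   # delta = -f'(x)/f''(x)
    return x

# f(x) = x - ln(x) on x > 0: f'(x) = 1 - 1/x, f''(x) = 1/x^2, minimum at x = 1.
df = lambda x: 1 - 1 / x
d2f = lambda x: 1 / x ** 2
x_star = newton_min(df, d2f, 0.5)
print(round(x_star, 6))  # 1.0
```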

## Convex Optimization

### Convexity

A *set* $C$ is **convex** if $\forall x, y \in C, t \in [0, 1]$, we have $z = tx + (1-t)y \in C$.

A *function* $f$ is convex if and only if $\forall x, f''(x) \geq 0$. Analogously, a multivariate function $f$ is convex if and only if $\forall x, H_f(x)$ is PSD. A function $f$ is **concave** if $-f$ is convex.

If $f$ is a convex function on a convex set $C$, then any local solution is also a global solution.

Useful principles:

- Linear functions are convex (and concave!), so linear inequalities define convex sets.
- If $f$ is the sum of convex functions, then $f$ is also convex.
- A linear transformation of a convex set is still convex.
- If $C_1, C_2$ are both convex, then $C_1 \cap C_2$ is convex.

### Convex Optimization

Consider a convex function $f$, a set of linear equalities $h_1(x) = 0, h_2(x) = 0, …, h_m(x) = 0$ and convex inequalities $g_1(x) \leq 0, g_2(x) \leq 0, …, g_l(x) \leq 0$.

Classes of convex optimization:

- LP: With a linear objective and linear inequalities, the problem is a **linear program** that can be solved by checking every vertex.
- QP: If the objective is quadratic and convex with linear constraints, the problem is a **quadratic program**, and the number of possible solutions is infinite.
- QCQP: With quadratic constraints, the **quadratically constrained quadratic program** has form $f = x^T p_0 x + q_0^T x + r_0$ and constraints $x^T p_i x + q_i^T x + r_i \leq 0$.
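The "check every vertex" rule for LPs can be sketched directly (the polygon and objective here are my own toy example, with vertices enumerated by hand):

```python
# LP sketch: minimize -x - 2y over {x >= 0, y >= 0, x + y <= 1}.
# The feasible region is a triangle; an LP optimum lies at a vertex.
vertices = [(0, 0), (1, 0), (0, 1)]
c = (-1, -2)                              # objective coefficients
obj = lambda v: c[0] * v[0] + c[1] * v[1]

best = min(vertices, key=obj)
print(best, obj(best))  # (0, 1) -2
```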

### Constrained Optimization

Consider an objective function $f$ and equalities $h_i$. A point $y$ is considered a **regular point** if the gradients of all *active* constraints $\nabla h_i(y)$ are linearly independent. If $x^*$ is a local minimum and a regular point, then $\exists \lambda_i$ such that $\nabla f(x^*) + \sum_i \lambda_i \nabla h_i(x^*) = 0$ (stationarity). In practice, the first step is often to verify that every feasible point is regular.

*First Order Necessary Condition*: For a regular point and local min $x^*$, the Karush–Kuhn–Tucker (KKT) conditions imply $\exists \lambda_1, …, \lambda_m, \mu_1, …, \mu_l$ satisfying:

- **Stationarity**: $\nabla f(x^*) + \sum_i \lambda_i \nabla h_i(x^*) + \sum_j \mu_j \nabla g_j(x^*) = 0$
- **Complementary Slackness**: $\mu_j g_j(x^*) = 0$
- **Dual Feasibility**: $\mu_j \geq 0$
- **Primal Feasibility**: $h_i(x^*) = 0$ and $g_j(x^*) \leq 0$

Usually, we consider combinations of $\mu_j = 0$ and $\mu_j > 0$.
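A worked instance of this case analysis, on a toy problem of my own: minimize $f(x) = x^2$ subject to $g(x) = 1 - x \leq 0$.

```python
# KKT case analysis for: minimize f(x) = x^2 subject to g(x) = 1 - x <= 0.
# Stationarity: f'(x) + mu * g'(x) = 2x - mu = 0.

# Case mu = 0: stationarity gives x = 0, but g(0) = 1 > 0, so infeasible.
x_case1 = 0.0
feasible1 = (1 - x_case1) <= 0
print(feasible1)  # False: discard this case

# Case mu > 0: complementary slackness forces g(x) = 0, so x = 1,
# and stationarity gives mu = 2x = 2 >= 0 (dual feasible).
x_star, mu_star = 1.0, 2.0
assert 2 * x_star - mu_star == 0     # stationarity
assert mu_star * (1 - x_star) == 0   # complementary slackness
print(x_star, mu_star)  # 1.0 2.0
```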

*Second Order Necessary Condition*: Let $M = \nabla^2 f(x^*) + \sum_i \lambda_i \nabla^2 h_i(x^*) + \sum_j \mu_j \nabla^2 g_j(x^*)$. Then $\delta x^T M \delta x \geq 0$ for all $\delta x$ in the tangent plane at $x^*$, which is equivalent to $E^T M E \succcurlyeq 0$, where $E$ is a basis for the tangent plane. To select $E$, find vectors spanning the null space of the active constraint gradients $\nabla h_i(x^*)$ and $\nabla g_j(x^*)$.

The second order condition is *sufficient* if $x^*$ is non-degenerate ($\mu_j > 0$ for every active inequality) and $E^T M E \succ 0$. Intuitively, any unconstrained local optimum will also be a constrained local optimum.

### Duality

For a primal $\min_x f(x)$ with $h_i(x) = 0$ and $g_j(x) \leq 0$, define the Lagrangian $\mathcal{L}(x, \lambda, \mu) = f(x) + \sum_i \lambda_i h_i(x) + \sum_j \mu_j g_j(x)$. The dual is: $\max_{\lambda, \mu} \min_x \mathcal{L}(x, \lambda, \mu) = \max_{\lambda, \mu} d(\lambda, \mu) \text{ s.t. } \mu \geq 0$

Non-convex problems may have a **duality gap** $p^* - d^* > 0$. A convex problem that satisfies Slater’s condition guarantees convex $\mathcal{L}$ (in $x$), $p^* = d^*$, and that the KKT conditions hold at $p^*, d^*$. **Slater’s condition** requires a point $y$ that is feasible and strictly satisfies the inequalities $g_j(y) < 0$.

After reformulating the problem in terms of minimizing the Lagrangian, some conditions on $x$ are usually required to prevent a trivial $-\infty$ minimum.
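Continuing the toy problem $\min x^2$ s.t. $1 - x \leq 0$ (my own example, primal optimum $p^* = 1$ at $x = 1$): here $\mathcal{L}(x, \mu) = x^2 + \mu(1 - x)$, minimizing over $x$ gives $x = \mu/2$, and so $d(\mu) = \mu - \mu^2/4$.

```python
# Dual of: minimize x^2 subject to 1 - x <= 0 (primal optimum p* = 1).
# L(x, mu) = x^2 + mu * (1 - x); the inner minimization gives x = mu / 2.

def d(mu):
    x = mu / 2                       # unconstrained minimizer of L(., mu)
    return x ** 2 + mu * (1 - x)     # d(mu) = mu - mu^2 / 4

# Maximize d over mu >= 0 by a coarse grid search:
mu_star = max((mu / 100 for mu in range(0, 501)), key=d)
print(mu_star, d(mu_star))  # 2.0 1.0

# d* = p* = 1: no duality gap, since Slater holds (e.g. x = 2 is strictly feasible).
```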

### Perturbations

Perturbing the constraints to $h_i(x) = \epsilon_i$ and $g_j(x) \leq \epsilon_j$ changes the optimal value $f_*$ by roughly $f_\epsilon \approx f_* - \sum_i \lambda_i \epsilon _i - \sum_j \mu_j \epsilon_j$.
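A numeric sanity check on the same toy problem as above ($\min x^2$ s.t. $1 - x \leq 0$, with $f_* = 1$ and $\mu = 2$; the perturbation size is my own choice):

```python
# Sensitivity check for: minimize x^2 subject to 1 - x <= eps.
# The unperturbed optimum is f_* = 1 with multiplier mu = 2, so the
# first-order estimate predicts f_eps ≈ f_* - mu * eps = 1 - 2 * eps.

f_star, mu = 1.0, 2.0
eps = 0.01
exact = (1 - eps) ** 2          # exact perturbed optimum: x = 1 - eps
predicted = f_star - mu * eps
print(exact, predicted)         # 0.9801 vs 0.98: agreement to first order
```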

### Discrete Optimization

In **discrete optimization**, we consider $x \in \mathbb{Z}^n$. **Mixed-integer programming** mixes real and integer variables. Relaxing the discrete constraint into a continuous constraint yields a lower bound (for a minimization).

**Branch-and-bound** conceives of discrete optimization as a series of convex optimizations. Relax all integrality constraints and compute the optimum $y^\text{opt}$. If $y^\text{opt}$ is integer, then it is the optimal solution. Otherwise, pick some $y_i^\text{opt} \not \in \mathbb{Z}$ and branch into $y_i \leq \lfloor y_i^{\text{opt}} \rfloor$ and $y_i \geq \lceil y_i^{\text{opt}} \rceil$. Each time we recursively branch, we introduce a new inequality; the solution is the optimum among all leaves.
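A one-variable sketch of this recursion (the objective and search bounds are my own; the continuous relaxation over an interval is solved by clamping):

```python
import math

# Branch-and-bound for: minimize f(y) = (y - 2.6)^2 over integers y.
# The relaxation over a real interval [lo, hi] is solved by clamping 2.6.

def relaxed_opt(lo, hi):
    y = min(max(2.6, lo), hi)   # continuous optimum on [lo, hi]
    return y, (y - 2.6) ** 2

def branch_and_bound(lo=-10, hi=10):
    best_y, best_val = None, math.inf
    stack = [(lo, hi)]
    while stack:
        a, b = stack.pop()
        if a > b:
            continue                         # empty subproblem
        y, val = relaxed_opt(a, b)
        if val >= best_val:
            continue                         # bound: cannot beat incumbent
        if y == int(y):
            best_y, best_val = int(y), val   # integer-feasible leaf
        else:
            # branch on the fractional value: y <= floor(y) or y >= ceil(y)
            stack.append((a, math.floor(y)))
            stack.append((math.ceil(y), b))
    return best_y, best_val

y_star, val = branch_and_bound()
print(y_star, round(val, 4))  # 3 0.16
```

The relaxation at the root gives $y = 2.6$, which branches into $y \leq 2$ and $y \geq 3$; the $y \geq 3$ leaf wins with value $0.16$, and the bound then prunes the $y \leq 2$ branch.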