Fast projection onto a Simplex (Python)

(work in progress)

Many convex optimization problems can be solved with simple first-order methods, provided the projection onto the constraint set can be computed quickly. By the projection onto a constraint set \mathcal{C}, we mean the solution to the following optimization problem:

(1)   \begin{align*} \begin{array}{lll} P_\mathcal{C}(y) = & \displaystyle \arg \min_{x} & \frac{1}{2}\left\|x-y \right\|^2, \\ & \textrm{s.t.} & x \in \mathcal{C}. \end{array} \end{align*}
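For some sets \mathcal{C} this projection has a simple closed form. As a minimal illustration (with made-up data, not part of the derivation below), projecting onto the nonnegative orthant \mathcal{C} = \{x : x \succeq 0\} just clips the negative entries:

import numpy as np

# projection onto the nonnegative orthant: clip negative entries to zero
y = np.array([0.5, -1.2, 3.0])
x = np.maximum(y, 0)   # gives [0.5, 0.0, 3.0]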

As an example, the projected subgradient method incorporates the projection operator to deal with constraints. Here we focus on projection onto a (weighted) simplex, a^\top x = 1, x \succeq 0, where we assume the weights satisfy a \succ 0. The problem we wish to solve is:

(2)   \begin{align*} \begin{array}{lll} P_\mathcal{C}(y) = & \displaystyle \arg \min_{x} & \frac{1}{2}\left\|x-y \right\|^2, \\ & \textrm{s.t.} & x \succeq 0, \\ & & a^\top x = 1. \end{array} \end{align*}

As a test case, we can implement the projection in cvxpy:

import cvxpy as cvx
import numpy as np

def proj_simplex_cvxpy(a, y):
    '''
    Returns the point in the simplex a^T x = 1, x >= 0 that is
    closest to y (according to Euclidean distance).
    '''
    d = len(a)

    # set up the objective and constraints and solve the problem
    x = cvx.Variable(d)
    obj = cvx.Minimize(cvx.sum_squares(x - y))
    constr = [x >= 0, a @ x == 1]
    prob = cvx.Problem(obj, constr)
    prob.solve()

    return np.array(x.value).squeeze()
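A quick usage example (with arbitrary random data) for the unit-simplex case a = \mathbf{1}:

d = 5
a = np.ones(d)
y = np.random.randn(d)

x = proj_simplex_cvxpy(a, y)
print(x)       # entries are (numerically) nonnegative
print(a @ x)   # approximately 1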

Introducing a Lagrange multiplier \lambda for the equality constraint (and keeping the nonnegativity constraint explicit), we can write the dual function as:

    \begin{align*} L(\lambda) & = \min_{x \succeq 0} \sum_{i=1}^{n}\left[\frac{1}{2}(x_i - y_i)^2 + \lambda a_i x_i \right] - \lambda \\ & = \min_{x \succeq 0} \sum_{i=1}^{n}\left[\frac{1}{2}x_i^2 - x_i(y_i - \lambda a_i) + \frac{1}{2}y_i^2\right] - \lambda. \end{align*}

Since the minimization decouples across the x_i (for fixed \lambda), each coordinate can be solved in closed form:

    \begin{align*} x_i = \max\left(0, y_i-\lambda a_i \right). \end{align*}
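In numpy this coordinatewise solution is a one-liner; here lam is a fixed, hypothetical multiplier value used only for illustration:

import numpy as np

a = np.ones(3)
y = np.array([0.3, -0.1, 0.9])
lam = 0.2   # hypothetical fixed multiplier

x = np.maximum(0, y - lam*a)   # elementwise max(0, y_i - lam*a_i)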

Substituting this solution back, the dual objective (which is to be maximized) is:

    \begin{align*} L(\lambda) & = -\sum_{i=1}^n \left[\frac{1}{2}\max(0, y_i - \lambda a_i)^2 - \frac{1}{2}y_i^2\right] - \lambda. \end{align*}

An example of the dual function is shown below.

[Figure: the dual function L(\lambda) for an example problem; it is concave and piecewise quadratic in \lambda.]
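The original plotting script is not reproduced here, but a minimal sketch (with made-up a and y) of how L(\lambda) can be evaluated and plotted is:

import numpy as np
import matplotlib.pyplot as plt

a = np.ones(4)
y = np.array([0.1, 0.8, -0.3, 0.5])

def dual(lam):
    # L(lam) = -sum_i [0.5*max(0, y_i - lam*a_i)^2 - 0.5*y_i^2] - lam
    return -np.sum(0.5*np.maximum(0, y - lam*a)**2 - 0.5*y**2) - lam

lams = np.linspace(-1, 1.5, 200)
plt.plot(lams, [dual(lam) for lam in lams])
plt.xlabel(r'$\lambda$')
plt.ylabel(r'$L(\lambda)$')
plt.show()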

As \lambda is decreased from a large value, the variables x_i successively “switch on” from 0 (since a \succ 0, each y_i - \lambda a_i increases as \lambda decreases). The order in which they switch on is determined by sorting the ratios y_i/a_i. Let \mathcal{I} be an index permutation that sorts these ratios. In other words:

    \begin{align*} y_{\mathcal{I}_1}/a_{\mathcal{I}_1} \leq y_{\mathcal{I}_2}/a_{\mathcal{I}_2} \leq \ldots \leq y_{\mathcal{I}_n}/a_{\mathcal{I}_n}. \end{align*}

We refer to these sorted ratios as our breakpoint values of \lambda,

    \begin{align*} \lambda_i = \frac{y_{\mathcal{I}_i}}{a_{\mathcal{I}_i}}. \end{align*}
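In numpy the sorting index and the breakpoints can be computed directly (a small sketch with made-up data):

import numpy as np

a = np.ones(4)
y = np.array([0.1, 0.8, -0.3, 0.5])

idx = np.argsort(y/a)   # the sorting permutation I
lams = (y/a)[idx]       # breakpoints lambda_1 <= ... <= lambda_n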

Given a value of \lambda, we want to find the index k such that

    \begin{align*} \lambda_{k-1} < \lambda \leq \lambda_k. \end{align*}

On this interval only the coordinates \mathcal{I}_k, \ldots, \mathcal{I}_n are nonzero, so we can write the dual objective as:

    \begin{align*} L(\lambda) & = -\sum_{i=k}^{n} \frac{1}{2}(y_{\mathcal{I}_i} - \lambda a_{\mathcal{I}_i})^2 - \lambda + \frac{1}{2}\sum_{j=1}^n y_j^2. \end{align*}

The derivative of the above is therefore:

    \begin{align*} \partial L(\lambda) & = \sum_{i=k}^n a_{\mathcal{I}_i}(y_{\mathcal{I}_i} - \lambda a_{\mathcal{I}_i}) - 1. \end{align*}

Since \partial L(\lambda) is decreasing in \lambda, by searching the sorted list we can find the index k such that \partial L(\lambda_k) \leq 0 but \partial L(\lambda_{k-1}) > 0. Setting the derivative to zero then gives the optimal multiplier:

    \begin{align*} \lambda = \frac{\sum_{i=k}^n a_{\mathcal{I}_i}y_{\mathcal{I}_i} - 1}{\sum_{i=k}^n a_{\mathcal{I}_i}^2}. \end{align*}
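To make the logic concrete before optimizing it, here is a reference sketch that finds k with a plain linear scan over the breakpoints (assuming a \succ 0); the final algorithm below replaces the scan with bisection:

import numpy as np

def proj_simplex_scan(a, y):
    # reference sketch: scan the sorted breakpoints for the sign change
    idx = np.argsort(y/a)
    lams = (y/a)[idx]

    for k in range(len(y)):
        # dual derivative at breakpoint lambda_k, active set idx[k:]
        dL = np.sum(a[idx[k:]]*(y[idx[k:]] - lams[k]*a[idx[k:]])) - 1
        if dL <= 0:
            break

    # solve dL/dlambda = 0 over the active set idx[k:]
    lam = (np.sum(a[idx[k:]]*y[idx[k:]]) - 1)/np.sum(a[idx[k:]]**2)
    return np.maximum(0, y - lam*a)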

The final algorithm, which uses bisection over the sorted breakpoints, is:

import numpy as np

def proj(a, y):
    '''
    Returns the point in the simplex a^T x = 1, x >= 0 that is
    closest to y (according to Euclidean distance), assuming a > 0.
    '''
    l = y/a
    idx = np.argsort(l)
    d = len(l)

    def evalpL(k):
        # derivative of the dual at breakpoint lambda_k,
        # with active set idx[k:]
        return np.sum(a[idx[k:]]*(y[idx[k:]] - l[idx[k]]*a[idx[k:]])) - 1

    def bisectsearch():
        # find the first breakpoint at which the dual derivative
        # becomes nonpositive
        idxL, idxH = 0, d-1

        if evalpL(idxL) < 0:
            return idxL

        while (idxH-idxL) > 1:
            iMid = (idxL+idxH)//2
            if evalpL(iMid) > 0:
                idxL = iMid
            else:
                idxH = iMid

        return idxH

    k = bisectsearch()

    # solve dL/dlambda = 0 over the active set idx[k:]
    lam = (np.sum(a[idx[k:]]*y[idx[k:]]) - 1)/np.sum(a[idx[k:]]**2)

    # recover the primal solution
    x = np.maximum(0, y - lam*a)

    return x
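A quick sanity check, comparing the fast projection against the cvxpy version defined earlier (random data; the difference should be on the order of the solver tolerance):

d = 10
a = np.abs(np.random.randn(d)) + 0.1   # strictly positive weights
y = np.random.randn(d)

print(np.max(np.abs(proj(a, y) - proj_simplex_cvxpy(a, y))))   # should be small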

Other resources:

  • MATLAB implementation of projection onto the unit simplex
  • Michelot, C. (1986). A finite algorithm for finding the projection of a point onto the canonical simplex of \mathbb{R}^n. Journal of Optimization Theory and Applications, 50(1), 195–200.

 

Todo:

  1. Implement the code in C/C++
  2. Give a better test case
  3. Simplify the explanation and implementation
  4. Give a practical example (e.g. estimation problem with subgradient descent)
