# Ken Clarkson’s Talk at FWCG 2015

I was extremely fortunate to attend Ken Clarkson’s excellent talk on “Input Sparsity and Hardness for Robust Subspace Approximation” at FWCG this year. Excerpts from the talk –

Let’s assume that we are given an input of $n$ points $a_1, a_2, \ldots, a_n$ in $d$ dimensions. These points are arranged as the rows of a matrix $A \in \mathbb{R}^{n \times d}$.

The goal is to find a $k$-dimensional subspace $L$ minimizing $cost(A,L) \equiv \sum_{i\in [n]} dist(a_i,L)^2$ among all possible $k$-dimensional subspaces.

The immediate solution is through Singular Value Decomposition (SVD hereon) – there is a best rank-$k$ approximation $A_k = U_k\Sigma_kV_k^\top$ with $cost(A,L^*) = \|A - A_k\|_F^2 = \sum_{i\in[n]}dist(a_i, a_iV_kV_k^\top)^2$, where $L^*$ is the span of the top $k$ right singular vectors (the columns of $V_k$). The best known approximation algorithms can find $L'$ such that $dim(L') = k$ and $cost(A, L') \le (1 + \epsilon)cost(A, L^*)$ in $O(nnz(A) + (n + d)poly(k\epsilon^{-1}))$ time. It is also important to note that since the SVD minimizes a sum of squared distances, a single point (or an outlier) can carry a lot of weight and strongly influence the decomposition. This leads to the conclusion that SVD is not a very robust technique.
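To make this concrete, here is a small numpy sketch of my own (not from the talk) that computes the best rank-$k$ subspace via a truncated SVD and verifies that the projection cost equals $\|A - A_k\|_F^2$; the random matrix and the choice of $k$ are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 100, 10, 3
A = rng.standard_normal((n, d))

# Full SVD: A = U diag(s) Vt, the rows of Vt are the right singular vectors.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Vk = Vt[:k].T                                # basis of the best k-dim subspace L*
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]     # best rank-k approximation A_k

# cost(A, L*) = sum of squared distances to L* = ||A - A Vk Vk^T||_F^2
proj_cost = np.linalg.norm(A - A @ Vk @ Vk.T, 'fro') ** 2
print(np.isclose(proj_cost, np.linalg.norm(A - A_k, 'fro') ** 2))  # True
print(np.isclose(proj_cost, np.sum(s[k:] ** 2)))                   # True
```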

The alternative solution is to use $M$-estimators for regression. Let $r_i$ be the residuals, i.e. the difference between the $i$th observation and the fitted value. The traditional least squares method minimizes $\sum_i{r_i^2}$, which runs into trouble when there are outliers in the data, as we saw in the earlier case. $M$-estimators instead solve $\min_x\sum_iM(r_i)$, and are proven to reduce the effect of outliers.
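As a concrete example (again my own sketch, not from the talk), take the Huber function $M(z) = z^2/2$ for $|z| \le \delta$ and $\delta|z| - \delta^2/2$ otherwise, a standard $M$-estimator. The snippet below fits a line by ordinary least squares and by a simple iteratively reweighted least squares (IRLS) loop for the Huber loss; the data, $\delta$, and the iteration count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
t = np.linspace(0, 1, n)
X = np.column_stack([t, np.ones(n)])     # fit y = slope * t + intercept
y = 2.0 * t + 1.0 + 0.05 * rng.standard_normal(n)
y[::10] += 5.0                           # plant a few large outliers

# Ordinary least squares: min_x sum_i r_i^2 -- gets pulled toward the outliers.
x_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

# Huber M-estimator via IRLS: weight w_i = 1 if |r_i| <= delta, else delta / |r_i|.
delta = 0.1
x = x_ls.copy()
for _ in range(50):
    r = y - X @ x
    w = np.where(np.abs(r) <= delta, 1.0, delta / np.abs(r))
    sw = np.sqrt(w)
    x, *_ = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)

print("least squares:", x_ls)   # distorted by the outliers
print("Huber IRLS   :", x)      # should be much closer to (2, 1)
```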

Now that a more robust technique is established, we need to see how it can be applied to low-rank approximation. Low-rank approximation also calls for rotational invariance: we want the cost to be preserved when $\mathbb{R}^d$ is rotated by a matrix $W$. For example, entrywise cost functions of the form $\sum_i \|a_i - b_i\|_1$, where $b_i$ are the rows of a rank-$k$ matrix $B$, can never be used: even when $B$ has rank zero, the cost is not preserved under most rotations. The workaround is to use cost functions of the form $\sum_{i \in [n]}M(dist(a_i, L))$ for $M(z) = z^p$. These functions are rotationally invariant, and they are more robust for smaller values of $p$. For any basis $V$ of $L$, we have $cost(A,L) = cost(A,V) \equiv \sum_{i\in [n]} M(dist(a_i, a_i V V^\top))$. A subspace $L'$ is a $(1 + \epsilon)$-approximation if $dim(L') = k$ and $cost(A, L') \le (1 + \epsilon)\min_{dim(L) = k}cost(A, L)$.
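To see the invariance in action, here is a quick numpy check (my own illustration): rotating both the points and the subspace by a random orthogonal matrix $W$ leaves $\sum_i dist(a_i, L)^p$ unchanged, while the entrywise $\ell_1$ cost against $B = 0$ does change.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, k, p = 30, 5, 2, 1
A = rng.standard_normal((n, d))
V = np.linalg.qr(rng.standard_normal((d, k)))[0]   # orthonormal basis of a k-dim subspace L
W = np.linalg.qr(rng.standard_normal((d, d)))[0]   # a random orthogonal matrix (rotation/reflection)

def cost(A, V, p):
    # sum_i dist(a_i, span(V))^p, with dist(a_i, L) = ||a_i - a_i V V^T||_2
    R = A - A @ V @ V.T
    return np.sum(np.linalg.norm(R, axis=1) ** p)

# Rotating the points (rows) by W and the subspace basis accordingly preserves the cost.
print(np.isclose(cost(A @ W, W.T @ V, p), cost(A, V, p)))   # True

# The entrywise l1 cost against B = 0 is not preserved under rotation.
print(np.sum(np.abs(A)), np.sum(np.abs(A @ W)))             # generally different
```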

Some nice results on this front –

1. Hardness – It is NP-hard to find an $O(1 + 1/d)$-approximation for $p \in [1,2)$.
2. For $p \in [1, 2)$, there is an algorithm that returns a $(1 + \epsilon)$-approximation in $O(nnz(A) + (n+d)poly(k\epsilon^{-1}) + exp(poly(k\epsilon^{-1})))$ time with constant probability.
3. Nice $M$-estimators – the functions $M$ considered satisfy the following properties (checked numerically for the Huber function in the sketch after this list):
    1. Even – $M(z) = M(-z)$.
    2. Monotone – $|a| > |b| \Rightarrow M(a) > M(b)$.
    3. Bounded growth – there is a constant $C_M$ such that for $a \ge b > 0$, $C_M\,a/b \le M(a)/M(b) \le a^2/b^2$.
    4. Root sub-additivity – $\sqrt{M(a + b)} \le \sqrt{M(a)} + \sqrt{M(b)}$.
4. For any vector $x$, $\sqrt{\sum_{i \in [n]}M(x_i)}$ is almost a norm – it satisfies the norm axioms except for those involving scaling (it is not absolutely homogeneous).
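As mentioned in item 3, here is a small numpy sketch (my own, not from the talk) that checks these properties numerically for the Huber function with $\delta = 1$; the grid of test values and the constant $C_M = 1/2$ are my own choices for this illustration.

```python
import numpy as np

delta = 1.0

def M(z):
    # Huber function: quadratic near zero, linear in the tails.
    z = np.abs(z)
    return np.where(z <= delta, 0.5 * z ** 2, delta * z - 0.5 * delta ** 2)

z = np.linspace(0.01, 10, 200)
aa, bb = np.meshgrid(z, z)
mask = aa >= bb                                  # pairs with a >= b > 0

print(np.allclose(M(z), M(-z)))                  # even: M(z) = M(-z)
print(np.all(np.diff(M(z)) > 0))                 # monotone in |z|

ratio = M(aa[mask]) / M(bb[mask])                # bounded growth with C_M = 1/2
print(np.all(0.5 * aa[mask] / bb[mask] <= ratio + 1e-12))
print(np.all(ratio <= (aa[mask] / bb[mask]) ** 2 + 1e-12))

# root sub-additivity: sqrt(M(a + b)) <= sqrt(M(a)) + sqrt(M(b))
print(np.all(np.sqrt(M(aa + bb)) <= np.sqrt(M(aa)) + np.sqrt(M(bb)) + 1e-12))
```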

There are some open questions too –

1. If $\epsilon$ is fixed, what is the hardness of approximation?

He also went on to describe some algorithms for the nice $M$-estimators, as well as row and column reduction techniques that preserve the approximation. Overall, I would say it was a wonderful talk that helped me understand low-rank approximation.

Note – I have only sketched a basic idea of the topic here. I strongly recommend the following papers for further reading –

Acknowledgements – I would like to thank Ken Clarkson for his support and encouragement (and slides :)), which played a major role in this blog post.