Power Iteration
The power method for computing the top eigenvector of a symmetric (and hence diagonalizable) matrix $A$ starts with an initial vector $x_0$ and repeatedly multiplies by $A$, eventually causing the iterate $x_t$ to converge to the top eigenvector.
1. Basic Algorithm
Suppose $A \in \mathbb{R}^{d \times d}$ is diagonalizable with eigenvalues ordered by decreasing absolute value, $|\lambda_1| \geq |\lambda_2| \geq \dots \geq |\lambda_d|$, and corresponding eigenvectors $v_1, \dots, v_d$ (if $A$ is a sample covariance matrix, as in most cases, then all eigenvalues are non-negative). Then, starting from a random vector $x_0$, our update rule is
$$x_{t+1} = \frac{A x_t}{\|A x_t\|}.$$
The idea behind this algorithm is simple: writing $x_0 = \sum_i c_i v_i$ in the eigenbasis, we have $A^t x_0 = \sum_i c_i \lambda_i^t v_i$, so if $|\lambda_1|$ is strictly the largest eigenvalue, the power iteration makes the component along $v_1$ stand out while the relative contributions of the other eigenvectors converge to zero.
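A minimal NumPy sketch of this update (the convergence test, iteration cap, and tolerance below are illustrative choices, not part of the analysis above):

```python
import numpy as np

def power_iteration(A, n_iters=1000, tol=1e-10, seed=0):
    """Estimate the top eigenvector of a symmetric matrix A by
    repeatedly multiplying by A and renormalizing."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[0])
    x /= np.linalg.norm(x)
    for _ in range(n_iters):
        y = A @ x
        y /= np.linalg.norm(y)
        # stop when the direction stabilizes (up to a sign flip)
        if min(np.linalg.norm(y - x), np.linalg.norm(y + x)) < tol:
            x = y
            break
        x = y
    return x, x @ A @ x  # eigenvector estimate and its Rayleigh quotient
```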
There is one obvious potential problem with the power method: what if $|\lambda_2|/|\lambda_1|$ is near one, i.e., the relative eigenvalue gap is very small? We address this problem now.
2. Shift-and-Invert Method
For a random start vector, the power method converges in $O\!\left(\frac{\log(d/\epsilon)}{\mathrm{gap}}\right)$ iterations, where $\mathrm{gap} = \frac{\lambda_1 - \lambda_2}{\lambda_1}$ and we assume a high-accuracy regime where $\epsilon \leq \mathrm{gap}$. The dependence on this gap ensures that the largest eigenvalue is significantly amplified in comparison to the remaining values.
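To see where the $1/\mathrm{gap}$ factor comes from, here is a standard back-of-the-envelope argument (a sketch, assuming non-negative eigenvalues as in the covariance case): after $t$ iterations, the component of $A^t x_0$ along any $v_i$ with $i \geq 2$, relative to the component along $v_1$, shrinks by a factor of at most
$$\left(\frac{\lambda_2}{\lambda_1}\right)^t = (1 - \mathrm{gap})^t \leq e^{-t \cdot \mathrm{gap}},$$
which drops below $\epsilon/\mathrm{poly}(d)$ once $t = O\!\left(\frac{\log(d/\epsilon)}{\mathrm{gap}}\right)$; the $\log d$ term accounts for a random start vector placing only about a $1/\sqrt{d}$ fraction of its mass on $v_1$.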
If the eigenvalue gap is small, one approach is to replace $A$ with a preconditioned matrix, i.e. a matrix with the same top eigenvector but a much larger gap. Specifically, let $B = \lambda I - A$ for some shift parameter $\lambda$. If $\lambda > \lambda_1$, we can see that the smallest eigenvector of $B$ (equivalently, the largest eigenvector of $B^{-1}$) is equal to the largest eigenvector of $A$.
Additionally, if $\lambda$ is close to $\lambda_1$, there will be a constant gap between the largest and second largest eigenvalues of $B^{-1}$. For example, if $\lambda = \lambda_1 + (\lambda_1 - \lambda_2)$, then we will have
$$\frac{\lambda_1(B^{-1})}{\lambda_2(B^{-1})} = \frac{\lambda - \lambda_2}{\lambda - \lambda_1} = \frac{2(\lambda_1 - \lambda_2)}{\lambda_1 - \lambda_2} = 2.$$
This constant factor gap ensures that the power method applied to $B^{-1}$ converges to the top eigenvector of $A$ in just $O(\log(d/\epsilon))$ iterations.
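A minimal sketch of the resulting shifted-and-inverted power method (assuming a shift slightly above $\lambda_1$ is available; in practice the shift must itself be estimated, and the dense solve below would be replaced by the iterative solvers discussed next):

```python
import numpy as np

def shift_invert_power(A, shift, n_iters=50, seed=0):
    """Power method on B^{-1}, where B = shift*I - A.
    For shift > lambda_1(A), the top eigenvector of B^{-1} is the top
    eigenvector of A, and the spectral gap of B^{-1} is much larger."""
    rng = np.random.default_rng(seed)
    d = A.shape[0]
    B = shift * np.eye(d) - A
    x = rng.standard_normal(d)
    x /= np.linalg.norm(x)
    for _ in range(n_iters):
        # one power-iteration step on B^{-1}: solve B y = x rather than forming B^{-1}
        x = np.linalg.solve(B, x)
        x /= np.linalg.norm(x)
    return x
```

With the choice $\lambda = \lambda_1 + (\lambda_1 - \lambda_2)$ from the example above, roughly $O(\log(d/\epsilon))$ such steps suffice.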
3. Stochastic Variance Reduced Gradient (SVRG)
The shift-and-invert method has a catch: each iteration of this shifted-and-inverted power method must solve a linear system in $B$, whose condition number is proportional to $1/\mathrm{gap}$. For a small gap, solving this system via iterative methods is more expensive.
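To see why (assuming the eigenvalues of $A$ are non-negative and $\lambda = \lambda_1 + (\lambda_1 - \lambda_2)$ as in the example above):
$$\kappa(B) = \frac{\lambda - \lambda_d}{\lambda - \lambda_1} \leq \frac{\lambda_1 + (\lambda_1 - \lambda_2)}{\lambda_1 - \lambda_2} \leq \frac{2\lambda_1}{\lambda_1 - \lambda_2} = \frac{2}{\mathrm{gap}}.$$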
Fortunately, linear system solvers are incredibly well studied and there are many efficient iterative algorithms we can adapt. When $A$ is positive semi-definite (e.g., a covariance matrix), we introduce a variant of the Stochastic Variance Reduced Gradient (SVRG) method.
Typically, stochastic gradient methods are used to optimize convex functions that are given as the sum of many convex components. To solve a linear system
$$Bx = b,$$
we can minimize the convex function
$$f(x) = \frac{1}{2} x^T B x - b^T x$$
using common methods such as Newton's method.
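As a concrete illustration, here is a minimal SVRG-style sketch for this quadratic objective, assuming $A = \frac{1}{n}\sum_i a_i a_i^T$ is a sample covariance matrix so that $f$ decomposes as the average of the components $f_i(x) = \frac{1}{2} x^T (\lambda I - a_i a_i^T) x - b^T x$. The function name, step-size rule, and inner-loop length below are illustrative choices, not prescriptions from these notes.

```python
import numpy as np

def svrg_solve(A_rows, lam, b, x0, step=None, n_epochs=20, m=None, seed=0):
    """SVRG sketch for minimizing f(x) = 1/2 x^T B x - b^T x,
    where B = lam*I - (1/n) * sum_i a_i a_i^T and A_rows stacks the a_i as rows.
    f is the average of f_i(x) = 1/2 x^T (lam*I - a_i a_i^T) x - b^T x."""
    rng = np.random.default_rng(seed)
    n, d = A_rows.shape
    m = m if m is not None else 2 * n  # inner-loop length (common heuristic)
    # conservative step size ~ 1/L, with L bounding the smoothness of each component
    step = step if step is not None else 1.0 / (lam + np.max(np.sum(A_rows**2, axis=1)))
    x_tilde = x0.copy()
    for _ in range(n_epochs):
        # full gradient at the snapshot: B x_tilde - b
        full_grad = lam * x_tilde - A_rows.T @ (A_rows @ x_tilde) / n - b
        x = x_tilde.copy()
        for _ in range(m):
            i = rng.integers(n)
            a_i = A_rows[i]
            # component gradients at the current iterate and at the snapshot
            g_x = lam * x - a_i * (a_i @ x) - b
            g_snap = lam * x_tilde - a_i * (a_i @ x_tilde) - b
            # variance-reduced stochastic gradient step
            x = x - step * (g_x - g_snap + full_grad)
        x_tilde = x
    return x_tilde
```

Each inner step touches only a single $a_i$, while the full gradient at the snapshot keeps the variance of the updates small, which is what makes this attractive when the system in $B$ must be solved to high accuracy.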