SVRG Algorithm
Let
- Parameters: update frequency
, learning rate - Initialize:
- Iterate: for
- Iterate: for
- Sample
and set
- Sample
- Option I: Set
- Option II: Set
for randomly chosen
Convergence Analysis
We assume all
Variance Reduction Property
We analyze the variance of
which is an unbiased estimator of
On the other hand, for standard SGD, we have
Importantly, when
Convergence Rate
We analyze the convergence of SVRG. Using the update rule, we have
Thus
Summing over
Let
Then
Thus, after
When
Comparing with GD
Let
For GD, we have
Recall that
By properly choosing