Natasha 2: Faster Non-Convex Optimization Than SGD. (arXiv:1708.08694v2 [math.OC] UPDATED)
Source: arXiv
We design a stochastic algorithm to train any smooth neural network to $\varepsilon$-approximate local minima, using $O(\varepsilon^{-3.25})$ backpropagations. The best previously known result was essentially $O(\varepsilon^{-4})$, achieved by SGD. More broadly, the algorithm finds $\varepsilon$-approximate local minima of any smooth nonconvex function at a rate of $O(\varepsilon^{-3.25})$, with only oracle access to stochastic gradients and Hessian-vector products.