Convergence of the ADAM algorithm from a Dynamical System Viewpoint. (arXiv:1810.02263v1 [stat.ML])
Source: arXiv
Adam is a popular variant of stochastic gradient descent for finding a
local minimizer of a function. The objective function is unknown, but a random
estimate of the current gradient vector is observed at each round of the
algorithm. This paper investigates the dynamical behavior of Adam when the
objective function is non-convex and differentiable. We introduce a
continuous-time version of Adam, in the form of a non-autonomous ordinary
differential equation (ODE). The existence and uniqueness of the solution
are established, as well as the convergence of the solution towards the
stationary points of the objective function. It is also proved that the
continuous-time system is a relevant approximation of the Adam iterates, in the
sense that the interpolated Adam process converges weakly to the solution to
the ODE.