
Fixing Weight Decay Regularization in Adam. (arXiv:1711.05101v2 [cs.LG] UPDATED)

Source: arXiv
L$_2$ regularization and weight decay regularization are equivalent for standard stochastic gradient descent (when rescaled by the learning rate), but as we demonstrate, this is \emph{not} the case for adaptive gradient algorithms, such as Adam. While common deep learning frameworks implement L$_2$ regularization for these algorithms (often calling it "weight decay" in what may be misleading due to the inequivalence we expose), we propose a simple modification to recover the original formulation of weight decay regularization by decoupling the weight decay from the optimization steps taken w.r.t. the loss function. We provide empirical evidence that our proposed modification (i) decouples the optimal choice of weight decay factor from the setting of the learning rate for both standard SGD and Adam, and (ii) substantially improves Adam's generalization performance, allowing it to compete with SGD with momentum on image classification datasets (on which it was previously typically outperformed).
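The proposed decoupling amounts to keeping the weight-decay term out of the gradient that feeds Adam's moment estimates, and instead shrinking the weights directly in the update step. Below is a minimal NumPy sketch of a single Adam update that contrasts coupled L$_2$ regularization with the decoupled variant; the function `adam_step` and its parameter names are illustrative assumptions, not the paper's reference implementation.

```python
# Minimal sketch: one Adam update, with either L2 regularization (coupled)
# or decoupled weight decay. Names (adam_step, lam, lr, ...) are illustrative.
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8,
              lam=0.0, decoupled=False):
    """One Adam update on parameters w given gradient grad.

    decoupled=False: L2 regularization -- lam * w is folded into the gradient,
                     so it is rescaled by the adaptive denominator.
    decoupled=True:  weight decay is applied directly to w, outside the
                     adaptive update (the modification proposed in the paper).
    """
    if not decoupled:
        grad = grad + lam * w                      # L2 penalty enters the moment estimates

    m = beta1 * m + (1 - beta1) * grad             # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2        # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                   # bias correction
    v_hat = v / (1 - beta2 ** t)

    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)    # adaptive gradient step
    if decoupled:
        w = w - lr * lam * w                       # decoupled weight decay

    return w, m, v

# Example usage with toy values:
w, m, v = np.array([1.0, -2.0]), np.zeros(2), np.zeros(2)
w, m, v = adam_step(w, grad=np.array([0.1, 0.3]), m=m, v=v, t=1,
                    lam=1e-2, decoupled=True)
```

With `decoupled=True`, the decay strength `lam` is no longer divided by the per-parameter adaptive denominator, which is what allows its optimal value to be chosen independently of the learning rate, as claimed in point (i) above.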