Fastest Convergence for Q-learning. (arXiv:1707.03770v2 [cs.SY] UPDATED)
Source: arXiv
The Zap Q-learning algorithm introduced in this paper is an improvement of
Watkins' original algorithm and recent competitors in several respects. It is a
matrix-gain algorithm designed so that its asymptotic variance is optimal.
Moreover, an ODE analysis suggests that the transient behavior is a close match
to a deterministic Newton-Raphson implementation. This is made possible by a
two time-scale update equation for the matrix gain sequence.
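
The abstract does not state the update equations, so the following is only a minimal sketch of what a two time-scale matrix-gain update of this kind might look like. It assumes a tabular (one-hot) basis, no eligibility traces, a cost-minimizing formulation, and a uniformly random exploration policy. The toy MDP (`P`, `COST`, `BETA`), the step sizes `1/(n+1)` and `(n+1)**(-rho)` with `rho=0.85`, the initialization of the matrix estimate to `-I`, and all function names are illustrative assumptions, not taken from the paper.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def zap_q_learning(P, cost, beta, steps, rho=0.85, seed=0):
    """Tabular matrix-gain Q-learning sketch (assumed form, see lead-in)."""
    rng = random.Random(seed)
    nx, nu = len(P), len(P[0])
    dim = nx * nu

    def idx(x, u):
        return x * nu + u

    theta = [0.0] * dim
    # Assumption: start the matrix estimate at -I so it is invertible.
    A_hat = [[-1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    x = 0
    for n in range(1, steps + 1):
        alpha = 1.0 / (n + 1)        # slow time scale: parameter update
        gamma = (n + 1) ** (-rho)    # fast time scale: matrix-gain update
        u = rng.randrange(nu)        # uniform exploration
        x_next = rng.choices(range(nx), weights=P[x][u])[0]
        u_star = min(range(nu), key=lambda a: theta[idx(x_next, a)])
        i, j = idx(x, u), idx(x_next, u_star)
        td = cost[x][u] + beta * theta[j] - theta[i]  # temporal difference
        # Running average of A = psi(x,u) (beta*psi(x',u*) - psi(x,u))^T.
        # With one-hot features only row i of A is nonzero.
        for row in A_hat:
            for c in range(dim):
                row[c] *= 1.0 - gamma
        A_hat[i][i] -= gamma
        A_hat[i][j] += gamma * beta
        # Newton-like step: theta <- theta - alpha * A_hat^{-1} psi(x,u) td.
        rhs = [td if r == i else 0.0 for r in range(dim)]
        step = solve(A_hat, rhs)
        theta = [t - alpha * s for t, s in zip(theta, step)]
        x = x_next
    return theta

def q_value_iteration(P, cost, beta, iters=500):
    """Reference Q* by value iteration, flattened in the same order as theta."""
    nx, nu = len(P), len(P[0])
    Q = [[0.0] * nu for _ in range(nx)]
    for _ in range(iters):
        V = [min(q) for q in Q]
        Q = [[cost[x][u] + beta * sum(p * v for p, v in zip(P[x][u], V))
              for u in range(nu)] for x in range(nx)]
    return [q for row in Q for q in row]

# Hypothetical 2-state, 2-action MDP: P[x][u] is the next-state distribution.
P = [[[0.9, 0.1], [0.5, 0.5]], [[0.1, 0.9], [0.8, 0.2]]]
COST = [[0.0, 1.0], [2.0, 3.0]]
BETA = 0.8

theta = zap_q_learning(P, COST, BETA, steps=20000)
q_star = q_value_iteration(P, COST, BETA)
print("matrix-gain estimate:", [round(t, 3) for t in theta])
print("value iteration Q*:  ", [round(q, 3) for q in q_star])
```

The matrix estimate is averaged with the larger step size `gamma`, so it tracks its mean faster than `theta` moves. That separation of time scales is what lets the parameter update behave like a Newton-Raphson step. Comparing the two printed vectors gives a rough check of convergence on this toy problem.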
The analysis suggests that the approach will lead to stable and efficient
computation even for non-ideal parameterized settings. Numerical experiments
confirm the quick convergence, even in such non-ideal cases.
A secondary goal of this paper is to serve as a tutorial. The first half of the paper
contains a survey on reinforcement learning algorithms, with a focus on minimum
variance algorithms.