Anytime Stochastic Gradient Descent: A Time to Hear from all the Workers. (arXiv:1810.02976v1 [cs.LG])
Source: arXiv
In this paper, we focus on approaches to parallelizing stochastic gradient
descent (SGD) wherein data is farmed out to a set of workers, the results of
which, after a number of updates, are then combined at a central master node.
Although such synchronized SGD approaches parallelize well in idealized
computing environments, they often fail to realize their promised computational
acceleration in practical settings. One cause is slow workers, termed
stragglers, which can cause the fusion step at the master node to stall,
greatly slowing convergence. In many straggler mitigation approaches, the work
completed by these nodes, while only partial, is discarded completely. In this
paper, we propose an approach to parallelizing synchronous SGD that exploits
the work completed by all workers. The central idea is to fix the computation
time of each worker and then to combine distinct contributions of all workers.
We provide a convergence analysis and optimize the combination function. Our
numeric…

View full text >>
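The abstract only sketches the scheme, but the core loop is easy to illustrate. Below is a minimal, hypothetical Python/NumPy sketch of the "fixed computation time per worker" idea on a toy least-squares problem: each simulated worker runs as many mini-batch SGD steps as fit in its wall-clock budget, and the master combines the contributions of all workers instead of discarding stragglers' partial work. The names (`worker_update`, `combine`) and the step-count weighting are illustrative assumptions; the paper's actual, optimized combination function is not given in this excerpt.

```python
import time
import numpy as np

def worker_update(theta, data, lr=0.01, time_budget=0.05, batch_size=32):
    """Run as many mini-batch SGD steps as fit in a fixed time budget.
    A fast worker completes more steps than a straggler, but every
    worker returns something: no partial work is discarded."""
    X, y = data
    theta = theta.copy()
    steps = 0
    deadline = time.monotonic() + time_budget
    while time.monotonic() < deadline:
        idx = np.random.randint(0, len(X), size=batch_size)
        # least-squares gradient on the sampled mini-batch
        grad = X[idx].T @ (X[idx] @ theta - y[idx]) / batch_size
        theta -= lr * grad
        steps += 1
    return theta, steps

def combine(theta_old, results):
    """Combine each worker's contribution, here weighted by completed
    steps. The paper optimizes this combination function; step-count
    weighting is just one plausible stand-in."""
    total = sum(steps for _, steps in results)
    if total == 0:
        return theta_old
    return sum(steps / total * theta_w for theta_w, steps in results)

# Toy driver: linear regression with data sharded across simulated workers.
rng = np.random.default_rng(0)
d, n, n_workers = 10, 4000, 4
X = rng.normal(size=(n, d))
theta_true = rng.normal(size=d)
y = X @ theta_true + 0.01 * rng.normal(size=n)
shards = [(X[i::n_workers], y[i::n_workers]) for i in range(n_workers)]

theta = np.zeros(d)
for epoch in range(20):
    # Run sequentially here for simplicity; in practice each call would
    # execute in parallel on a separate worker machine.
    results = [worker_update(theta, shard) for shard in shards]
    theta = combine(theta, results)
print("parameter error:", np.linalg.norm(theta - theta_true))
```

In a real deployment the per-worker loop would run on separate machines; the fixed wall-clock deadline is what removes the straggler bottleneck, since the master never waits beyond the budget, and the combination step folds in whatever each worker managed to compute.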