Coded Elastic Computing. (arXiv:1812.06411v2 [cs.IT] UPDATED)

Cloud providers have recently introduced low-priority machines to reduce the cost of computation. Exploiting this opportunity for machine learning tasks is challenging because low-priority machines can elastically leave (through preemption) and join the computation at any time. In this paper, we design a new technique, called coded elastic computing, that enables distributed machine learning computations over elastic resources. The proposed technique allows machines to leave the computation transparently without sacrificing algorithm-level performance and, at the same time, flexibly reduces the workload at existing machines when new machines join the computation. Thanks to the redundancy provided by encoding, our approach achieves a computational cost similar to that of the original (uncoded) method when all machines are present; the cost increases gracefully when machines are preempted and decreases when machines join. We test the performance of the proposed technique on two mini-ben…
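
The redundancy idea in the abstract can be illustrated with a toy coded matrix-vector product. The sketch below is not the paper's scheme: it assumes a Vandermonde (MDS-style) code, which the abstract does not specify, and it omits the workload rebalancing the abstract describes. All names (`vandermonde`, `encode`, `decode`, and so on) are hypothetical. The data matrix is split into k row blocks, n coded blocks are stored on n machines, and the product is recovered from any k machines that remain after preemption.

```python
from fractions import Fraction
from itertools import combinations


def vandermonde(n, k):
    """n x k generator matrix with distinct nodes 1..n, so any k rows are invertible."""
    return [[Fraction(i) ** j for j in range(k)] for i in range(1, n + 1)]


def encode(blocks, G):
    """Coded block i is sum_j G[i][j] * blocks[j]; all blocks share one shape."""
    rows, cols = len(blocks[0]), len(blocks[0][0])
    return [
        [[sum(g[j] * blocks[j][r][c] for j in range(len(blocks)))
          for c in range(cols)] for r in range(rows)]
        for g in G
    ]


def matvec(M, x):
    """Plain matrix-vector product; this is the work done by one machine."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]


def solve(A, B):
    """Solve A Y = B exactly by Gauss-Jordan elimination (rows of B are vectors)."""
    k = len(A)
    aug = [list(A[i]) + list(B[i]) for i in range(k)]
    for col in range(k):
        pivot = next(r for r in range(col, k) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def decode(G, alive, results):
    """Recover the uncoded product from the results of any k surviving machines."""
    k = len(G[0])
    chosen = sorted(alive)[:k]
    Y = solve([G[i] for i in chosen], [results[i] for i in chosen])
    return [v for block in Y for v in block]


# Toy data: a 4x2 matrix split into k = 2 row blocks, stored on n = 4 machines.
A = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [1, -1]
k, n = 2, 4
G = vandermonde(n, k)
coded = encode([A[0:2], A[2:4]], G)
results = {i: matvec(coded[i], x) for i in range(n)}

# Every pair of surviving machines reproduces the uncoded product A x.
for alive in combinations(range(n), k):
    assert decode(G, alive, results) == matvec(A, x)
```

Exact rational arithmetic (`Fraction`) is used so that decoding matches the uncoded result exactly; a practical implementation would more likely use floating point and would need to consider the conditioning of the Vandermonde matrix. Decoding fails if fewer than k machines remain.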