HiPS: Hierarchical Parameter Synchronization in Large-Scale DML


  • Authors: Jinkun Geng, Dan Li
  • Keywords: Distributed Machine Learning, Parameter Server (PS), Ring Allreduce (RS), Mesh-based Synchronization (MS)


1. What this paper is about:

Existing DML synchronization schemes (PS, MS, RS) all work in a flat way and suffer when the network scale grows large.

In this work, the authors propose HiPS, a hierarchical parameter (gradient) synchronization framework.

HiPS is deployed on server-centric networks such as BCube and Torus, which better embrace RDMA/RoCE transport between machines.

Through mathematical analysis, the paper shows that HiPS reduces the synchronization time on BCube and Torus by 73% and 75%, respectively.


2. What I learned:

Data-center network topology

Server-centric vs. switch-centric network topology. The former suits RDMA better and avoids the PFC (Priority Flow Control) problem.

Server-centric networks include:

  • BCube (more commonly used)
  • Torus
  • DCell

Switch-centric networks include:

  • Clos / Fat-Tree

State of the art in synchronization algorithms:

Parameter Server (PS)
Primitives: Push and Pull — workers push gradients to the servers and pull the updated parameters back.
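To make the Push/Pull pattern concrete, here is a minimal single-process sketch (class and method names are illustrative, not from the paper):

```python
import numpy as np

class ParameterServer:
    """Toy in-process parameter server: workers Push gradients, Pull weights."""
    def __init__(self, dim, lr=0.1):
        self.weights = np.zeros(dim)
        self.lr = lr

    def push(self, gradient):
        # Apply a worker's gradient to the global model (one SGD step).
        self.weights -= self.lr * gradient

    def pull(self):
        # Return a copy of the current global weights.
        return self.weights.copy()

# Two workers synchronize through one server.
server = ParameterServer(dim=3)
for grad in (np.array([1.0, 0.0, 0.0]), np.array([0.0, 2.0, 0.0])):
    server.push(grad)
weights = server.pull()
print(weights)  # [-0.1 -0.2  0. ]
```

In a real deployment the push/pull calls cross the network (e.g. over RDMA, as HiPS advocates), and the parameters are sharded over multiple server nodes.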

Mesh-based (MS)
Primitives: Diffuse and Collect.
Mesh-based synchronization is a special case of the parameter server in which every worker is also a server (the number of workers equals the number of servers), which gives better load balance among the servers.

Ring-based (RS), i.e. Ring Allreduce.

  • the total number of servers : N
  • the number of servers in each group: n = N^(1/h)
  • the number of hierarchical stages in HiPS : h
  • the total size of gradient parameters on each server : W
  • the bandwidth capacity of each NIC : B

=> Then the GST (global synchronization time) of RS = 2(N-1)W / (NB)
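Plugging the notation above into the formula (the numbers below are made-up example values, not from the paper):

```python
def ring_gst(N, W, B):
    """Global synchronization time of Ring-based sync (RS).

    Reduce-scatter + all-gather take 2(N-1) steps; each step moves W/N
    data over a NIC of bandwidth B, so GST = 2 * (N - 1) * W / (N * B).
    """
    return 2 * (N - 1) * W / (N * B)

# Doubling the ring from 8 to 16 servers barely changes GST:
# the per-step payload shrinks as the step count grows.
print(ring_gst(N=8, W=100e6, B=10e9))   # ≈ 0.0175 s
print(ring_gst(N=16, W=100e6, B=10e9))  # ≈ 0.01875 s
```

This is why RS is nearly bandwidth-optimal, yet its 2(N-1) sequential steps still hurt at large N — the flat-topology problem that motivates HiPS's hierarchy of small groups of size n = N^(1/h).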


Synchronous Parallel Models:

Since the computation performance and communication latency of servers (computing nodes) vary in large-scale DML, the strict Bulk Synchronous Parallel (BSP) model is slowed down by stragglers, which motivates looser synchronization models.

Distributed machine learning uses four synchronization mechanisms:

  • BSP (Bulk Synchronous Parallel): bulk-synchronous parallelism, a full barrier every iteration;
  • Async (Asynchronous Parallel): fully asynchronous parallelism, no waiting at all;
  • SSP (Stale Synchronous Parallel): stale synchronous parallelism with a bounded delay;
  • ASP (Approximate Synchronous Parallel): approximately synchronous parallelism.
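To make the middle ground concrete, a toy sketch of the SSP staleness rule (the function and its `staleness` parameter are illustrative, not from the paper):

```python
def can_proceed(worker_clock, all_clocks, staleness=2):
    """SSP rule: a worker at iteration c may continue only while the
    slowest worker is within `staleness` iterations, i.e.
    c - min(all_clocks) <= staleness.

    staleness=0 degenerates to BSP (a full barrier every iteration);
    an unbounded staleness is fully asynchronous.
    """
    return worker_clock - min(all_clocks) <= staleness

clocks = [5, 4, 3]                  # per-worker iteration counters
print(can_proceed(5, clocks))       # True: 5 - 3 <= 2
print(can_proceed(6, clocks))       # False: the fast worker must wait for the straggler
```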