- Authors: Jinkun Geng, Dan Li
- Keywords: Distributed Machine Learning, Parameter Server, Ring Allreduce, Mesh-based Synchronization (MS)
1. What this paper is about:
Existing DML synchronization schemes (PS, MS, RS) all work in a flat way and suffer when the network size is large.
In this work, we propose HiPS, a hierarchical parameter (gradient) synchronization framework. Deploying it on server-centric networks such as BCube and Torus lets it better embrace RDMA/RoCE transport between machines. By mathematical calculation, we show that it reduces the synchronization time by 73% and 75%, respectively.
2. What I learned:
Data-center network topology:
Server-centric network topology vs. switch-centric network topology: the former is better suited for RDMA and avoids the PFC problem.
Server-centric networks include:
- BCube (more commonly used)
- Torus
- DCell
Switch-centric networks include:
- Clos / Fat-Tree
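To make the "server-centric" idea concrete, here is a minimal sketch of BCube addressing, following the standard BCube(n, k) construction; this is my own illustration under that assumption, not code from the paper. Each server has a (k+1)-digit address and one NIC per digit position (treated here as a level), and two servers share a switch iff their addresses differ in exactly one digit.

```python
# Sketch of BCube(n, k) server addressing (standard construction, assumed).
from itertools import product

def bcube_servers(n: int, k: int):
    """Enumerate all n**(k+1) server addresses of BCube(n, k)."""
    return list(product(range(n), repeat=k + 1))

def bcube_neighbors(addr, n: int):
    """One-switch-hop neighbors: addresses differing in exactly one digit."""
    for level in range(len(addr)):        # one NIC (and one switch) per level
        for digit in range(n):
            if digit != addr[level]:
                nbr = list(addr)
                nbr[level] = digit
                yield level, tuple(nbr)

if __name__ == "__main__":
    n, k = 4, 1                            # BCube(4, 1): 16 servers, 2 NICs each
    print(len(bcube_servers(n, k)), "servers")
    for level, nbr in bcube_neighbors((0, 0), n):
        print("level", level, "->", nbr)   # (k+1)*(n-1) = 6 neighbors in total
```

Because servers themselves hold multiple NICs and forward traffic, the topology naturally supports the multi-NIC, server-to-server flows that RDMA-based synchronization wants.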
State of the art of synchronization algorithms:
- Parameter Server (PS): Push and Pull (a minimal sketch follows this list).
- Mesh-based (MS): Diffuse and Collect. Mesh-based is a special case of the parameter server in which every worker is also a server, which achieves better load balance among the servers.
- Ring-based (RS): servers synchronize by passing gradient chunks around a logical ring; its synchronization time is analyzed with the notation below.
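Below is a minimal sketch of the PS Push/Pull pattern, assuming a single server process aggregating gradients from several workers; all names are illustrative, not from the HiPS paper's code.

```python
# Toy parameter server: workers push gradients, then pull fresh parameters.
from typing import List

class ParameterServer:
    """Holds the global model; workers push gradients and pull parameters."""
    def __init__(self, dim: int, lr: float = 0.1):
        self.params: List[float] = [0.0] * dim
        self.lr = lr

    def push(self, grad: List[float]) -> None:
        # Apply one worker's gradient to the global parameters.
        for j, g in enumerate(grad):
            self.params[j] -= self.lr * g

    def pull(self) -> List[float]:
        # Workers fetch the latest parameters before the next iteration.
        return list(self.params)

ps = ParameterServer(dim=2)
worker_grads = [[1.0, -2.0], [3.0, 0.5]]   # toy gradients from 2 workers
for g in worker_grads:
    ps.push(g)                              # push phase
print(ps.pull())                            # pull phase -> [-0.4, 0.15]
```

The load-balance point about MS follows from this picture: if every worker also acts as a server for a shard of the parameters, no single node carries all of the push/pull traffic.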
Notation for the synchronization-time analysis:
- N: the total number of servers
- n = N^(1/h): the number of servers in each group
- h: the number of hierarchical stages in HiPS
- W: the total size of gradient parameters on each server
- B: the bandwidth capacity of each NIC
=> Then the GST (global synchronization time) of flat RS is 2(N-1)W / (NB).
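As a sanity check on this formula, here is a minimal single-process simulation of ring allreduce (scatter-reduce followed by allgather, the classic realization of RS) together with the GST expression. The simulation and all function names are my own illustration, not the paper's code: in each of the 2(N-1) steps every NIC carries one chunk of size W/N, which is where 2(N-1)W/(NB) comes from.

```python
# Single-process simulation of ring allreduce over N "servers".
import numpy as np

def ring_allreduce(data):
    """data: list of N equal-length arrays (one per server).
    Returns the elementwise sum at every server."""
    N = len(data)
    chunks = [list(np.array_split(np.asarray(d, dtype=float), N)) for d in data]

    # Phase 1: scatter-reduce. Step t: server i sends chunk (i - t) % N to i+1.
    # Messages are collected first so every send uses pre-step values.
    for t in range(N - 1):
        msgs = [(i, (i - t) % N, chunks[i][(i - t) % N].copy()) for i in range(N)]
        for i, c, payload in msgs:
            chunks[(i + 1) % N][c] = chunks[(i + 1) % N][c] + payload

    # Phase 2: allgather. Server i now owns the fully reduced chunk (i+1) % N
    # and circulates it; receivers overwrite instead of adding.
    for t in range(N - 1):
        msgs = [(i, (i + 1 - t) % N, chunks[i][(i + 1 - t) % N].copy()) for i in range(N)]
        for i, c, payload in msgs:
            chunks[(i + 1) % N][c] = payload

    return [np.concatenate(ch) for ch in chunks]

def gst_rs(N, W, B):
    """GST of flat ring synchronization: 2*(N-1)*W / (N*B)."""
    return 2 * (N - 1) * W / (N * B)

if __name__ == "__main__":
    out = ring_allreduce([[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]])
    print(out[0])                     # [111. 222. 333. 444.] at every server
    print(gst_rs(N=64, W=1.0, B=1.0)) # ~1.97: flat RS approaches 2W/B as N grows
```

Note how gst_rs shows the flat scheme's limitation: as N grows the GST saturates near 2W/B, which is the motivation for splitting synchronization into h hierarchical stages over groups of n = N^(1/h) servers.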
Synchronous parallel models:
Since the computation performance and communication latency of servers (computing nodes) vary in large-scale DML, several synchronization models with different consistency/speed trade-offs have been proposed.
Distributed machine learning uses four synchronization mechanisms:
- BSP (Bulk Synchronous Parallel)
- Async (Asynchronous Parallel, fully asynchronous)
- SSP (Stale Synchronous Parallel)
- ASP (Approximate Synchronous Parallel)
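A minimal sketch of the BSP discipline, the strictest of the four: every worker must reach a barrier before any worker may start the next iteration. This is a generic illustration (toy gradients, hypothetical names), not the paper's implementation; ASP/SSP would proceed without the barrier, or with a bounded-staleness version of it.

```python
# BSP-style iteration with a barrier between push and read phases.
import threading

N_WORKERS, N_ITERS = 4, 3
barrier = threading.Barrier(N_WORKERS)
lock = threading.Lock()
global_param = [0.0]  # shared model parameter, guarded by lock

def worker(rank: int):
    for it in range(N_ITERS):
        grad = 0.1 * (rank + 1)           # stand-in for a real gradient computation
        with lock:
            global_param[0] += grad        # "push" this worker's update
        barrier.wait()                     # BSP: all workers synchronize each iteration
        with lock:                         # after the barrier, everyone reads the same value
            snapshot = global_param[0]
        print(f"iter {it}, worker {rank}: param = {snapshot:.2f}")
        barrier.wait()                     # keep reads of this iteration ahead of next pushes

threads = [threading.Thread(target=worker, args=(r,)) for r in range(N_WORKERS)]
for t in threads: t.start()
for t in threads: t.join()
```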