Blink: A Fast NVLink-Based Collective Communication Library


  • Author: Guanhua Wang
  • Distributed Machine Learning, Ring Allreduce, broadcast


1. What this paper is about:

Ring allreduce does not fully utilize the available links: a ring only sends data over the links that lie on the ring, so the other links in the topology sit idle.
The proposed Blink protocol aims to use all of the links and thereby further reduce the GST (global synchronization time).
In detail, it is a hierarchical scheme: given a network topology, we first divide the network into groups within which all nodes are fully connected.
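To make the idle-link problem concrete, here is a minimal single-ring allreduce simulation (reduce-scatter followed by all-gather) in plain Python. It assumes each node's vector is split into exactly n chunks, one per node; the function name `ring_allreduce` and the link bookkeeping are illustrative, not NCCL's API. Real libraries may run several rings in parallel, so this sketch only shows the single-ring case.

```python
def ring_allreduce(values):
    """Simulate one ring allreduce over n nodes.

    values[i] is node i's vector of length n (one chunk per node, for simplicity).
    Returns (final vector on every node, set of directed links (src, dst) used).
    """
    n = len(values)
    data = [list(v) for v in values]  # copy so the caller's input is untouched
    used_links = set()

    # Reduce-scatter: in step s, node i sends chunk (i - s) % n to node (i + 1) % n,
    # which adds it to its own copy. Sends are snapshotted before applying them.
    for s in range(n - 1):
        sends = [((i - s) % n, data[i][(i - s) % n]) for i in range(n)]
        for i, (chunk, val) in enumerate(sends):
            dst = (i + 1) % n
            data[dst][chunk] += val
            used_links.add((i, dst))

    # Now node i holds the fully reduced chunk (i + 1) % n.
    # All-gather: in step s, node i forwards chunk (i + 1 - s) % n to the next node.
    for s in range(n - 1):
        sends = [((i + 1 - s) % n, data[i][(i + 1 - s) % n]) for i in range(n)]
        for i, (chunk, val) in enumerate(sends):
            dst = (i + 1) % n
            data[dst][chunk] = val
            used_links.add((i, dst))

    return data, used_links


if __name__ == "__main__":
    vals = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400], [1000, 2000, 3000, 4000]]
    result, links = ring_allreduce(vals)
    print(result[0])  # [1111, 2222, 3333, 4444]
    n = len(vals)
    all_links = {(a, b) for a in range(n) for b in range(n) if a != b}
    print(f"{len(links)} of {len(all_links)} directed links used")  # 4 of 12
```

Even if the 4 GPUs are fully connected, a single ring only ever uses the 4 directed links (0→1, 1→2, 2→3, 3→0); the remaining 8 directed links carry nothing.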

  • In the first phase:
    perform an internal broadcast, where data is exchanged within each group using all-to-all communication.
  • In the second phase:
    perform cross-group forwarding, where we communicate across groups and forward the cross-group data within the respective groups.
[Figure: Blink structure]
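The grouping step and the two phases can be sketched as a small simulation. This is only an illustration of the two-phase description above, not Blink's actual implementation. Assumptions: an 8-GPU topology shaped like the DGX-1 hybrid cube-mesh (GPUs 0-3 fully connected, GPUs 4-7 fully connected, plus links i↔i+4), a greedy clique partition for the grouping, and a hypothetical pairing rule where the k-th member of one group talks to the k-th member of another. All function names are made up for the example.

```python
from itertools import combinations


def dgx1_like_edges():
    """Hypothetical 8-GPU topology: two fully connected groups of four plus links i <-> i+4."""
    edges = set()
    for base in (0, 4):
        for a in range(base, base + 4):
            for b in range(a + 1, base + 4):
                edges.add((a, b))
    for i in range(4):
        edges.add((i, i + 4))
    return edges


def connected(edges, a, b):
    return (a, b) in edges or (b, a) in edges


def greedy_groups(n, edges):
    """Greedily partition nodes 0..n-1 into fully connected groups (cliques)."""
    groups = []
    for node in range(n):
        for g in groups:
            if all(connected(edges, node, m) for m in g):
                g.append(node)
                break
        else:
            groups.append([node])
    return groups


def two_phase_allgather(groups, edges):
    """Each node starts with its own piece (named by its id); all nodes must end with all pieces."""
    held = {node: {node} for g in groups for node in g}
    transfers = []

    def send(src, dst, pieces):
        # Every transfer must travel over a direct link.
        assert connected(edges, src, dst), f"no direct link {src}-{dst}"
        held[dst] |= set(pieces)
        transfers.append((src, dst))

    # Phase 1: internal broadcast, all-to-all inside each fully connected group.
    for g in groups:
        for src in g:
            for dst in g:
                if src != dst:
                    send(src, dst, {src})

    # Phase 2a: cross-group exchange. Assumption: the k-th members of two groups share
    # a direct link. Each cross link carries only the sender's own piece.
    received = {node: set() for node in held}
    for ga, gb in combinations(groups, 2):
        for a, b in zip(ga, gb):
            send(a, b, {a})
            send(b, a, {b})
            received[b].add(a)
            received[a].add(b)

    # Phase 2b: each receiver forwards the cross-group pieces to its own group mates.
    for g in groups:
        for src in g:
            for dst in g:
                if src != dst and received[src]:
                    send(src, dst, received[src])

    return held, transfers


if __name__ == "__main__":
    edges = dgx1_like_edges()
    groups = greedy_groups(8, edges)
    print(groups)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
    held, transfers = two_phase_allgather(groups, edges)
    print(all(held[n] == set(range(8)) for n in held))  # True
    used = {frozenset(t) for t in transfers}
    print(f"{len(used)} of {len(edges)} links used")  # 16 of 16
```

Because each cross link carries a different piece and the intra-group links do the forwarding, every one of the 16 links in this assumed topology carries traffic, in contrast with the single ring above.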

2. What I learned:

  • NVIDIA DGX-1

THE FASTEST PATH TO DEEP LEARNING
Building a platform for deep learning goes well beyond selecting a server and GPUs. A commitment to implementing AI in your business involves carefully selecting and integrating complex software with hardware. NVIDIA® DGX-1™ fast-tracks your initiative with a solution that works right out of the box, so you can gain insights in hours instead of weeks or months.