Hi WG,
With the development of AIGC, the demand for computing power has increased
significantly. A single AIDC has limited computing power and can no longer
meet this demand, which gives rise to multi-AIDC collaborative training
across the WAN. This also broadens the application scenarios of fantel.
To achieve lossless transmission of RDMA traffic across the WAN, we have
submitted a draft that describes the scenario and an implementation of
credit-based flow control in the WAN.
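For readers less familiar with the idea, here is a minimal sketch of generic
credit-based flow control between one sender and one receiver. It is not the
RSVP-based signalling defined in the draft. The class names, the
one-credit-per-buffer-slot rule, and the tick-based drain are assumptions made
only for illustration.

```python
from collections import deque


class CreditReceiver:
    """Hypothetical downstream node with a finite buffer."""

    def __init__(self, buffer_slots):
        self.buffer = deque()
        self.buffer_slots = buffer_slots

    def initial_credits(self):
        # Assumption: one credit corresponds to one free buffer slot.
        return self.buffer_slots

    def accept(self, packet):
        # A sender that respects its credits can never overflow the buffer.
        assert len(self.buffer) < self.buffer_slots, "buffer overflow"
        self.buffer.append(packet)

    def drain(self, n):
        # Forward up to n buffered packets; each freed slot becomes a credit.
        freed = min(n, len(self.buffer))
        for _ in range(freed):
            self.buffer.popleft()
        return freed


class CreditSender:
    """Hypothetical upstream node that transmits only while it holds credits."""

    def __init__(self, credits):
        self.credits = credits
        self.sent = 0

    def try_send(self, receiver, packet):
        if self.credits == 0:
            # Backpressure: hold the packet instead of sending and dropping it.
            return False
        self.credits -= 1
        receiver.accept(packet)
        self.sent += 1
        return True

    def add_credits(self, n):
        self.credits += n


def simulate(total_packets, buffer_slots, drain_per_tick):
    """Run until every packet is delivered; return (packets sent, ticks)."""
    if buffer_slots < 1 or drain_per_tick < 1:
        raise ValueError("buffer_slots and drain_per_tick must be at least 1")
    receiver = CreditReceiver(buffer_slots)
    sender = CreditSender(receiver.initial_credits())
    next_packet = 0
    ticks = 0
    while sender.sent < total_packets:
        # The sender transmits for as long as it holds credits.
        while next_packet < total_packets and sender.try_send(receiver, next_packet):
            next_packet += 1
        # The receiver drains its buffer and returns the freed credits.
        sender.add_credits(receiver.drain(drain_per_tick))
        ticks += 1
    return sender.sent, ticks
```

Calling simulate(10, 4, 2) delivers all 10 packets without the receiver ever
overflowing. The sender simply stalls whenever it runs out of credits, which
is the backpressure behaviour that makes the scheme lossless.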
Comments and discussions are welcome!
Regards,
Jiayuan Hu
Jiayuan Hu (Hugh)
From: internet-drafts
Date: 2025-03-03 09:18
To: [email protected]
Subject: I-D Action: draft-hu-rtgwg-cbfc-rsvp-00.txt
Internet-Draft draft-hu-rtgwg-cbfc-rsvp-00.txt is now available.
Title: Credit-based Flow Control for Cross-AIDC WAN transmission Based on
RSVP
Author: Jiayuan Hu
Name: draft-hu-rtgwg-cbfc-rsvp-00.txt
Pages: 9
Dates: 2025-03-02
Abstract:
This draft defines the Credit-based flow control mechanism for WAN
based on the RSVP protocol. With the increasing demand for AI
computing power, the computing power of a single AIDC can no longer
meet the needs of large model training. This has given rise to
cross-AIDC distributed model training, driving the demand for
transmitting RoCEv2 packets over WAN networks. AI training is
extremely sensitive to network packet loss, and even a small amount
of packet loss may lead to a significant decline in training
efficiency. In addition, the elephant flow and extreme concurrent
traffic also place higher demands on network performance. Credit-
based flow control is a Backpressure-based traffic management
technology, which has high reliability and stability in practical
applications. It can provide high-throughput and zero-packet-loss
transmission guarantees for RoCEv2 traffic, effectively ensuring the
efficiency of cross-data center AI training.
This draft focuses on the scenario where RoCEv2 packets are
transmitted through SRv6 tunnels in the WAN and further expands the
capabilities of the RSVP protocol in WAN. This draft introduces the
Credit-based flow control mechanism into the RSVP protocol to achieve
precise traffic control and provides processing analysis.
The IETF datatracker status page for this Internet-Draft is:
https://datatracker.ietf.org/doc/draft-hu-rtgwg-cbfc-rsvp/
There is also an HTML version available at:
https://www.ietf.org/archive/id/draft-hu-rtgwg-cbfc-rsvp-00.html
Internet-Drafts are also available by rsync at:
rsync.ietf.org::internet-drafts