Hi WG,

With the development of AIGC, the demand for computing power has increased significantly. The limited computing power of a single AIDC can no longer meet this demand, giving rise to multi-AIDC collaborative training across the WAN. This scenario also broadens the application scope of fantel. To achieve lossless transmission of RDMA traffic in the WAN, we have submitted a draft that introduces the scenario and the implementation of credit-based flow control in the WAN. Comments and discussion are welcome!

Regards,
Jiayuan Hu

Jiayuan Hu (Hugh)

From: internet-drafts
Date: 2025-03-03 09:18
To: i-d-annou...@ietf.org
Subject: I-D Action: draft-hu-rtgwg-cbfc-rsvp-00.txt

Internet-Draft draft-hu-rtgwg-cbfc-rsvp-00.txt is now available.

Title: Credit-based Flow Control for Cross-AIDC WAN transmission Based on RSVP
Author: Jiayuan Hu
Name: draft-hu-rtgwg-cbfc-rsvp-00.txt
Pages: 9
Dates: 2025-03-02

Abstract:

This draft defines the Credit-based flow control mechanism for WAN based on the RSVP protocol. With the increasing demand for AI computing power, the computing power of a single AIDC can no longer meet the needs of large model training. This has given rise to cross-AIDC distributed model training, driving the demand for transmitting RoCEv2 packets over WAN networks. AI training is extremely sensitive to network packet loss, and even a small amount of packet loss may lead to a significant decline in training efficiency. In addition, the elephant flow and extreme concurrent traffic also place higher demands on network performance. Credit-based flow control is a Backpressure-based traffic management technology, which has high reliability and stability in practical applications. It can provide high-throughput and zero-packet-loss transmission guarantees for RoCEv2 traffic, effectively ensuring the efficiency of cross-data center AI training. This draft focuses on the scenario where RoCEv2 packets are transmitted through SRv6 tunnels in the WAN and further expands the capabilities of the RSVP protocol in WAN. This draft introduces the Credit-based flow control mechanism into the RSVP protocol to achieve precise traffic control and provides processing analysis.

The IETF datatracker status page for this Internet-Draft is:
https://datatracker.ietf.org/doc/draft-hu-rtgwg-cbfc-rsvp/

There is also an HTML version available at:
https://www.ietf.org/archive/id/draft-hu-rtgwg-cbfc-rsvp-00.html

Internet-Drafts are also available by rsync at:
rsync.ietf.org::internet-drafts

_______________________________________________
rtgwg mailing list -- rtgwg@ietf.org
To unsubscribe send an email to rtgwg-le...@ietf.org