From: Willem de Bruijn <will...@google.com>

UDP segmentation offload with UDP_SEGMENT can significantly reduce the
transmission cycle cost per byte for protocols like QUIC.

Pacing offload with SO_TXTIME can further improve the accuracy and
cycle cost of pacing for such userspace protocols.

But the maximum GSO size built is limited by the pacing rate. At a
msec pacing interval, this results in at most a few segments per
datagram for many Internet clients.

The pros and cons were captured in a recent CloudFlare article,
specifically mentioning

  "But it does not yet support specifying different times for each
  packet when GSO is used, as there is no way to define multiple
  timestamps for packets that need to be segmented (each segmented
  packet essentially ends up being sent at the same time anyway)."

  https://blog.cloudflare.com/accelerating-udp-packet-transmission-for-quic/

We have been evaluating such a mechanism for multiple release times
per UDP GSO packet. Since it sounds like this may be of interest to
others too, it may be a while before we have all the data I'd like,
and the list is quieter now that the merge window is open, I am
sharing a WIP version.

The basic approach is to specify

1. initial, earliest release time (in nsec)
2. interval between subsequent release times (in msec)
3. number of segments to release at each release time

One implementation concern is where to store the additional two fields
in the skb. Given that msec granularity matches Internet pacing
speeds, for now this repurposes the two lowest 4-bit nibbles in
skb->tstamp to hold the interval and segment count. I'm aware that
this does not win a prize for elegance.

Patch 1 adds the socket option and a basic segmentation function to
  adjust the skb->tstamp of the individual segments.

Patch 2 extends this with support for building GSO segs. It builds one
  GSO segment per interval if the hardware can offload (USO), in which
  case we are segmenting only to maintain the pacing rate.

Patch 3 wires the segmentation up to the FQ qdisc on enqueue, so
  that segments will be scheduled for delivery at their adjusted
  time.

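To make the three-field scheme above concrete, here is a minimal
userspace sketch of how a single 64-bit txtime could carry the interval
and segment count in its two lowest nibbles, and how the release time
of each segment would follow from it. The bit assignment (segment count
in bits 0-3, interval in bits 4-7) and the names mr_pack() and
mr_seg_time() are assumptions made for illustration; they are not taken
from the patches, which may order the fields differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only:
 *   bits 0-3: segments released per interval
 *   bits 4-7: interval between releases, in msec
 *   bits 8+ : initial release time in nsec (low 8 bits are sacrificed)
 */
#define MR_SEGS_MASK	0x0FULL
#define MR_IVAL_MASK	0xF0ULL
#define MR_IVAL_SHIFT	4
#define MR_FIELD_MASK	(MR_SEGS_MASK | MR_IVAL_MASK)
#define NSEC_PER_MSEC	1000000ULL

/* Fold interval and segment count into the low byte of the release time. */
uint64_t mr_pack(uint64_t first_ns, unsigned int ival_ms, unsigned int segs)
{
	/* Each field must fit in one nibble. */
	assert(ival_ms <= 0xF && segs <= 0xF);

	return (first_ns & ~MR_FIELD_MASK) |
	       ((uint64_t)ival_ms << MR_IVAL_SHIFT) |
	       (uint64_t)segs;
}

/* Release time of segment seg_idx (0-based) of a packed txtime. */
uint64_t mr_seg_time(uint64_t txtime, unsigned int seg_idx)
{
	uint64_t base = txtime & ~MR_FIELD_MASK;
	uint64_t ival_ms = (txtime & MR_IVAL_MASK) >> MR_IVAL_SHIFT;
	uint64_t segs = txtime & MR_SEGS_MASK;

	/* No segment count: treat as a single release time. */
	if (!segs)
		return base;

	return base + (seg_idx / segs) * ival_ms * NSEC_PER_MSEC;
}
```

With mr_pack(1000000000, 2, 3), segments 0..2 would be released at
1000000000 nsec, segments 3..5 at 1002000000 nsec, and so on. Under
this layout the initial release time loses up to 255 nsec of precision,
and neither the interval nor the segment count can exceed 15.
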
Patch 4..6 extend existing tests to experiment with the feature.

Patch 4 allows testing so_txtime across hardware (for USO).
Patch 5 extends the so_txtime test with support for gso and mr-pacing.
Patch 6 extends the udpgso bench to support pacing and mr-pacing.

Some known limitations:

- the aforementioned storage in skb->tstamp.
- exposing this constraint through the SO_TXTIME interface.
  It would be cleaner to add new fields to the cmsg, at nsec
  resolution.
- the fq_enqueue path adds a branch to the hot path.
  A static branch would avoid that.
- a few udp specific assumptions in a net/core datapath,
  notably the hw_features. This can be derived from gso_type.

Willem de Bruijn (6):
  net: multiple release time SO_TXTIME
  net: build gso segs in multi release time SO_TXTIME
  net_sched: sch_fq: multiple release time support
  selftests/net: so_txtime: support txonly/rxonly modes
  selftests/net: so_txtime: add gso and multi release pacing
  selftests/net: upgso bench: add pacing with SO_TXTIME

 include/linux/netdevice.h                     |   1 +
 include/net/sock.h                            |   3 +-
 include/uapi/linux/net_tstamp.h               |   3 +-
 net/core/dev.c                                |  71 +++++++++
 net/core/sock.c                               |   4 +
 net/sched/sch_fq.c                            |  33 ++++-
 tools/testing/selftests/net/so_txtime.c       | 136 ++++++++++++++----
 tools/testing/selftests/net/so_txtime.sh      |   7 +
 .../testing/selftests/net/so_txtime_multi.sh  |  68 +++++++++
 .../selftests/net/udpgso_bench_multi.sh       |  65 +++++++++
 tools/testing/selftests/net/udpgso_bench_tx.c |  72 +++++++++-
 11 files changed, 431 insertions(+), 32 deletions(-)
 create mode 100755 tools/testing/selftests/net/so_txtime_multi.sh
 create mode 100755 tools/testing/selftests/net/udpgso_bench_multi.sh

-- 
2.27.0.278.ge193c7cf3a9-goog