From: Eric Dumazet <eduma...@google.com>
Date: Thu, 20 Aug 2020 08:43:59 -0700
> Currently, tcp sendmsg(MSG_ZEROCOPY) is building skbs with order-0 fragments.
> Compared to standard sendmsg(), these skbs usually contain up to 16 fragments
> on arches with 4KB page sizes, instead of two.
>
> This adds considerable costs on various ndo_start_xmit() handlers,
> especially when IOMMU is in the picture.
>
> As high performance applications are often using huge pages,
> we can try to combine adjacent pages belonging to same
> compound page.
>
> Tested on AMD Rome platform, with IOMMU, nominal single TCP flow speed
> is roughly doubled (~55Gbit -> ~100Gbit), when user application
> is using hugepages.
>
> For reference, nominal single TCP flow speed on this platform
> without MSG_ZEROCOPY is ~65Gbit.
>
> Signed-off-by: Eric Dumazet <eduma...@google.com>

Applied. The refcounting in these kinds of patches is always fun to audit :-)
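For illustration only, here is a small userspace C model of the idea in the quoted patch: merging adjacent 4KB pages that belong to the same compound page into a single fragment. Everything in it (struct model_page, struct model_head, build_frags(), MODEL_MAX_FRAGS) is hypothetical and is not the kernel's code. It also assumes that pinning takes one reference per 4KB page on the compound head and that each fragment should end up owning exactly one, so every merge drops a reference. That is presumably the kind of refcount bookkeeping the reply is referring to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u
#define MODEL_MAX_FRAGS 17   /* hypothetical limit, stands in for MAX_SKB_FRAGS */

/* Hypothetical stand-in for a compound (huge) page head: one shared refcount. */
struct model_head {
    int refcount;
};

/* One pinned 4KB page: its address and the compound page it belongs to. */
struct model_page {
    uintptr_t addr;
    struct model_head *head;
};

/* One skb-like fragment: a contiguous byte range inside one compound page. */
struct model_frag {
    struct model_head *head;
    uintptr_t addr;
    size_t len;
};

/*
 * Turn pinned pages into fragments, extending the previous fragment when the
 * next page is contiguous with it and shares the same compound head.
 * Model assumption: pinning took one reference per page on the head, and each
 * fragment should own exactly one, so every merge drops the extra reference.
 * Returns the number of fragments; *consumed gets the number of pages used.
 */
static size_t build_frags(const struct model_page *pages, size_t npages,
                          struct model_frag *frags, size_t *consumed)
{
    size_t nfrags = 0;
    size_t i;

    assert(consumed != NULL);
    for (i = 0; i < npages; i++) {
        if (nfrags > 0) {
            struct model_frag *last = &frags[nfrags - 1];

            if (last->head == pages[i].head &&
                last->addr + last->len == pages[i].addr) {
                last->len += MODEL_PAGE_SIZE;
                pages[i].head->refcount--;  /* merged: release the extra ref */
                continue;
            }
        }
        if (nfrags == MODEL_MAX_FRAGS)
            break;  /* out of slots: remaining pages would go to a new skb */
        frags[nfrags].head = pages[i].head;
        frags[nfrags].addr = pages[i].addr;
        frags[nfrags].len = MODEL_PAGE_SIZE;
        nfrags++;
    }
    *consumed = i;
    return nfrags;
}
```

In this model, 16 contiguous 4KB pages from one hugepage collapse into a single 64KB fragment, and the head is left holding one reference. Pages that are not contiguous, or that belong to different compound pages, still get separate fragments.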