> On May 31, 2023, at 1:27 AM, Jason Wang <jasow...@redhat.com> wrote: > > On Wed, May 31, 2023 at 11:55 AM Jason Wang <jasow...@redhat.com> wrote: >> >> On Wed, May 31, 2023 at 11:47 AM Jon Kohler <j...@nutanix.com> wrote: >>> >>> >>> >>>> On May 30, 2023, at 11:35 PM, Jason Wang <jasow...@redhat.com> wrote: >>>> >>>> On Wed, May 31, 2023 at 11:32 AM Jason Wang <jasow...@redhat.com> wrote: >>>>> >>>>> On Wed, May 31, 2023 at 11:17 AM Jon Kohler <j...@nutanix.com> wrote: >>>>>> >>>>>> If kernel supports IFF_NAPI, lets use it, which is especially useful >>>>>> on kernels containing fb3f903769e8 ("tun: support NAPI for packets >>>>>> received from batched XDP buffs"), as IFF_NAPI allows the >>>>>> vhost_tx_batch path to use NAPI on XDP buffs. >>>>>> >>>>>> Benchmark w/ iperf -c (remote srv) -P (thread count) -l (stream size) >>>>>> from a guest running kernel 5.10.105 to remote bare metal running >>>>>> patched code on kernel 5.10.139. Guests were configured 1x virtio-net >>>>>> device with 4x queues, resulting in 4x vhost-worker threads. Hosts are >>>>>> identical with Intel ICX 4314 @ 2.4 GHz with Mellanox CX5 25GbE NIC -> >>>>>> Arista 25GbE switch. vhost-worker threads largely maxed out on CPU on >>>>>> "Before" and around ~50-60% utilization "After". >>>>>> >>>>>> Single Stream: iperf -P 1 >>>>>> iperf -l size | Before | After | Increase >>>>>> 64B | 593 Mbits/sec | 712 Mbits/sec | ~20% >>>>>> 128B | 1.00 Gbits/sec | 1.18 Gbits/sec | ~18% >>>>>> 4KB | 17.6 Gbits/sec | 22.7 Gbits/sec | ~29% >>>>>> >>>>>> Multiple Stream: iperf -P 12 >>>>>> iperf -l size | Before | After | Increase >>>>>> 64B | 6.35 Gbits/sec | 7.78 Gbits/sec | ~23% >>>>>> 128B | 10.8 Gbits/sec | 14.2 Gbits/sec | ~31% >>>>>> 4KB | 23.6 Gbits/sec | 23.6 Gbits/sec | (line speed) >>>>>> >>>>>> Signed-off-by: Jon Kohler <j...@nutanix.com> >>>>> >>>>> Great, but I would suggest having an option. >>>>> >>>>> So we can: >>>>> >>>>> 1) ease the debug and compare >>>>> 2) enable this by default only for 8.1, disable it for pre 8.1 >>> >>> Fair enough, one favor to ask though - >>> Would you be able to point me to an existing option like what you’re >>> proposing so I could make sure I’m on the same page? >> >> For example, the vhost option for tap. Maybe we can have an napi option.
OK thanks, I’ll see what I can do there. >> >>> >>>> >>>> More thought, if the performance boost only after fb3f903769e8, we >>>> probably need to disable it by default and let the mgmt layer to >>>> enable it. >>>> >>> >>> I focused my testing with that commit, but I could take it out and >>> we still should get benefit. Would you like me to profile that to validate? >>> >> >> One problem is that NAPI for TAP was originally used for kernel >> hardening. Looking at the history, it introduces a lot of bugs. >> >> Consider: >> >> 1) it has been merged for several years >> 2) tap has been widely used for a long time as well >> >> I think it would be still safe to keep the option off (at least for >> pre 8.1 machines). >> >>> Asking as NAPI support in tun.c has been there for a while, guessing >>> at first glance that there would be non-zero gains, with little downsides. >>> Looking at git blame, seems about ~5-6 years of support. >> >> Yes. >> >>> >>> Also for posterity, that commit has been in since 5.18, so a little over 1 >>> year. >> >> Then I think we can make it enabled by default for 8.1 and see. OK, I’ll see what I can come up with. > > Btw, it would be better if we can have some PPS benchmark as well. > > Thanks Is there a set of specific benchmark(s) that you want to see? Certain packet sizes? TCP/UDP? Certain tool (netperf, iperf, etc)? The existing benchmarks in the commit msg have both single and multi stream and multiple payload sizes, all of which are a corollary to PPS, no? Happy to do more profiling, just want to make sure I get you exactly what you want. > >> >> Thanks >> >>> >>>> Thanks >>>> >>>>> >>>>> Thanks >>>>> >>>>> Thanks >>>>> >>>>>> --- >>>>>> net/tap-linux.c | 4 ++++ >>>>>> net/tap-linux.h | 1 + >>>>>> 2 files changed, 5 insertions(+) >>>>>> >>>>>> diff --git a/net/tap-linux.c b/net/tap-linux.c >>>>>> index f54f308d359..fd94df166e0 100644 >>>>>> --- a/net/tap-linux.c >>>>>> +++ b/net/tap-linux.c >>>>>> @@ -62,6 +62,10 @@ int tap_open(char *ifname, int ifname_size, int >>>>>> *vnet_hdr, >>>>>> ifr.ifr_flags |= IFF_ONE_QUEUE; >>>>>> } >>>>>> >>>>>> + if (features & IFF_NAPI) { >>>>>> + ifr.ifr_flags |= IFF_NAPI; >>>>>> + } >>>>>> + >>>>>> if (*vnet_hdr) { >>>>>> if (features & IFF_VNET_HDR) { >>>>>> *vnet_hdr = 1; >>>>>> diff --git a/net/tap-linux.h b/net/tap-linux.h >>>>>> index bbbb62c2a75..f4d8e55270b 100644 >>>>>> --- a/net/tap-linux.h >>>>>> +++ b/net/tap-linux.h >>>>>> @@ -37,6 +37,7 @@ >>>>>> >>>>>> /* TUNSETIFF ifr flags */ >>>>>> #define IFF_TAP 0x0002 >>>>>> +#define IFF_NAPI 0x0010 >>>>>> #define IFF_NO_PI 0x1000 >>>>>> #define IFF_ONE_QUEUE 0x2000 >>>>>> #define IFF_VNET_HDR 0x4000 >>>>>> -- >>>>>> 2.30.1 (Apple Git-130) >>> >