This patchset changes the ndo_xdp_xmit API to take a bulk of xdp frames.

In this V4 patchset, I've split the series out from 4 to 8 patches. I cannot split the driver changes from the NDO change, but I've tried to isolate the NDO change together with the driver changes as much as possible.
When the kernel is compiled with CONFIG_RETPOLINE, every indirect function pointer (branch) call hurts performance. For XDP this has a huge negative performance impact.

This patchset reduces the needed (indirect) calls to ndo_xdp_xmit, but also prepares for further optimizations. The DMA API's use of indirect function pointer calls is the primary source of the regression. It is left for a followup patchset to use bulking calls towards the DMA API (via the scatter-gather calls).

The other advantage of this API change is that drivers can more easily amortize the cost of any sync/locking scheme over the bulk of packets. The assumption of the current API is that the driver implementing the NDO will also allocate a dedicated XDP TX queue for every CPU in the system, which is not always possible or practical to configure. E.g. ixgbe cannot load an XDP program on a machine with more than 96 CPUs, due to limited hardware TX queues. E.g. virtio_net is hard to configure, as it requires manually increasing the queues. E.g. the tun driver chooses to use a per XDP frame producer lock, selected by smp_processor_id modulo the available queues.

I've considered adding 'flags' to ndo_xdp_xmit, but it's not part of this patchset. This will be a followup patchset, once we know if this will be needed (e.g. for a non-map xdp_redirect flush-flag, and if AF_XDP chooses to use ndo_xdp_xmit for TX).
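The amortization argument above can be sketched in plain userspace C. This is only an illustration, not kernel code: struct sketch_dev, struct sketch_frame, the sketch_* functions and the counters are all hypothetical names invented here, and the "lock" is just a counter standing in for whatever sync scheme a driver uses. It compares one indirect call (and one lock round trip) per frame against one per bulk.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's structures. */
struct sketch_frame {
	int len;
};

struct sketch_dev {
	int lock_acquisitions;	/* times the (simulated) TX lock was taken */
	int indirect_calls;	/* times a xmit callback was invoked */
	int frames_sent;
	/* Per-frame style: one frame per indirect call. */
	int (*xmit_one)(struct sketch_dev *dev, struct sketch_frame *frame);
	/* Bulk style: n frames per indirect call, returns frames sent. */
	int (*xmit_bulk)(struct sketch_dev *dev, int n,
			 struct sketch_frame **frames);
};

static int sketch_xmit_one(struct sketch_dev *dev, struct sketch_frame *frame)
{
	(void)frame;
	dev->lock_acquisitions++;	/* lock taken for every single frame */
	dev->frames_sent++;
	return 0;
}

static int sketch_xmit_bulk(struct sketch_dev *dev, int n,
			    struct sketch_frame **frames)
{
	int i;

	dev->lock_acquisitions++;	/* lock taken once for the whole bulk */
	for (i = 0; i < n; i++) {
		(void)frames[i];
		dev->frames_sent++;
	}
	return n;
}

void sketch_dev_init(struct sketch_dev *dev)
{
	dev->lock_acquisitions = 0;
	dev->indirect_calls = 0;
	dev->frames_sent = 0;
	dev->xmit_one = sketch_xmit_one;
	dev->xmit_bulk = sketch_xmit_bulk;
}

/* Caller side, per-frame style: n indirect calls for n frames. */
void sketch_flush_per_frame(struct sketch_dev *dev,
			    struct sketch_frame **frames, int n)
{
	int i;

	for (i = 0; i < n; i++) {
		dev->indirect_calls++;
		dev->xmit_one(dev, frames[i]);
	}
}

/* Caller side, bulk style: a single indirect call for n frames. */
void sketch_flush_bulk(struct sketch_dev *dev,
		       struct sketch_frame **frames, int n)
{
	dev->indirect_calls++;
	dev->xmit_bulk(dev, n, frames);
}
```

With a bulk of 16 frames (an illustrative size), the per-frame path makes 16 indirect calls and takes the lock 16 times, while the bulk path does each once for the same 16 frames sent. Under CONFIG_RETPOLINE each of those indirect calls is the expensive part, which is why reducing their number matters.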
---

Jesper Dangaard Brouer (8):
      bpf: devmap introduce dev_map_enqueue
      bpf: devmap prepare xdp frames for bulking
      xdp: add tracepoint for devmap like cpumap have
      samples/bpf: xdp_monitor use tracepoint xdp:xdp_devmap_xmit
      xdp: introduce xdp_return_frame_rx_napi
      xdp: change ndo_xdp_xmit API to support bulking
      xdp/trace: extend tracepoint in devmap with an err
      samples/bpf: xdp_monitor use err code from tracepoint xdp:xdp_devmap_xmit

 drivers/net/ethernet/intel/i40e/i40e_txrx.c   |   26 ++++-
 drivers/net/ethernet/intel/i40e/i40e_txrx.h   |    2 
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   21 +++-
 drivers/net/tun.c                             |   37 ++++---
 drivers/net/virtio_net.c                      |   66 +++++++++---
 include/linux/bpf.h                           |   16 ++-
 include/linux/netdevice.h                     |   14 ++-
 include/net/page_pool.h                       |    5 +
 include/net/xdp.h                             |    1 
 include/trace/events/xdp.h                    |   50 +++++++++
 kernel/bpf/cpumap.c                           |    2 
 kernel/bpf/devmap.c                           |  134 ++++++++++++++++++++++++-
 net/core/filter.c                             |   23 +---
 net/core/xdp.c                                |   20 +++-
 samples/bpf/xdp_monitor_kern.c                |   49 +++++++++
 samples/bpf/xdp_monitor_user.c                |   69 +++++++++++++
 16 files changed, 449 insertions(+), 86 deletions(-)

--