Hi! Jiong says:
Currently, the compiler lowers memcpy calls in XDP/eBPF C programs into a sequence of eBPF load/store pairs in some scenarios. The compiler considers this "inline" optimization beneficial because it avoids a function call and increases code locality. However, the Netronome NPU is not a traditional load/store architecture, so executing a sequence of individual load/store operations is not efficient.

This patch set identifies load/store sequences made up of the load/store pairs that come from memcpy lowering, and accelerates them using the NPU's Command Push Pull (CPP) instructions.

It registers a new optimization pass that runs before the actual JIT work. The pass traverses the eBPF IR; once it finds a candidate sequence, it records the memory copy source, destination and length in the first load instruction of the sequence and marks all remaining instructions in the sequence as skippable. Later, when the first load instruction is JITed, optimal instructions are generated from the recorded information.

To keep this transformation safe:
  - a jump into the middle of the sequence cancels the optimization;
  - overlapping memory accesses cancel the optimization;
  - the load destination register still ends up holding the same value it would have held without the transformation.
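The detection pass described above can be sketched roughly as follows. This is a minimal illustration, not the nfp code: the `struct insn` layout, `combine_copy()`, `gather_copy_sequences()`, the "at least two pairs" threshold and the rule that every pair reuses one register are all assumptions made for the example. It records the copy length on the head load, marks the rest of the run as skippable, cancels on a jump into the middle or on overlapping ranges (only detectable here when both sides share a base register), and remembers which load must be replayed so the register keeps its original final value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified per-instruction metadata for illustration;
 * the real nfp JIT structures and eBPF encoding differ. */
enum op_kind { OP_LOAD, OP_STORE, OP_OTHER };

struct insn {
	enum op_kind kind;
	int reg;           /* load destination / store source register */
	int base;          /* base pointer register of the memory access */
	int off;           /* offset from the base register */
	int size;          /* access width in bytes */
	bool jump_dst;     /* some jump in the program targets this insn */
	bool skip;         /* set by the pass: emit nothing for this insn */
	int copy_len;      /* set on the head load: total bytes to copy */
	size_t reload_idx; /* set on the head load: load to replay at the end */
};

/* Are p[i], p[i + 1] a load and a store of the same register and width? */
static bool is_copy_pair(const struct insn *p, size_t i, size_t n)
{
	return i + 1 < n && p[i].kind == OP_LOAD &&
	       p[i + 1].kind == OP_STORE && p[i].reg == p[i + 1].reg &&
	       p[i].size == p[i + 1].size;
}

/* Try to combine the run of contiguous load/store pairs starting at
 * 'head'. Returns the number of instructions combined, 0 if none. */
static size_t combine_copy(struct insn *p, size_t head, size_t n)
{
	size_t i = head;
	int len = 0;

	assert(head < n);
	/* Extend while each pair continues the previous one on both sides. */
	while (is_copy_pair(p, i, n) && p[i].reg == p[head].reg &&
	       p[i].base == p[head].base &&
	       p[i + 1].base == p[head + 1].base &&
	       p[i].off == p[head].off + len &&
	       p[i + 1].off == p[head + 1].off + len) {
		/* A jump into the middle of the sequence cancels it. */
		if ((i > head && p[i].jump_dst) || p[i + 1].jump_dst)
			return 0;
		len += p[i].size;
		i += 2;
	}
	if (i - head < 4) /* require at least two pairs (assumption) */
		return 0;

	/* Overlapping source and destination cancel the optimization. This
	 * sketch can only tell when both sides use the same base register. */
	if (p[head].base == p[head + 1].base &&
	    p[head].off < p[head + 1].off + len &&
	    p[head + 1].off < p[head].off + len)
		return 0;

	/* Record the copy on the head load; a JIT would emit the bulk copy
	 * there and replay the last load so 'reg' ends with the same value. */
	p[head].copy_len = len;
	p[head].reload_idx = i - 2;
	for (size_t k = head + 1; k < i; k++)
		p[k].skip = true;
	return i - head;
}

/* Hypothetical pass driver: scan the program once before JITing. */
static void gather_copy_sequences(struct insn *p, size_t n)
{
	size_t i = 0;

	while (i < n) {
		size_t used = combine_copy(p, i, n);

		i += used ? used : 1;
	}
}
```

With two 4-byte pairs copying from `r2` to `r3`, the head load ends up with `copy_len == 8` and `reload_idx == 2`, and the other three instructions are marked `skip`; setting `jump_dst` on the second load leaves the program unchanged.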
Jakub Kicinski (2):
  nfp: fix old kdoc issues
  nfp: bpf: encode indirect commands

Jiong Wang (11):
  nfp: bpf: support backward jump
  nfp: bpf: record jump destination to simplify jump fixup
  nfp: bpf: flag jump destination to guide insn combine optimizations
  nfp: bpf: don't do ld/mask combination if mask is jump destination
  nfp: bpf: don't do ld/shifts combination if shifts are jump destination
  nfp: bpf: relax source operands check
  nfp: bpf: correct the encoding for No-Dest immed
  nfp: bpf: factor out is_mbpf_load & is_mbpf_store
  nfp: bpf: implement memory bulk copy for length within 32-bytes
  nfp: bpf: implement memory bulk copy for length bigger than 32-bytes
  nfp: bpf: detect load/store sequences lowered from memory copy

 drivers/net/ethernet/netronome/nfp/bpf/jit.c      | 489 ++++++++++++++++++---
 drivers/net/ethernet/netronome/nfp/bpf/main.h     |  35 +-
 drivers/net/ethernet/netronome/nfp/bpf/offload.c  |  23 +-
 drivers/net/ethernet/netronome/nfp/bpf/verifier.c |   8 +-
 drivers/net/ethernet/netronome/nfp/nfp_asm.c      |   7 +-
 drivers/net/ethernet/netronome/nfp/nfp_asm.h      |   7 +-
 drivers/net/ethernet/netronome/nfp/nfp_net.h      |   2 +
 .../ethernet/netronome/nfp/nfpcore/nfp_cppcore.c  |   9 +-
 8 files changed, 505 insertions(+), 75 deletions(-)

--
2.15.0