The current eBPF ISA has 32-bit sub-registers and defines a set of ALU32 instructions.
However, there are no JMP32 instructions, so code-gen for 32-bit sub-registers is not efficient. For example, an explicit sign-extension from 32-bit to 64-bit is needed for signed comparisons.

Adding JMP32 instructions would therefore complete eBPF ISA support for 32-bit sub-registers. This also matches the JMP32 instructions in most JIT backends, for example x86-64 and AArch64, so the new eBPF JMP32 instructions can map one-to-one onto them.

A few ALU32-related verifier bugs have been fixed recently, and the JMP32 support introduced by this set further improves the BPF sub-register ecosystem. Once this lands, BPF programs using the 32-bit sub-register ISA should get reasonably good support from the verifier and JIT compilers. Users can then compare the runtime efficiency of a BPF program under both modes and use whichever benchmarks better.

One good thing is that JMP32 makes 32-bit JITs more efficient: it only uses 32-bit values and defines none, so unlike ALU32 there is no need to clear the high bits. Hence, even without data-flow analysis, JMP32 gives better code-gen than JMP64. More benchmark results are listed below in this cover letter.

- Encoding

Ideally, JMP32 would use a new class BPF_JMP32, just like BPF_ALU and BPF_ALU32. But we only have one unused class number, 0x06, and I am not sure whether we want to keep it for other extension purposes, for example restoring it as BPF_MISC, which could then redefine the interpretation of all the remaining bits in bits[7:1].

So I am following the encoding style used by BPF_PSEUDO_CALL, that is, using reserved bits under BPF_JMP:

  When BPF_SRC(code) == BPF_X, the encoding is 0x1 at insn->imm.
  When BPF_SRC(code) == BPF_K, the encoding is 0x1 at insn->src_reg.

All other bits in imm and src_reg are still reserved and should be zeroed.

- Testing

A couple of unit tests have been added and included in this set.
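
As an illustration of the encoding described in the Encoding section above, here is a minimal sketch in C. It is not code from this set. struct sketch_insn mirrors the field layout of struct bpf_insn, the SK_* constants copy the existing opcode values from the uapi header, and SK_JMP32_FLAG plus all sketch_* helpers are hypothetical names invented for this example. The sketch also assumes that only conditional jumps carry the marker, so BPF_JA, BPF_CALL and BPF_EXIT are excluded (BPF_CALL already uses src_reg == 0x1 for BPF_PSEUDO_CALL). The last helper shows why a signed 64-bit compare needs an explicit sign-extension while a 32-bit compare does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field layout mirrors struct bpf_insn (include/uapi/linux/bpf.h). */
struct sketch_insn {
	uint8_t code;        /* opcode: operation | source | class */
	uint8_t dst_reg : 4; /* destination register */
	uint8_t src_reg : 4; /* source register, unused when source is K */
	int16_t off;         /* jump offset */
	int32_t imm;         /* immediate, unused when source is X */
};

/* Existing opcode values, renamed with an SK_ prefix for this sketch. */
#define SK_CLASS(code) ((code) & 0x07)
#define SK_OP(code)    ((code) & 0xf0)
#define SK_SRC(code)   ((code) & 0x08)
#define SK_JMP   0x05
#define SK_K     0x00
#define SK_X     0x08
#define SK_JA    0x00
#define SK_JSGT  0x60
#define SK_CALL  0x80
#define SK_EXIT  0x90

/* Hypothetical name for the 0x1 marker described in the cover letter. */
#define SK_JMP32_FLAG 0x1

/* Build a register-source JMP32: the marker goes into the unused imm. */
struct sketch_insn sketch_jmp32_reg(uint8_t op, uint8_t dst, uint8_t src,
				    int16_t off)
{
	assert(dst < 16 && src < 16);
	struct sketch_insn insn = {
		.code = (uint8_t)(SK_JMP | op | SK_X),
		.dst_reg = dst, .src_reg = src,
		.off = off, .imm = SK_JMP32_FLAG,
	};
	return insn;
}

/* Build an immediate-source JMP32: the marker goes into the unused src_reg. */
struct sketch_insn sketch_jmp32_imm(uint8_t op, uint8_t dst, int32_t imm,
				    int16_t off)
{
	assert(dst < 16);
	struct sketch_insn insn = {
		.code = (uint8_t)(SK_JMP | op | SK_K),
		.dst_reg = dst, .src_reg = SK_JMP32_FLAG,
		.off = off, .imm = imm,
	};
	return insn;
}

/* Assumption: only conditional jumps can be JMP32, so JA/CALL/EXIT are not. */
bool sketch_is_jmp32(const struct sketch_insn *insn)
{
	uint8_t op = SK_OP(insn->code);

	if (SK_CLASS(insn->code) != SK_JMP)
		return false;
	if (op == SK_JA || op == SK_CALL || op == SK_EXIT)
		return false;
	if (SK_SRC(insn->code) == SK_X)
		return insn->imm == SK_JMP32_FLAG;
	return insn->src_reg == SK_JMP32_FLAG;
}

/*
 * Signed greater-than on register values. The 32-bit form looks only at
 * the low halves; the unsigned-to-signed casts assume two's complement
 * wrap-around, which is what gcc and clang implement.
 */
bool sketch_jsgt_taken(bool is32, uint64_t dst, uint64_t src)
{
	if (is32)
		return (int32_t)(uint32_t)dst > (int32_t)(uint32_t)src;
	return (int64_t)dst > (int64_t)src;
}
```

With the low 32 bits of dst holding -1 and the high bits clear, sketch_jsgt_taken(false, 0xFFFFFFFF, 0) is taken while sketch_jsgt_taken(true, 0xFFFFFFFF, 0) is not, which is the case that forces an explicit sign-extension when only 64-bit jumps are available.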
Also, LLVM code-gen for JMP32 has been added, so you can compile any BPF C program with both -mcpu=probe and -mattr=+alu32 specified. If you are compiling on a machine whose kernel is patched with this set, LLVM will select the ISA automatically based on host probe results. Otherwise, specify -mcpu=v3 and -mattr=+alu32 to force use of the JMP32 ISA and enable sub-register code-gen.

LLVM support can be found at:

  https://github.com/Netronome/llvm/commit/607f088b92ebfb09f026a84a9443a59237cf6628

(I will send out a merge request once the kernel set has reached consensus. Hopefully it can get into LLVM 8.0, which will be branched on 16-Jan-2019.)

I have compiled the BPF selftests with JMP32 enabled. The methodology is that the BPF selftest Makefile introduces a new variable, "BPF_SELFTEST_32BIT", which allows the BPF C programs in the testsuite to be compiled in sub-register mode, in which ALU32 and JMP32 instructions are generated once the kernel installed on the compilation machine supports them.

In my tests, there were no regressions in this sub-register test mode, except when loading bpf_flow.o, where the verifier somehow doesn't reason about the pkt range accurately. test_progs, which contains quite a few BPF C tests, passed cleanly.

Using an env variable to control the test mode seems to bring the smallest change to the Makefile. It requires running "make check" with BPF_SELFTEST_32BIT defined in your test driver script for this new test mode. I would appreciate any better ideas on how to enable an extra test mode for the BPF selftests.

- JIT backends support

JIT backends are supported in this set, except for SPARC and MIPS, for which I need the maintainers' help with the implementation. @David, @Paul, I would appreciate it if you could help with this. The backends implemented in this set also need review and testing from the port maintainers. I have only tested x86_64 and NFP.

- Benchmarking

Below are some benchmark results from Cilium BPF programs.
With JMP32 enabled, we see a consistent code size reduction, and the number of processed instructions is generally reduced as well.

Text size in bytes (generated by "size")
===
LLVM code-gen option   default   alu32    alu32 + jmp32   change (Vs. alu32)
bpf_lb-DLB_L3.o:       6456      6280     6160            -1.91%
bpf_lb-DLB_L4.o:       7848      7664     7136            -6.89%
bpf_lb-DUNKNOWN.o:     2680      2664     2568            -3.60%
bpf_lxc.o:             104824    104744   97360           -7.05%
bpf_netdev.o:          23456     23576    21632           -8.25%
bpf_overlay.o:         16184     16304    14648           -10.16%

Processed insn number
===
LLVM code-gen option   default   alu32    alu32 + jmp32   change
bpf_lb-DLB_L3.o:       1579      1281     1304            +1.79%
bpf_lb-DLB_L4.o:       2045      1663     1554            -6.55%
bpf_lb-DUNKNOWN.o:     606       513      505             -1.56%
bpf_lxc.o:             85381     103218   102666          -0.53%
bpf_netdev.o:          5246      5809     5376            -7.45%
bpf_overlay.o:         2443      2705     2460            -9.05%

JITed insn num (on NFP, other 32-bit arches could be similar)
===
LLVM code-gen option      default   alu32   alu32 + jmp32   change (Vs. alu32)
one ~300 line C program   632       612     597             -2.45%

(NFP code contains some fixed sequences, so the real improvement is higher)

Thanks.

Cc: David S. Miller <da...@davemloft.net>
Cc: Paul Burton <paul.bur...@mips.com>
Cc: Wang YanQing <udkni...@gmail.com>
Cc: Zi Shen Lim <zlim....@gmail.com>
Cc: Shubham Bansal <illusionist....@gmail.com>
Cc: Naveen N. Rao <naveen.n....@linux.ibm.com>
Cc: Sandipan Das <sandi...@linux.ibm.com>
Cc: Martin Schwidefsky <schwidef...@de.ibm.com>
Cc: Heiko Carstens <heiko.carst...@de.ibm.com>

Jiong Wang (13):
  bpf: encoding description and macros for JMP32
  bpf: interpreter support for JMP32
  bpf: JIT blinds support JMP32
  x86_64: bpf: implement jitting of JMP32
  x32: bpf: implement jitting of JMP32
  arm64: bpf: implement jitting of JMP32
  arm: bpf: implement jitting of JMP32
  ppc: bpf: implement jitting of JMP32
  s390: bpf: implement jitting of JMP32
  nfp: bpf: implement jitting of JMP32
  bpf: verifier support JMP32
  bpf: unit tests for JMP32
  selftests: bpf: makefile support sub-register code-gen test mode

 Documentation/networking/filter.txt          |  10 +
 arch/arm/net/bpf_jit_32.c                    |  23 +-
 arch/arm64/net/bpf_jit_comp.c                |  10 +-
 arch/powerpc/net/bpf_jit_comp64.c            |  50 ++++-
 arch/s390/net/bpf_jit_comp.c                 |  12 +-
 arch/x86/net/bpf_jit_comp.c                  |  13 +-
 arch/x86/net/bpf_jit_comp32.c                |  46 ++--
 drivers/net/ethernet/netronome/nfp/bpf/jit.c |  69 ++++--
 include/linux/filter.h                       |  19 ++
 include/uapi/linux/bpf.h                     |   4 +
 kernel/bpf/core.c                            |  60 +++--
 kernel/bpf/verifier.c                        | 178 +++++++++++----
 lib/test_bpf.c                               | 321 ++++++++++++++++++++++++++-
 tools/include/uapi/linux/bpf.h               |   4 +
 tools/testing/selftests/bpf/Makefile         |   4 +
 15 files changed, 696 insertions(+), 127 deletions(-)

-- 
2.7.4