On Fri, Feb 20, 2026 at 7:54 AM Leon Hwang <[email protected]> wrote:
>
>
>
> On 2026/2/20 01:47, Alexei Starovoitov wrote:
> > On Thu, Feb 19, 2026 at 6:30 AM Leon Hwang <[email protected]> wrote:
> >>
> >> Implement JIT inlining of the 64-bit bitops kfuncs on x86_64.
> >>
> >> bpf_rol64() and bpf_ror64() are always supported via ROL/ROR.
> >>
> >> bpf_ctz64() and bpf_ffs64() are supported when the CPU has
> >> X86_FEATURE_BMI1 (TZCNT).
> >>
> >> bpf_clz64() and bpf_fls64() are supported when the CPU has
> >> X86_FEATURE_ABM (LZCNT).
> >>
> >> bpf_popcnt64() is supported when the CPU has X86_FEATURE_POPCNT.
> >>
> >> bpf_bitrev64() is not inlined as x86_64 has no native bit-reverse
> >> instruction, so it falls back to a regular function call.
> >>
> >> Signed-off-by: Leon Hwang <[email protected]>
> >> ---
> >>  arch/x86/net/bpf_jit_comp.c | 141 ++++++++++++++++++++++++++++++++++++
> >>  1 file changed, 141 insertions(+)
> >>
> >> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> >> index 070ba80e39d7..193e1e2d7aa8 100644
> >> --- a/arch/x86/net/bpf_jit_comp.c
> >> +++ b/arch/x86/net/bpf_jit_comp.c
> >> @@ -19,6 +19,7 @@
> >>  #include <asm/text-patching.h>
> >>  #include <asm/unwind.h>
> >>  #include <asm/cfi.h>
> >> +#include <asm/cpufeatures.h>
> >>
> >>  static bool all_callee_regs_used[4] = {true, true, true, true};
> >>
> >> @@ -1604,6 +1605,127 @@ static void emit_priv_frame_ptr(u8 **pprog, void __percpu *priv_frame_ptr)
> >>         *pprog = prog;
> >>  }
> >>
> >> +static bool bpf_inlines_func_call(u8 **pprog, void *func)
> >> +{
> >> +       bool has_popcnt = boot_cpu_has(X86_FEATURE_POPCNT);
> >> +       bool has_bmi1 = boot_cpu_has(X86_FEATURE_BMI1);
> >> +       bool has_abm = boot_cpu_has(X86_FEATURE_ABM);
> >> +       bool inlined = true;
> >> +       u8 *prog = *pprog;
> >> +
> >> +       /*
> >> +        * x86 Bit manipulation instruction set
> >> +        * https://en.wikipedia.org/wiki/X86_Bit_manipulation_instruction_set
> >> +        */
> >> +
> >> +       if (func == bpf_clz64 && has_abm) {
> >> +               /*
> >> +                * Intel® 64 and IA-32 Architectures Software Developer's Manual (June 2023)
> >> +                *
> >> +                * LZCNT - Count the Number of Leading Zero Bits
> >> +                *
> >> +                * Opcode/Instruction
> >> +                * F3 REX.W 0F BD /r
> >> +                * LZCNT r64, r/m64
> >> +                *
> >> +                * Op/En
> >> +                * RVM
> >> +                *
> >> +                * 64/32-bit Mode
> >> +                * V/N.E.
> >> +                *
> >> +                * CPUID Feature Flag
> >> +                * LZCNT
> >> +                *
> >> +                * Description
> >> +                * Count the number of leading zero bits in r/m64, return
> >> +                * result in r64.
> >> +                */
> >> +               /* emit: x ? 64 - fls64(x) : 64 */
> >> +               /* lzcnt rax, rdi */
> >> +               EMIT5(0xF3, 0x48, 0x0F, 0xBD, 0xC7);
> >
> > Instead of emitting binary in x86 and arm JITs,
> > let's use in kernel disasm to check that all these kfuncs
> > conform to kf_fastcall (don't use unnecessary registers,
> > don't have calls to other functions) and then copy the binary
> > from code and skip the last 'ret' insn.
> > This way we can inline all kinds of kfuncs.
> >
>
> Good idea.
>
> Quick question on “in-kernel disasm”: do you mean adding a kernel
> instruction decoder/disassembler to validate a whitelist of kfuncs at
> load time?
>
> I’m trying to understand the intended scope:
>
> * Is the expectation that we add an in-kernel disassembler/validator for
>   a small set of supported instructions and patterns (no calls/jumps,
>   only arg/ret regs touched, etc.)?
> * Or is there already infrastructure you had in mind that we can reuse?
>
> Once I understand that piece, I can rework the series to inline by
> copying validated machine code (minus the final ret), rather than
> emitting raw opcodes in the JITs.
>
> I also noticed you mentioned a similar direction in "bpf/s390: Implement
> get_preempt_count()" [1], so I’ve added Ilya to the thread to discuss
> this approach further.
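For reference, the semantics the patch inlines can be sketched in portable C. The `*_sw` names below are illustrative stand-ins, not the kernel's helpers; the sketch only pins down what the emitted LZCNT/TZCNT/POPCNT/ROL sequences must compute, in particular the x == 0 cases:

```c
#include <stdint.h>

/* 1-based index of the most significant set bit; 0 when x == 0 */
static int fls64_sw(uint64_t x)
{
	int n = 0;

	while (x) {
		n++;
		x >>= 1;
	}
	return n;
}

/* LZCNT semantics: leading zeros, 64 when x == 0 */
static uint64_t clz64_sw(uint64_t x)
{
	return x ? 64 - fls64_sw(x) : 64;	/* "x ? 64 - fls64(x) : 64" */
}

/* TZCNT semantics: trailing zeros, 64 when x == 0 */
static uint64_t ctz64_sw(uint64_t x)
{
	uint64_t n = 0;

	if (!x)
		return 64;
	while (!(x & 1)) {
		n++;
		x >>= 1;
	}
	return n;
}

/* POPCNT semantics: count of set bits */
static uint64_t popcnt64_sw(uint64_t x)
{
	uint64_t n = 0;

	while (x) {
		n += x & 1;
		x >>= 1;
	}
	return n;
}

/* ROL semantics: rotate left, shift amount taken modulo 64 */
static uint64_t rol64_sw(uint64_t x, unsigned int s)
{
	s &= 63;
	return s ? (x << s) | (x >> (64 - s)) : x;
}
```

The defined zero-input results (clz64_sw(0) == 64, ctz64_sw(0) == 64) are exactly why LZCNT/TZCNT are used rather than the older BSR/BSF, whose destination is undefined for a zero source.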
You really sound like an LLM. Do your homework as a human.

