On Sat, Feb 21, 2026 at 4:45 AM Leon Hwang <[email protected]> wrote:
>
>
>
> On 2026/2/21 01:50, Alexei Starovoitov wrote:
> > On Fri, Feb 20, 2026 at 7:54 AM Leon Hwang <[email protected]> wrote:
> >>
> >>
> >>
> >> On 2026/2/20 01:47, Alexei Starovoitov wrote:
> >>> On Thu, Feb 19, 2026 at 6:30 AM Leon Hwang <[email protected]> wrote:
> >>>>
> >>>> Implement JIT inlining of the 64-bit bitops kfuncs on x86_64.
> >>>>
> >>>> bpf_rol64() and bpf_ror64() are always supported via ROL/ROR.
> >>>>
> >>>> bpf_ctz64() and bpf_ffs64() are supported when the CPU has
> >>>> X86_FEATURE_BMI1 (TZCNT).
> >>>>
> >>>> bpf_clz64() and bpf_fls64() are supported when the CPU has
> >>>> X86_FEATURE_ABM (LZCNT).
> >>>>
> >>>> bpf_popcnt64() is supported when the CPU has X86_FEATURE_POPCNT.
> >>>>
> >>>> bpf_bitrev64() is not inlined as x86_64 has no native bit-reverse
> >>>> instruction, so it falls back to a regular function call.
> >>>>
> >>>> Signed-off-by: Leon Hwang <[email protected]>
> >>>> ---
> >>>>  arch/x86/net/bpf_jit_comp.c | 141 ++++++++++++++++++++++++++++++++++++
> >>>>  1 file changed, 141 insertions(+)
> >>>>
> >>>> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> >>>> index 070ba80e39d7..193e1e2d7aa8 100644
> >>>> --- a/arch/x86/net/bpf_jit_comp.c
> >>>> +++ b/arch/x86/net/bpf_jit_comp.c
> >>>> @@ -19,6 +19,7 @@
> >>>>  #include <asm/text-patching.h>
> >>>>  #include <asm/unwind.h>
> >>>>  #include <asm/cfi.h>
> >>>> +#include <asm/cpufeatures.h>
> >>>>
> >>>>  static bool all_callee_regs_used[4] = {true, true, true, true};
> >>>>
> >>>> @@ -1604,6 +1605,127 @@ static void emit_priv_frame_ptr(u8 **pprog, void __percpu *priv_frame_ptr)
> >>>>         *pprog = prog;
> >>>>  }
> >>>>
> >>>> +static bool bpf_inlines_func_call(u8 **pprog, void *func)
> >>>> +{
> >>>> +       bool has_popcnt = boot_cpu_has(X86_FEATURE_POPCNT);
> >>>> +       bool has_bmi1 = boot_cpu_has(X86_FEATURE_BMI1);
> >>>> +       bool has_abm = boot_cpu_has(X86_FEATURE_ABM);
> >>>> +       bool inlined = true;
> >>>> +       u8 *prog = *pprog;
> >>>> +
> >>>> +       /*
> >>>> +        * x86 Bit manipulation instruction set
> >>>> +        * https://en.wikipedia.org/wiki/X86_Bit_manipulation_instruction_set
> >>>> +        */
> >>>> +
> >>>> +       if (func == bpf_clz64 && has_abm) {
> >>>> +               /*
> >>>> +                * Intel® 64 and IA-32 Architectures Software Developer's Manual (June 2023)
> >>>> +                *
> >>>> +                *   LZCNT - Count the Number of Leading Zero Bits
> >>>> +                *
> >>>> +                *     Opcode/Instruction
> >>>> +                *     F3 REX.W 0F BD /r
> >>>> +                *     LZCNT r64, r/m64
> >>>> +                *
> >>>> +                *     Op/En
> >>>> +                *     RVM
> >>>> +                *
> >>>> +                *     64/32-bit Mode
> >>>> +                *     V/N.E.
> >>>> +                *
> >>>> +                *     CPUID Feature Flag
> >>>> +                *     LZCNT
> >>>> +                *
> >>>> +                *     Description
> >>>> +                *     Count the number of leading zero bits in r/m64, return
> >>>> +                *     result in r64.
> >>>> +                */
> >>>> +               /* emit: x ? 64 - fls64(x) : 64 */
> >>>> +               /* lzcnt rax, rdi */
> >>>> +               EMIT5(0xF3, 0x48, 0x0F, 0xBD, 0xC7);
> >>>
> >>> Instead of emitting binary in x86 and arm JITs,
> >>> let's use in kernel disasm to check that all these kfuncs
> >>> conform to kf_fastcall (don't use unnecessary registers,
> >>> don't have calls to other functions) and then copy the binary
> >>> from code and skip the last 'ret' insn.
> >>> This way we can inline all kinds of kfuncs.
> >>>
> >>
> >> Good idea.
> >>
> >> Quick question on “in-kernel disasm”: do you mean adding a kernel
> >> instruction decoder/disassembler to validate a whitelist of kfuncs at
> >> load time?
> >>
> >> I’m trying to understand the intended scope:
> >>
> >> * Is the expectation that we add an in-kernel disassembler/validator for
> >>   a small set of supported instructions and patterns (no calls/jumps,
> >>   only arg/ret regs touched, etc.)?
> >> * Or is there already infrastructure you had in mind that we can reuse?
> >>
> >> Once I understand that piece, I can rework the series to inline by
> >> copying validated machine code (minus the final ret), rather than
> >> emitting raw opcodes in the JITs.
> >>
> >> I also noticed you mentioned a similar direction in "bpf/s390: Implement
> >> get_preempt_count()" [1], so I’ve added Ilya to the thread to discuss
> >> this approach further.
> >
> > You really sound like LLM. Do your homework as a human.
>
> Got it.
>
> I polished my draft using ChatGPT, which would leave LLM smell in my reply.

... and for anyone reading it the smell is ohh too strong.

> Here's my original draft:
>
> Good idea. But I concern about the "in kernel disasm". Do you mean we
> will build a disassembler for whitelist kfuncs at starting?
>
> I noticed you've mentioned the same direction in "bpf/s390: Implement
> get_preempt_count()" [1]. So, I added Ilya here to discuss this direction.

Much better. Keep it human.

"in kernel disasm" already exists for some architectures
(at least x86 and arm64) since it's being used by kprobes.
The ask here is to figure out whether they're usable for such
insn analysis. x86 disasm is likely capable.

re:"whitelist kfunc"
I suspect an additional list is not necessary.
kf_fastcall is a good enough signal that such kfunc should
be inlinable.
