On Thu, Feb 19, 2026 at 6:30 AM Leon Hwang <[email protected]> wrote:
>
> Implement JIT inlining of the 64-bit bitops kfuncs on x86_64.
>
> bpf_rol64() and bpf_ror64() are always supported via ROL/ROR.
>
> bpf_ctz64() and bpf_ffs64() are supported when the CPU has
> X86_FEATURE_BMI1 (TZCNT).
>
> bpf_clz64() and bpf_fls64() are supported when the CPU has
> X86_FEATURE_ABM (LZCNT).
>
> bpf_popcnt64() is supported when the CPU has X86_FEATURE_POPCNT.
>
> bpf_bitrev64() is not inlined as x86_64 has no native bit-reverse
> instruction, so it falls back to a regular function call.
>
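For readers mapping the kfuncs to the instructions above, here is my
reading of why one instruction covers two kfuncs in each case, as a C
sketch. This is not taken from the patch; it assumes the usual
conventions where fls64(0) == ffs64(0) == 0 and where LZCNT/TZCNT
return 64 for a zero input:

#include <linux/types.h>

/* Illustrative only -- assumed semantics, not the kfuncs' actual bodies. */
static u64 sketch_clz64(u64 x)    { return x ? __builtin_clzll(x) : 64; }     /* LZCNT      */
static u64 sketch_fls64(u64 x)    { return x ? 64 - __builtin_clzll(x) : 0; } /* 64 - LZCNT */
static u64 sketch_ctz64(u64 x)    { return x ? __builtin_ctzll(x) : 64; }     /* TZCNT      */
static u64 sketch_ffs64(u64 x)    { return x ? __builtin_ctzll(x) + 1 : 0; }  /* TZCNT + 1  */
static u64 sketch_popcnt64(u64 x) { return __builtin_popcountll(x); }         /* POPCNT     */
static u64 sketch_rol64(u64 x, u32 n)
{
	n &= 63;                                  /* ROL takes the count mod 64 */
	return (x << n) | (x >> ((64 - n) & 63)); /* avoids UB on a shift by 64 */
}
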
> Signed-off-by: Leon Hwang <[email protected]>
> ---
>  arch/x86/net/bpf_jit_comp.c | 141 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 141 insertions(+)
>
> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> index 070ba80e39d7..193e1e2d7aa8 100644
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -19,6 +19,7 @@
>  #include <asm/text-patching.h>
>  #include <asm/unwind.h>
>  #include <asm/cfi.h>
> +#include <asm/cpufeatures.h>
>
>  static bool all_callee_regs_used[4] = {true, true, true, true};
>
> @@ -1604,6 +1605,127 @@ static void emit_priv_frame_ptr(u8 **pprog, void __percpu *priv_frame_ptr)
>         *pprog = prog;
>  }
>
> +static bool bpf_inlines_func_call(u8 **pprog, void *func)
> +{
> +       bool has_popcnt = boot_cpu_has(X86_FEATURE_POPCNT);
> +       bool has_bmi1 = boot_cpu_has(X86_FEATURE_BMI1);
> +       bool has_abm = boot_cpu_has(X86_FEATURE_ABM);
> +       bool inlined = true;
> +       u8 *prog = *pprog;
> +
> +       /*
> +        * x86 Bit manipulation instruction set
> +        * https://en.wikipedia.org/wiki/X86_Bit_manipulation_instruction_set
> +        */
> +
> +       if (func == bpf_clz64 && has_abm) {
> +               /*
> +                * Intel® 64 and IA-32 Architectures Software Developer's Manual (June 2023)
> +                *
> +                *   LZCNT - Count the Number of Leading Zero Bits
> +                *
> +                *     Opcode/Instruction
> +                *     F3 REX.W 0F BD /r
> +                *     LZCNT r64, r/m64
> +                *
> +                *     Op/En
> +                *     RVM
> +                *
> +                *     64/32-bit Mode
> +                *     V/N.E.
> +                *
> +                *     CPUID Feature Flag
> +                *     LZCNT
> +                *
> +                *     Description
> +                *     Count the number of leading zero bits in r/m64, return
> +                *     result in r64.
> +                */
> +               /* emit: x ? 64 - fls64(x) : 64 */
> +               /* lzcnt rax, rdi */
> +               EMIT5(0xF3, 0x48, 0x0F, 0xBD, 0xC7);

Instead of emitting binary in the x86 and arm JITs,
let's use the in-kernel disassembler to check that all of these kfuncs
conform to kf_fastcall (don't use unnecessary registers,
don't call other functions), and then copy the binary
from the compiled code, skipping the last 'ret' insn.
This way we can inline all kinds of kfuncs.
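
Roughly something along these lines, as an untested sketch. The helper
name and the 128-byte bound are made up here, and it assumes the kfunc
body is contiguous, position-independent, and ends in a single near RET:

#include <linux/types.h>
#include <linux/errno.h>
#include <linux/string.h>
#include <asm/insn.h>

/*
 * Untested sketch: walk the kfunc's native code with the in-kernel x86
 * instruction decoder, refuse anything that calls out, and copy every
 * instruction up to (but not including) the final near RET into the
 * JIT image.
 */
static int bpf_jit_try_copy_kfunc_body(u8 **pprog, void *func)
{
	u8 *src = func, *prog = *pprog;
	struct insn insn;
	int i;

	for (i = 0; i < 128; i += insn.length) {
		if (insn_decode(&insn, src + i, MAX_INSN_SIZE, INSN_MODE_64) < 0)
			return -EINVAL;

		/* Final near RET (0xC3): body is done, don't copy it. */
		if (insn.opcode.bytes[0] == 0xc3) {
			*pprog = prog;
			return 0;
		}

		/*
		 * Reject direct near CALLs; a real implementation would also
		 * have to reject indirect calls/jumps, RIP-relative accesses,
		 * etc. -- that is what the kf_fastcall-style check is for.
		 */
		if (insn.opcode.bytes[0] == 0xe8)
			return -EINVAL;

		memcpy(prog, src + i, insn.length);
		prog += insn.length;
	}

	return -E2BIG;
}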

pw-bot: cr
