Hi Abhishek,

I have tested this patch series on PowerPC64 (ppc64le) and can confirm all 8 patches work correctly.

## Test Environment
- System: PowerPC64 (ppc64le)
- Kernel: 7.1.0-rc3 with all 8 patches applied
- BTF Support: Enabled (CONFIG_DEBUG_INFO_BTF=y)
- Test Suite: tools/testing/selftests/bpf/test_progs

## Test Results

### WITHOUT Patches (Baseline):
- Kernel: 7.1.0-rc3-00362-g6916d5703ddf
- Summary: 111/2135 PASSED, 35 SKIPPED, 5 FAILED
- verifier_tailcall_jit: **FAIL**
- Error: "Can't disasm instruction at offset 96: 60 80 1c 00 00 00 00 c0"

### WITH All 8 Patches:
- Kernel: 7.1.0-rc3-00370-g202ecf282b6e
- Summary: 112/2136 PASSED, 35 SKIPPED, 4 FAILED
- verifier_tailcall_jit: **OK**

The critical PowerPC64-specific test `verifier_tailcall_jit` now passes:

#635/1   verifier_tailcall_jit/main:OK
#635     verifier_tailcall_jit:OK


This test was previously failing due to JIT disassembly issues caused by the dummy_tramp_addr placement. The patches successfully:

1. Fix alignment of long branch trampoline address
2. Relocate dummy_tramp_addr to bottom of stub
3. Fix JIT disassembly truncation
4. Enable PowerPC64 arch support in verifier tests
5. Fix compare instruction (cmpldi vs cmplwi) for tailcalls
6. Add PowerPC64 tailcall verifier test
7. Fix JIT buffer overflow for large programs
8. Fix percpu private stack leak on JIT failure

The 4 remaining failures are all in verifier_arena* tests with -ENOMEM errors, which are unrelated to these patches and appear to be platform-specific memory allocation issues with the arena feature.

Please add the tag below:
Tested-by: Yeswanth Krishna Tellakula <[email protected]>

Regards,
Yeswanth

On 24/06/26 4:44 am, [email protected] wrote:
From: Abhishek Dubey <[email protected]>

The verifier selftest validates JITed instructions by matching expected
disassembly output. The first two patches fix issues in powerpc instruction
disassembly that were causing test flow failures. The fix is common for
64-bit & 32-bit powerpc. Add support for the powerpc-specific "__powerpc64"
architecture tag in the third patch, enabling proper test filtering in
verifier test files. Introduce verifier testcases for tailcalls on powerpc64.

The first patch in the series is a fix that aligns the long branch
address field to an 8-byte boundary. The subsequent patches enable
verifier selftests on powerpc. The fifth patch in the series fixes
incorrect comparator usage when comparing tailcall info with the tailcall
threshold. The last two patches fix a JIT buffer overflow for large BPF
progs and a private stack memory leak (identified by bot during reviews).

Issue Details:
--------------

     The long branch stub in the trampoline implementation[1] provides
     the flexibility to handle short as well as long branch distances
     to the actual trampoline. However, the 8-byte dummy_tramp_addr
     field sitting before the long branch stub leads to failures when
     enabling verifier-based selftests for ppc64.

     The verifier selftests require disassembling the final jited image
     to get native instructions. The disassembled instruction sequence
     is then matched against the sequence of instructions provided in
     the test file under the __jited() wrapper. The final jited image
     contains the out-of-line stub and the long branch stub as part of
     epilogue jitting for a bpf program. The 8 bytes of space for
     dummy_tramp_addr are sandwiched between these two stubs. These 8
     bytes hold the memory address of the dummy trampoline during
     trampoline invocation and do not correspond to any powerpc
     instruction. So disassembly fails, resulting in failure of the
     verifier selftests.

     The following code snippet shows the problem with the current
     arrangement made for dummy_tramp_addr.

     /* Out-of-line stub */
     mflr    r0
     [b|bl]  tramp
     mtlr    r0 //only with OOL
     b       bpf_func + 4
     /* Long branch stub */
     .long   <dummy_tramp_addr>  <---Invalid bytes sequence, disassembly fails
     mflr    r11
     bcl     20,31,$+4
     mflr    r12
     ld      r12, -8-SZL(r12)
     mtctr   r12
     mtlr    r11 //retain ftrace ABI
     bctr

     Consider test program binary of size 112 bytes:
     0:  00000060 10004de8 00002039 f8ff21f9 81ff21f8 7000e1fb 3000e13b
     28: 3000e13b 2a006038 f8ff7ff8 00000039 7000e1eb 80002138 7843037d
     56: 2000804e a602087c 00000060 a603087c bcffff4b c0341d00 000000c0
     84: a602687d 05009f42 a602887d f0ff8ce9 a603897d a603687d 2004804e

     Disassembly output of above binary for ppc64le:
     pc:0     left:112    00 00 00 60  :  nop
     pc:4     left:108    10 00 4d e8  :  ld 2, 16(13)
     pc:8     left:104    00 00 20 39  :  li 9, 0
     pc:12    left:100    f8 ff 21 f9  :  std 9, -8(1)
     pc:16    left:96     81 ff 21 f8  :  stdu 1, -128(1)
     pc:20    left:92     70 00 e1 fb  :  std 31, 112(1)
     pc:24    left:88     30 00 e1 3b  :  addi 31, 1, 48
     pc:28    left:84     30 00 e1 3b  :  addi 31, 1, 48
     pc:32    left:80     2a 00 60 38  :  li 3, 42
     pc:36    left:76     f8 ff 7f f8  :  std 3, -8(31)
     pc:40    left:72     00 00 00 39  :  li 8, 0
     pc:44    left:68     70 00 e1 eb  :  ld 31, 112(1)
     pc:48    left:64     80 00 21 38  :  addi 1, 1, 128
     pc:52    left:60     78 43 03 7d  :  mr    3, 8
     pc:56    left:56     20 00 80 4e  :  blr
     pc:60    left:52     a6 02 08 7c  :  mflr 0
     pc:64    left:48     00 00 00 60  :  nop
     pc:68    left:44     a6 03 08 7c  :  mtlr 0
     pc:72    left:40     bc ff ff 4b  :  b .-68
     pc:76    left:36     c0 34 1d 00  :
     ...

     Failure log:
     Can't disasm instruction at offset 76: c0 34 1d 00 00 00 00 c0 a6 02 68 7d 05 00 9f 42
     --------------------------------------

     Observation:
     The instruction at offset 76 cannot be disassembled because this
     location holds ".long <dummy_tramp_addr>" (raw bytes
     c0 34 1d 00 00 00 00 c0), not an instruction.
     Valid instructions follow from offset 84 onwards.
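
To make the observation concrete, here is a minimal Python sketch (Python is used only for illustration; it is not part of the series) that reads the 8 bytes from the failure log as a little-endian 64-bit value, as a ppc64le load would. The names `raw`, `addr` and `first_word` are hypothetical. The comment about primary opcode 0 reflects my understanding of the Power ISA and should be treated as an assumption.

```python
# Raw bytes at offset 76, copied from the failure log above.
raw = bytes.fromhex("c0341d00000000c0")

# ppc64le stores multi-byte values least-significant byte first,
# so the 8 bytes form this 64-bit address.
addr = int.from_bytes(raw, byteorder="little")
print(hex(addr))  # 0xc0000000001d34c0

# The disassembler instead sees the first 4 bytes as an instruction word.
first_word = int.from_bytes(raw[:4], byteorder="little")
primary_opcode = first_word >> 26  # top 6 bits select the instruction family

# Assumption: primary opcode 0 does not correspond to an ordinary
# powerpc instruction, which would explain why decoding stops here.
print(hex(first_word), primary_opcode)
```

Read this way, the field looks like a plausible kernel address rather than code, which matches the description of dummy_tramp_addr above.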

     Move the long branch address space to the bottom of the long
     branch stub. This allows uninterrupted disassembly until the
     last 8 bytes. Exclude these last bytes from the overall
     program length to prevent failure in assembly generation.

     The following is the disassembler output for the same test program
     with the dummy_tramp_addr field moved down:
     .....
     .....
     pc:68    left:44     a6 03 08 7c  :  mtlr 0
     pc:72    left:40     bc ff ff 4b  :  b .-68
     pc:76    left:36     a6 02 68 7d  :  mflr 11
     pc:80    left:32     05 00 9f 42  :  bcl 20, 31, .+4
     pc:84    left:28     a6 02 88 7d  :  mflr 12
     pc:88    left:24     14 00 8c e9  :  ld 12, 20(12)
     pc:92    left:20     a6 03 89 7d  :  mtctr 12
     pc:96    left:16     a6 03 68 7d  :  mtlr 11
     pc:100   left:12     20 04 80 4e  :  bctr
     pc:104   left:8      c0 34 1d 00  :

     Failure log:
     Can't disasm instruction at offset 104: c0 34 1d 00 00 00 00 c0
     ---------------------------------------
     Disassembly logic can truncate at 104, ignoring last 8 bytes.
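
The truncation idea can be sketched as follows. This is a Python illustration only, not the C code in jit_disasm_helpers.c; `split_image` and `TRAMP_ADDR_SIZE` are hypothetical names, and the 8-byte tail is assumed from the ppc64 example above (the changelog notes that the length is adjusted for 32-bit ppc). The sample image is just the last two instructions of the stub plus the address field, using bytes taken from the listing above.

```python
TRAMP_ADDR_SIZE = 8  # assumed size of the trailing dummy_tramp_addr field on ppc64


def split_image(image: bytes):
    """Split a jited image into 4-byte instruction words and the trailing address field."""
    code = image[:-TRAMP_ADDR_SIZE]
    tail = image[-TRAMP_ADDR_SIZE:]
    # Each powerpc instruction is 4 bytes; render them like the dump above.
    words = [code[i:i + 4].hex(" ") for i in range(0, len(code), 4)]
    return words, tail


# mtlr 11, bctr, then the 8-byte address field (bytes from the listing above).
image = bytes.fromhex("a603687d" "2004804e" "c0341d00000000c0")
words, tail = split_image(image)
print(words)       # only these words would be handed to the disassembler
print(tail.hex())  # data, never decoded as instructions
```

For the 112-byte test program the same split gives 104 bytes of code, which is where the disassembly output above stops.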

     Update the dummy_tramp_addr field offset calculation from the end
     of the program to reflect its new location, for bpf_arch_text_poke()
     to update the actual trampoline's address in this field.
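
The effect of the move on the stub's own load can be checked from the dumps above. In both layouts `bcl 20,31,$+4` leaves the address of the following instruction in LR, and `ld r12, disp(r12)` then reads the address field relative to it. The sketch below (Python, illustrative only; `ld_target` is a hypothetical helper) decodes the displacement from the two `ld` encodings shown in this mail and confirms where each one points. The LR offsets 92 and 84 are derived from the listings, assuming the program starts at offset 0.

```python
import struct


def ld_target(ld_bytes: bytes, lr: int) -> int:
    """Return the offset read by a DS-form `ld RT, disp(RA)` when RA holds `lr`."""
    (word,) = struct.unpack("<I", ld_bytes)  # ppc64le instruction word
    if word >> 26 != 58 or word & 0x3 != 0:
        raise ValueError("not a plain ld instruction")
    disp = word & 0xFFFC  # DS field with the two low bits cleared
    if disp & 0x8000:     # sign-extend the 16-bit displacement
        disp -= 0x10000
    return lr + disp


# Old layout: bcl at offset 88, so LR = 92; ld encoding f0 ff 8c e9.
old_target = ld_target(bytes.fromhex("f0ff8ce9"), lr=92)
# New layout: bcl at offset 80, so LR = 84; ld encoding 14 00 8c e9.
new_target = ld_target(bytes.fromhex("14008ce9"), lr=84)

print(old_target, new_target)  # 76 104
```

Offset 76 is where disassembly failed before the change, and 104 is the start of the last 8 bytes after it, so the displacement follows the field to its new location.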

     [1] https://lore.kernel.org/all/[email protected]

v8->v9:
   Dynamic pass handling until code keeps shrinking
   Fix private stack memory leak

v7->v8:
   Fixed bot identified issues of alt_exit_addr and BPF_EXIT
   Fixed 32-bit ppc function signature mismatch

v6->v7:
   Fixed JIT buffer overflow in case of large BPF progs
   Addressed remaining bot comments

v5->v6:
   Changed alignment NOP emission dependency on fimage layout
   Adjust tail truncate length for 32-bit ppc
   Addressed few minor bot comments

v4->v5:
   Handled alignment NOP emit logic and corresponding stub offsets
   Handled image buffer overflow problem in last pass
   Above changes took care of other bot reviews
   Included LLVMDisposeMessage() for graceful freeing
   Adjusted parameters in bpf_jit_build_fentry_stubs for ppc32
   Adjusted expected JIT inst. in tailcall test for
   CONFIG_PPC_KERNEL_PCREL config
   Added fix patch at last for inaccurate use of cmplwi inst.

v3->v4:
   Changed logic for emitting alignment NOP

v2->v3:
   Removed fixed NOP from bottom of long branch stub
   Rebased on top of bpf-next

v1->v2:
   Added fix-patch to correct memory alignment in-place
   Moved the optional alignment NOP before OOL stub

[v1]: https://lore.kernel.org/bpf/[email protected]
[v2]: https://lore.kernel.org/bpf/[email protected]
[v3]: https://lore.kernel.org/bpf/[email protected]
[v4]: https://lore.kernel.org/bpf/[email protected]
[v5]: https://lore.kernel.org/bpf/[email protected]
[v6]: https://lore.kernel.org/bpf/[email protected]
[v7]: https://lore.kernel.org/bpf/[email protected]
[v8]: https://lore.kernel.org/bpf/[email protected]

Abhishek Dubey (8):
   powerpc/bpf: fix alignment of long branch trampoline address
   powerpc/bpf: Move out dummy_tramp_addr after Long branch stub
   selftest/bpf: Fixing powerpc JIT disassembly failure
   selftest/bpf: Enable verifier selftest for powerpc64
   powerpc64/bpf: fix compare instruction emitted for tailcall
   selftest/bpf: Add tailcall verifier selftest for powerpc64
   powerpc/bpf: fix buffer overflow in JIT for large BPF programs
   powerpc64/bpf: fix percpu private stack leak on JIT failure

  arch/powerpc/net/bpf_jit.h                    | 20 +++-
  arch/powerpc/net/bpf_jit_comp.c               | 99 ++++++++++++++-----
  arch/powerpc/net/bpf_jit_comp32.c             |  7 +-
  arch/powerpc/net/bpf_jit_comp64.c             | 15 +--
  .../selftests/bpf/jit_disasm_helpers.c        | 27 ++++-
  tools/testing/selftests/bpf/progs/bpf_misc.h  |  1 +
  .../bpf/progs/verifier_tailcall_jit.c         | 69 +++++++++++++
  tools/testing/selftests/bpf/test_loader.c     |  5 +
  8 files changed, 203 insertions(+), 40 deletions(-)


