Re: [RFC 0/2] arm64: kprobes: Fix single-step fault and reentry handling

Pu Hu Thu, 02 Jul 2026 03:29:48 -0700

On 7/1/2026 9:56 PM, Pu Hu wrote:
> On 7/1/2026 9:43 PM, Masami Hiramatsu wrote:
>> On Wed, 1 Jul 2026 12:14:54 +0000
>> Pu Hu <[email protected]> wrote:
>>
>>> From: hupu <[email protected]>
>>>
>>> This series fixes two arm64 kprobes issues observed when running
>>> simpleperf with preemptirq tracepoints and dwarf callchains while a
>>> kprobe is active on a frequently executed kernel function.
>>>
>>> The crash happens in the kprobe debug exception path. While a kprobe is
>>> preparing or executing its XOL single-step instruction, perf/trace code
>>> can run in the same window. That code may either take a fault of its own
>>> or hit another kprobe.
>>>
>>> Patch 1 makes kprobe_fault_handler() handle a fault in
>>> KPROBE_HIT_SS/KPROBE_REENTER only when the faulting PC points at the
>>> current kprobe's XOL instruction. Otherwise the fault is left to the
>>> normal fault handling path.
>>>
>>> Patch 2 allows a kprobe hit in KPROBE_HIT_SS to be handled as a
>>> recoverable one-level reentry. Only a hit while already in
>>> KPROBE_REENTER remains unrecoverable.
>>>
>>> This follows the same logic as the existing x86 fixes:
>>>    6381c24cd6d5 ("kprobes/x86: Fix page-fault handling logic")
>>>    6a5022a56ac3 ("kprobes/x86: Allow to handle reentered kprobe on single-stepping")
>>
>> Good catch!!
>> The series looks good to me.
>>
>> Acked-by: Masami Hiramatsu (Google) <[email protected]>
>>
>> But it should be reviewed by arm64 maintainers too.
>>
>> BTW, if you are "Pu Hu", the Signed-off-by tag should be
>> "Pu Hu <...>" instead of "hupu <...>".
>>
> 
> Hi Masami,
> 
> Thank you for your reply and Acked-by.
> 
> Yes, thanks for pointing this out. I will fix the author name and the
> Signed-off-by tags to use a consistent name in the next version of the
> patchset.
> 
> Thanks,
> hupu
>



Hi maintainers,

I have reproduced the same issue on the latest mainline kernel available
today. The commit I tested is 665159e24674.

Below are the steps I used to reproduce the issue. I hope they help
with further debugging.

Reproduction steps:

1. Build the test case

Please use the test case that I will send in the next email. Depending 
on your local environment, the following variables in the Makefile may 
need to be adjusted:

     CROSS_COMPILE ?= aarch64-dumpstack-linux-gnu-
     KERN_DIR      ?= $(PWD)/../../output/build-mainline
     DEST_PATH     ?= $(PWD)/../../output

Then run:

     make all

This builds the userspace test program:

     fault_stress

and the kprobe module:

     kp_folio.ko

2. Boot QEMU

To increase memory pressure, I used only two CPUs and 512 MB of memory 
in the QEMU guest:

     SMP="-smp 2"

     qemu-system-aarch64 -m 512 -cpu cortex-a53 \
         -M virt,gic-version=3,its=on,iommu=smmuv3 \
         -nographic $SMP -kernel $KERNEL_IMAGE \
         -append "nokaslr noinitrd sched_debug root=/dev/vda rootfstype=ext4 rw crashkernel=256M loglevel=8" \
         -drive if=none,file=$ROOTFS_IMAGE,id=hd0,format=raw \
         -device virtio-blk-device,drive=hd0 \
         --fsdev local,id=kmod_dev,path=./output/,security_model=none \
         -device virtio-9p-pci,fsdev=kmod_dev,mount_tag=kmod_mount \
         -net nic -net tap,ifname=tap0,script=no,downscript=no \
         $GDB_DEBUG
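
The -fsdev/virtio-9p-pci options above export ./output/ to the guest
under the mount tag kmod_mount. As a minimal sketch, the share could be
mounted inside the guest roughly as follows. The helper name and the
/mnt/kmod mount point are assumptions, not part of the setup described
in this mail, and the guest kernel needs 9p over virtio support
(CONFIG_NET_9P_VIRTIO and CONFIG_9P_FS).

```shell
#!/bin/sh
# Hypothetical helper: mount the host directory that QEMU exports with
# "mount_tag=kmod_mount". The default mount point /mnt/kmod is an assumption.
mount_kmod_share() {
    mnt=${1:-/mnt/kmod}
    mkdir -p "$mnt" &&
    mount -t 9p -o trans=virtio,version=9p2000.L kmod_mount "$mnt"
}

# Usage in the guest (needs root):
#   mount_kmod_share && cd /mnt/kmod
```

The function is only defined here, not called, so it can be pasted into
a guest init script and invoked once the 9p modules are available.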

3. Run the test in the guest

After the guest has booted, run the following commands.

Allow kernel symbols to be shown:

     echo 0 > /proc/sys/kernel/kptr_restrict

Load the kprobe module:

     insmod kp_folio.ko

Start the fault stress program:

     ./fault_stress &

Start stress-ng to add memory pressure:

     ./stress-ng --vm 2 --vm-bytes 70% --page-in &

Run perf against the fault_stress process. In the command below, 171 is 
the PID of fault_stress in my test environment:

     ./perf record -p 171 -c 1 \
         -e preemptirq:preempt_disable \
         -e preemptirq:preempt_enable \
         --call-graph dwarf \
         -o /tmp/perf.data \
         -- sleep 5
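
Since the PID differs between runs, it may be easier to record it when
the workload is started rather than hard-coding 171. A small sketch,
where start_workload and WORKLOAD_PID are hypothetical names:

```shell
#!/bin/sh
# Start a command in the background and remember its PID in WORKLOAD_PID.
# "$!" expands to the PID of the most recent background job.
start_workload() {
    "$@" &
    WORKLOAD_PID=$!
}

# Usage in the guest:
#   start_workload ./fault_stress
#   ./perf record -p "$WORKLOAD_PID" -c 1 \
#       -e preemptirq:preempt_disable \
#       -e preemptirq:preempt_enable \
#       --call-graph dwarf -o /tmp/perf.data -- sleep 5
```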

With the steps above, I can reproduce the crash reliably in my local 
QEMU setup. After applying my previously submitted fix, I can no longer 
reproduce the issue with the same test.

The crash log is shown below:

[  173.383321] kp_folio: hit=1564 comm=fault_stress tgid=171 tid=173
[  173.402940] kp_folio: hit=1565 comm=fault_stress tgid=171 tid=179
[  173.528342] kp_folio: hit=1566 comm=fault_stress tgid=171 tid=175
[  173.846895] kp_folio: hit=1567 comm=fault_stress tgid=171 tid=172
[  174.223031] kp_folio: hit=1568 comm=fault_stress tgid=171 tid=179
[  174.224419] kp_folio: hit=1569 comm=fault_stress tgid=171 tid=174
[  174.928471] kp_folio: hit=1570 comm=fault_stress tgid=171 tid=175
[  174.930916] Unable to handle kernel paging request at virtual address 0000ffffa3592000
[  174.931068] Mem abort info:
[  174.931116]   ESR = 0x0000000096000007
[  174.931180]   EC = 0x25: DABT (current EL), IL = 32 bits
[  174.931240]   SET = 0, FnV = 0
[  174.931368]   EA = 0, S1PTW = 0
[  174.931430]   FSC = 0x07: level 3 translation fault
[  174.931490] Data abort info:
[  174.931540]   ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000
[  174.931593]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[  174.931669]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[  174.931762] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000049bf8000
[  174.931829] [0000ffffa3592000] pgd=0800000049a99403, p4d=0800000049a99403, pud=0800000049ac0403, pmd=0800000049bed403, pte=00000000000047c0
[  174.932328] Internal error: Oops: 0000000096000007 [#1]  SMP
[  174.939042] Modules linked in: kp_folio(O)
[  174.942114] CPU: 1 UID: 0 PID: 175 Comm: fault_stress Tainted: G       O        7.2.0-rc1-00010-g7679152d724a-dirty #2 PREEMPT
[  174.945427] Tainted: [O]=OOT_MODULE
[  174.946006] Hardware name: linux,dummy-virt (DT)
[  174.947011] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  174.948582] pc : folio_wait_bit_common+0x0/0x320
[  174.949626] lr : perf_output_sample+0x708/0x968
[  174.950041] sp : ffff800084b13540
[  174.950511] x29: ffff800084b13570 x28: ffff000006704260 x27: 0000ffffa3591d08
[  174.953274] x26: ffff000009a19a80 x25: 0000000000000000 x24: ffff800084b13780
[  174.953601] x23: 0000000000000ee8 x22: 000000000000b5ef x21: 0000000000001000
[  174.954003] x20: 0000000000000ee8 x19: ffff800084b135e0 x18: 000000000000000a
[  174.954262] x17: ffff8000803d1af4 x16: ffff80008036d01c x15: 0000ffffa3591d08
[  174.954549] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[  174.954863] x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
[  174.955315] x8 : 0000000000000000 x7 : 0000000000000000 x6 : ffff0000069ce2c8
[  174.955592] x5 : ffff0000069ceee8 x4 : 0000000000000008 x3 : 0000000000000000
[  174.956083] x2 : 0000000000000be0 x1 : 0000ffffa3592000 x0 : ffff0000069ce000
[  174.956838] Call trace:
[  174.958282]  folio_wait_bit_common+0x0/0x320 (P)
[  174.958618]  perf_event_output_forward+0xc0/0x1a8
[  174.958811]  __perf_event_overflow+0x108/0x518
[  174.959066]  perf_swevent_event+0x238/0x260
[  174.959295]  perf_tp_event+0x34c/0x6a0
[  174.959667]  perf_trace_run_bpf_submit+0x8c/0xd0
[  174.962331]  perf_trace_preemptirq_template+0xc4/0x130
[  174.962644]  trace_preempt_on+0x114/0x1e8
[  174.963019]  preempt_count_sub+0x78/0xe0
[  174.963402]  el1_brk64+0x40/0x60
[  174.963617]  el1h_64_sync_handler+0x68/0xb0
[  174.963817]  el1h_64_sync+0x6c/0x70
[  174.964239]  0xffff8000846c5000 (P)
[  174.964938]  __do_fault+0x44/0x288
[  174.965452]  __handle_mm_fault+0xaf8/0x1a40
[  174.965815]  handle_mm_fault+0xb4/0x420
[  174.966527]  do_page_fault+0x140/0x7b0
[  174.967398]  do_translation_fault+0x4c/0x70
[  174.968057]  do_mem_abort+0x48/0xa0
[  174.969705]  el0_da+0x64/0x290
[  174.969984]  el0t_64_sync_handler+0xd0/0xe8
[  174.970324]  el0t_64_sync+0x198/0x1a0
[  174.970713] Code: d50323bf d65f03c0 12800140 17fffffc (d4200080)
[  174.971338] kp_folio: hit=1571 comm=fault_stress tgid=171 tid=174
[  174.972266] ---[ end trace 0000000000000000 ]---
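
The "Mem abort info" lines in the log can be cross-checked by decoding
the ESR value by hand, and the instruction shown in parentheses on the
"Code:" line can be decoded the same way. The sketch below uses plain
POSIX shell arithmetic. decode_esr and brk_imm are hypothetical helper
names; the bit positions follow the architectural ESR_ELx layout and
the A64 BRK encoding.

```shell
#!/bin/sh
# Decode the top-level ESR_ELx fields:
#   EC = bits [31:26], IL = bit 25, ISS = bits [24:0].
# For data aborts, the fault status code is ISS[5:0] and WnR is ISS[6].
decode_esr() {
    esr=$(( $1 ))
    ec=$(( (esr >> 26) & 0x3f ))
    il=$(( (esr >> 25) & 0x1 ))
    iss=$(( esr & 0x1ffffff ))
    fsc=$(( iss & 0x3f ))
    wnr=$(( (iss >> 6) & 0x1 ))
    printf 'EC=0x%02x IL=%d ISS=0x%08x FSC=0x%02x WnR=%d\n' \
        "$ec" "$il" "$iss" "$fsc" "$wnr"
}

# Extract the 16-bit immediate from an A64 BRK instruction.
# BRK #imm16 is encoded as 0xd4200000 | (imm16 << 5).
brk_imm() {
    insn=$(( $1 ))
    if [ $(( insn & 0xffe0001f )) -ne $(( 0xd4200000 )) ]; then
        echo "not a BRK instruction"
        return 1
    fi
    printf 'BRK #0x%x\n' $(( (insn >> 5) & 0xffff ))
}

decode_esr 0x96000007   # ESR from the "Mem abort info" block
brk_imm 0xd4200080      # instruction at the faulting PC ("Code:" line)
```

For the values in this log, the first call prints
"EC=0x25 IL=1 ISS=0x00000007 FSC=0x07 WnR=0", which agrees with the
kernel's own decode (a read data abort at the current EL, level 3
translation fault). The second prints "BRK #0x4", which should
correspond to KPROBES_BRK_IMM in arch/arm64/include/asm/brk-imm.h,
i.e. the kprobe breakpoint at folio_wait_bit_common+0x0.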

I will send the complete test case in a follow-up email.

Thanks,
hupu
