Dear Maintainers,
I would like to provide some additional background for this patchset.

We observed a crash that reproduces with high probability on an Android
device running a 6.1.145-based kernel, when recording the preemptirq
tracepoints for a user-space process with DWARF callchains enabled.

The command used to reproduce the issue is:

  simpleperf record -p <PID> -f 10000 \
    -e preemptirq:preempt_disable \
    -e preemptirq:preempt_enable \
    --duration 9 --call-graph dwarf \
    -o /data/local/tmp/perf.data

Here <PID> is the PID of a user-space process, for example the UI thread
or RenderThread of a foreground application. Notably, the crash does not
reproduce when "--call-graph dwarf" is removed.

The crash log shows a data abort on a user virtual address while the PC
is at a probed kernel instruction:

[ 297.177775] Unable to handle kernel paging request at virtual address 0000007ff042e000
[ 297.177792] Mem abort info:
[ 297.177795] ESR = 0x0000000096000007
[ 297.177799] EC = 0x25: DABT (current EL), IL = 32 bits
[ 297.177803] SET = 0, FnV = 0
[ 297.177806] EA = 0, S1PTW = 0
[ 297.177808] FSC = 0x07: level 3 translation fault
[ 297.177811] Data abort info:
[ 297.177814] ISV = 0, ISS = 0x00000007
[ 297.177817] CM = 0, WnR = 0
[ 297.177820] user pgtable: 4k pages, 39-bit VAs, pgdp=000000098c9f2000
[ 297.177825] [0000007ff042e000] pgd=08000009aaaea003, p4d=08000009aaaea003, pud=08000009aaaea003, pmd=08000000abca0003, pte=0000000000000000
[ 297.177835] Internal error: Oops: 0000000096000007 [#1] PREEMPT SMP
[ 297.178070] Skip md ftrace buffer dump for: 0x2800d70
...
[ 297.178485] CPU: 6 PID: 10214 Comm: id.article.news Tainted: P S W O 6.1.145-android14-11-maybe-dirty-qki-consolidate #1
[ 297.178489] Hardware name: Qualcomm Technologies, Inc. Volcano QRD,x6878 (DT)
[ 297.178491] pstate: 22400005 (nzCv daif +PAN -UAO +TCO -DIT -SSBS BTYPE=--)
[ 297.178493] pc : folio_wait_bit_common+0x0/0x408
[ 297.178499] lr : perf_output_sample+0x57c/0xacc
[ 297.178502] sp : ffffffc0366c2f90
[ 297.178503] x29: ffffffc0366c2fb0 x28: 0000000000001000 x27: 0000007ff042d5f8
[ 297.178507] x26: 00000000000035e7 x25: 0000000000000000 x24: ffffff892cec3000
[ 297.178510] x23: 0000000000001000 x22: 0000000000009370 x21: ffffffc0366c3140
[ 297.178512] x20: ffffff888aa1a180 x19: ffffffc0366c3020 x18: ffffffe01103b340
[ 297.178515] x17: 00000000ad6b63b6 x16: 00000000ad6b63b6 x15: 0000007ff042d5f8
[ 297.178518] x14: 0000000000000000 x13: 003436737365636f x12: 72705f7070612f6e
[ 297.178520] x11: 69622f6d65747379 x10: 732f0030333d7972 x9 : 616d6972705f6c6f
[ 297.178523] x8 : 6f705f706173755f x7 : 54454b434f535f44 x6 : ffffff892cec39d8
[ 297.178526] x5 : ffffff892cec4000 x4 : 0000000000000008 x3 : 6e6f6973736e6172
[ 297.178528] x2 : 00000000000005b8 x1 : 0000007ff042e000 x0 : ffffff892cec3000
[ 297.178531] Call trace:
[ 297.178532] folio_wait_bit_common+0x0/0x408
[ 297.178535] perf_event_output_forward+0x90/0xdc
[ 297.178537] __perf_event_overflow+0x128/0x1e8
[ 297.178540] perf_swevent_event+0x94/0x1a0
[ 297.178543] perf_tp_event+0x140/0x270
[ 297.178545] perf_trace_run_bpf_submit+0x84/0xe0
[ 297.178547] perf_trace_preemptirq_template+0xe8/0x124
[ 297.178553] trace_preempt_on+0xec/0x150
[ 297.178555] preempt_count_sub+0xa8/0x12c
[ 297.178562] do_debug_exception+0xd0/0x148
[ 297.178568] el1_dbg+0x64/0x80
[ 297.178575] el1h_64_sync_handler+0x3c/0x90
[ 297.178577] el1h_64_sync+0x68/0x6c
[ 297.178579] folio_wait_bit_common+0x0/0x408
[ 297.178582] __get_node_page+0xdc/0x49c
[ 297.178587] f2fs_get_dnode_of_data+0x404/0x950
[ 297.178589] f2fs_map_blocks+0x1e0/0xdf8
[ 297.178591] f2fs_mpage_readpages+0x1f0/0x8d0
[ 297.178594] f2fs_readahead+0x84/0x10c
[ 297.178596] read_pages+0xb8/0x434
[ 297.178603] page_cache_ra_unbounded+0x9c/0x2f0
[ 297.178605] page_cache_ra_order+0x2b0/0x348
[ 297.178608] do_sync_mmap_readahead+0xd0/0x228
[ 297.178612] filemap_fault+0x158/0x46c
[ 297.178615] f2fs_filemap_fault+0x28/0x114
[ 297.178617] handle_mm_fault+0x4f8/0x1468
[ 297.178620] do_page_fault+0x208/0x4b8
[ 297.178622] do_translation_fault+0x38/0x54
[ 297.178624] do_mem_abort+0x58/0x118
[ 297.178626] el0_da+0x48/0xb8
[ 297.178629] el0t_64_sync_handler+0x98/0xb4
[ 297.178632] el0t_64_sync+0x1a4/0x1a8
[ 297.178634] Code: 94000004 a8c17bfd d50323bf d65f03c0 (d4200080)
[ 297.178639] ---[ end trace 0000000000000000 ]---
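
The instruction d4200080 is the arm64 kprobe breakpoint (BRK)
instruction. The call trace also shows that the fault occurs while a
kprobe debug exception is being handled (el1h_64_sync -> el1_dbg ->
do_debug_exception), and that the perf/trace path is entered from within
that window via preempt_count_sub() -> trace_preempt_on().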
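
From the fulldump analysis, the issue appears to be related to arm64
kprobe single-step/reentry handling. While a kprobe is preparing or
executing its XOL single-step instruction, perf/trace code may run in the
same window. With DWARF callchains enabled, perf also copies the user
stack into each sample (PERF_SAMPLE_STACK_USER), so this path accesses
user memory and can take a data abort. This is consistent with the oops
above: LR points into perf_output_sample() and x1 holds the faulting
user address, while the reported PC is the probed address
folio_wait_bit_common+0x0. In addition, another kprobe may be hit while
the first kprobe is still in the KPROBE_HIT_SS state.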

This matches the class of issues fixed on x86 by the following commits:

  6381c24cd6d5 ("kprobes/x86: Fix page-fault handling logic")
  6a5022a56ac3 ("kprobes/x86: Allow to handle reentered kprobe on single-stepping")

This patchset applies the same ideas to arm64:

- Patch 1 makes the arm64 kprobe fault handler handle a fault in
  KPROBE_HIT_SS/KPROBE_REENTER only when the faulting PC is the current
  kprobe's XOL instruction. Any other fault is left to the normal fault
  handling path.
- Patch 2 allows a kprobe hit in KPROBE_HIT_SS to be handled as a
  recoverable one-level reentry. Only a hit while already in
  KPROBE_REENTER remains unrecoverable.

With both patches applied, the same stress test has run for three days
without reproducing the crash.

I still have the full dmesg and fulldump from the crashed device. Please
let me know if any additional information would be useful.

Thanks,
hupu