Hi,

I'm sharing the vmcore and the vmlinux with symbols for further analysis
at the following link.

Link: https://drive.google.com/file/d/1_RFdpdWNuLdO-Yx6d7vIX-WAFX4X_msH/view?usp=drive_link

On 5/6/25 9:30 AM, Yunseong Kim wrote:
> Hi Colin,
>
>>> The crash seems to originate from rcu_do_batch(), jumping to a pointer
>>> (0xffff00003a114000) that appears to be non-executable.
>>> The PTE for the address confirms XN=1. Given the heavy binderfs workload, I
>>> suspect there may be a use-after-free or dangling pointer involved in a
>>> callback invocation.
>>>
>>> Platform:
>>> Architecture: arm64
>>> Virtualized environment: Apple Silicon M2 (Apple Virtualization Framework)
>>> Kernel version: 6.15.0-rc4+
>>> Attached Config: CONFIG_PREEMPT_VOLUNTARY=y, CONFIG_KASAN=y
>>>
>>> Reproducer:
>>> sudo ./stress-ng --binderfs 8 --binderfs-ops 10000 -t 15 \
>>> --pathological --timestamp --tz --syslog --perf --no-rand-seed \
>>> --times --metrics --klog-check --status 5 -x smi -v --interrupts \
>>> --change-cpu
>>
>>
>> I suspect --change-cpu is required to trigger this issue. Does it trigger
>> without this option? Can you reproduce the issue when reducing the number
>> of --binderfs instances?
>
> As you suggested, I've been testing combinations of enabling and disabling
> '--binderfs' and '--change-cpu' separately.
>
> While I'm not deeply familiar with the internal mechanisms of binderfs,
> I found that the panic still occurs consistently with --binderfs, even
> without the --change-cpu option.
> However, the panic occurred when I used 4 instances.
>
> Reproducer with instances reduced from 8 to 4 and without the --change-cpu option:
> sudo ./stress-ng --binderfs 4 --binderfs-ops 10000 -t 15 \
> --pathological --timestamp --tz --syslog --perf --no-rand-seed \
> --times --metrics --klog-check --status 5 -x smi -v --interrupts

[ 194.911021] Unable to handle kernel execute from non-executable memory at virtual address ffff0000312ebe00
[ 194.911043] Mem abort info:
[ 194.911056]   ESR = 0x000000008600000f
[ 194.911065]   EC = 0x21: IABT (current EL), IL = 32 bits
[ 194.911118]   SET = 0, FnV = 0
[ 194.911130]   EA = 0, S1PTW = 0
[ 194.911139]   FSC = 0x0f: level 3 permission fault
[ 194.911149] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000163388000
[ 194.911160] [ffff0000312ebe00] pgd=180000016ffff403, p4d=180000016ffff403, pud=180000016fffe403, pmd=180000016fff4403, pte=00680000712eb707
[ 194.911201] Internal error: Oops: 000000008600000f [#1] SMP
[ 194.911211] Modules linked in: overlay isofs uinput snd_seq_dummy snd_hrtimer nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables qrtr sunrpc virtio_snd snd_seq snd_seq_device snd_pcm snd_timer snd virtio_net soundcore net_failover virtio_balloon failover vfat fat joydev loop nfnetlink vsock_loopback vmw_vsock_virtio_transport_common zram vmw_vsock_vmci_transport lz4hc_compress vmw_vmci lz4_compress vsock uas polyval_ce polyval_generic ghash_ce usb_storage sha3_ce sha512_ce virtio_gpu sha512_arm64 virtio_dma_buf apple_mfi_fastcharge fuse
[ 194.911440] CPU: 2 UID: 0 PID: 27 Comm: ksoftirqd/2 Kdump: loaded Not tainted 6.15.0-rc4+ #1 PREEMPT(voluntary)
[ 194.911452] Hardware name: Apple Inc. Apple Virtualization Generic Platform, BIOS 2075.101.2.0.0 03/12/2025
[ 194.911459] pstate: 21400805 (nzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=-c)
[ 194.911469] pc : 0xffff0000312ebe00
[ 194.911492] lr : rcu_do_batch+0x2dc/0x860
[ 194.911506] sp : ffff800080143c90
[ 194.911512] x29: ffff800080143cb0 x28: ffff00003059e600 x27: ffff0000312ebe00
[ 194.911534] x26: ffff800084442000 x25: 0000000000000000 x24: ffff8000843d9b18
[ 194.911549] x23: ffff800082150ac0 x22: 0000000000000003 x21: 000000000000000a
[ 194.911563] x20: ffff0000c11233c0 x19: ffff00012f0e1e00 x18: 0000000000000000
[ 194.911578] x17: ffff80008214506c x16: 002c7c955b9d7f7c x15: 0000000000000000
[ 194.911592] x14: 0000000000000002 x13: 0000000000ff0100 x12: ffff8000801c3410
[ 194.911607] x11: 0000000000180017 x10: 0000000000ff0100 x9 : ffff80008385a580
[ 194.911621] x8 : 0000000100000100 x7 : 7f7f7f7f7f7f7f7f x6 : ffff8000803f89bc
[ 194.911636] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000002
[ 194.911650] x2 : 0000000000000000 x1 : ffff800082a4aeb8 x0 : ffff00003059e600
[ 194.911744] Call trace:
[ 194.911747]  0xffff0000312ebe00 (P)
[ 194.911759]  rcu_core+0x2a0/0x4e8
[ 194.911767]  rcu_core_si+0x1c/0x30
[ 194.911773]  handle_softirqs+0x1b4/0x588
[ 194.911782]  run_ksoftirqd+0x5c/0xf8
[ 194.911787]  smpboot_thread_fn+0x27c/0x490
[ 194.911794]  kthread+0x2ac/0x318
[ 194.911802]  ret_from_fork+0x10/0x20
[ 194.911811] Code: 00000000 00000000 00000000 00000000 (00000000)
[ 194.911821] SMP: stopping secondary CPUs
[ 194.912168] Starting crashdump kernel...
[ 194.912171] Bye!

crash> kmem ffff0000312ebe00
CACHE             OBJSIZE  ALLOCATED  TOTAL  SLABS  SSIZE  NAME
ffff0000c000cc00      512        185    576     36     8k  kmalloc-rnd-10-512
  SLAB              MEMORY            NODE  TOTAL  ALLOCATED  FREE
  fffffd7fc0f5e920  ffff0000312ea000     0     16          0    16
  FREE / [ALLOCATED]
   ffff0000312ebe00

The address ffff0000312ebe00 is on the FREE list, meaning this memory slot
has already been freed.

      PAGE        PHYSICAL  MAPPING   INDEX             CNT  FLAGS
fffffd7fc0f5e970  712eb000  ffffffff  ffffffffffffffff    0  1 locked

The page itself no longer has a valid page mapping (MAPPING=ffffffff), and
its reference count (CNT) is 0, so it is completely freed.

It appears that a struct rcu_head object that had already been freed
remained on the RCU callback list. An attempt was then made to call a
function through that rcu_head using this invalid (freed) slot, which
resulted in the error.

It is unusual that KASAN did not detect this memory bug, especially since
the issue occurs repeatedly.

I'll check whether there is any improper RCU usage elsewhere.

Best regards,
Yunseong Kim