On Thu, Jun 20, 2024 at 3:33 PM Martin Pieuchot <m...@openbsd.org> wrote: > > Hello Dana, > > Thanks again for your report. > > On 19/06/24(Wed) 09:37, Dana Koch wrote: > > On Wed, Jun 19, 2024 at 6:58 AM Martin Pieuchot <m...@openbsd.org> wrote: > > > This is a lock order reversal reported by WITNESS. Thankfully claudio@ > > > already committed a fix for this on the 16th. So please, try with > > > up-to-date sources > > > > Just to be paranoid, I built a kernel with recent sources and > > MP_LOCKDEBUG and WITNESS. I experienced both the "lock spun out" error > > after "starting network" -- but not on serial console, unfortunately > > -- and from `make -j24` as mentioned which I did capture. > > The problem is exposed by the many threads of lld(1). While "starting > network" the boot process relinks a kernel. More details below. > > Since when do you experience this issue?
Since I got the device earlier in June and put 7.5-current on it. I have been trying fresh kernels each time; the most recent ones haven't been tripping at boot as frequently (perhaps the lock order reversal fix has solved part but not all of the underlying problem). > The issue is related to the SCHED_LOCK(). Could you please next time use > "ps /o" in ddb, this will help me figure out which CPU trace correspond > to the process holding the KERNEL_LOCK(). Done. Here is a repro from today with `ps /o` output. (Perhaps worth noting the "lock spun out" message happening during the ddb session after `mach ddbcpu 9`, too.) __mp_lock_spin: 0xffffff80012960f0 lock spun out Stopped at __mp_lock+0x138: ldr w8, [x23,#592] ddb{19}> __mp_lock_spin: 0xffffff80012960f0 lock spun out ddb{19}> show panic__mp_lock_spin: 0xffffff80012960f0 lock spun out the kernel did not panic ddb{19}> trace db_enter() at __mp_lock+0x134 __mp_lock() at reaper+0x108 reaper() at proc_trampoline+0xc ddb{19}> ps /o TID PID UID PRFLAGS PFLAGS CPU COMMAND 490781 81712 1000 0x18000003 0x4000000 11 ld 198600 81712 1000 0x18000003 0x4000000 8 ld 309127 81712 1000 0x18000003 0x4000000 10 ld * 61265 85684 0 0x14000 0x200 19K reaper ddb{19}> show all locks CPU 17: exclusive mutex &sched_lock r = 0 (0xffffff8001251038) ddb{19}> mach ddbcpu 0 Stopped at __mp_lock+0x138: ldr w8, [x23,#592] ddb{0}> trace db_enter() at __mp_lock+0x134 __mp_lock() at aplintc_irq_handler+0x158 aplintc_irq_handler() at arm_cpu_irq+0x34 arm_cpu_irq() at handle_el1h_irq+0x68 handle_el1h_irq() at db_enter_ddb+0x25c db_enter_ddb() at kdb_trap+0x64 kdb_trap() at db_trapper+0x30 db_trapper() at handle_el1h_sync+0x68 handle_el1h_sync() at db_enter+0x14 db_enter() at aplintc_fiq_handler+0x6c aplintc_fiq_handler() at arm_cpu_fiq+0x34 arm_cpu_fiq() at handle_el1h_fiq+0x68 handle_el1h_fiq() at aq_lookup+0x80 aq_intr() at arm_cpu_irq+0x34 arm_cpu_irq() at handle_el1h_irq+0x68 handle_el1h_irq() at do_el1h_sync+0x40 do_el1h_sync() at handle_el1h_sync+0x68 handle_el1h_sync() at db_enter+0x14 db_enter() at __mp_lock+0x134 __mp_lock() at softintr_biglock_wrap+0x14 softintr_biglock_wrap() at softintr_dispatch+0x84 softintr_dispatch() at arm_do_pending_intr+0xfc arm_do_pending_intr() at arm_cpu_fiq+0x34 arm_cpu_fiq() at handle_el1h_fiq+0x68 handle_el1h_fiq() at mtx_enter+0xec mtx_enter() at sched_idle+0x12c sched_idle() at proc_trampoline+0xc ddb{0}> mach ddbcpu 1 Stopped at aplintc_fiq_handler+0x70: b ffffff8000c48724 <aplin tc_fiq_handler+0x7c> ddb{1}> trace db_enter() at aplintc_fiq_handler+0x6c aplintc_fiq_handler() at arm_cpu_fiq+0x34 arm_cpu_fiq() at handle_el1h_fiq+0x68 handle_el1h_fiq() at mtx_enter+0xec mtx_enter() at sched_idle+0x12c sched_idle() at proc_trampoline+0xc ddb{1}> mach ddbcpu 2 Stopped at aplintc_fiq_handler+0x70: b ffffff8000c48724 <aplin tc_fiq_handler+0x7c> ddb{2}> trace db_enter() at aplintc_fiq_handler+0x6c aplintc_fiq_handler() at arm_cpu_fiq+0x34 arm_cpu_fiq() at handle_el1h_fiq+0x68 handle_el1h_fiq() at mtx_enter+0x114 mtx_enter() at sched_idle+0x12c sched_idle() at proc_trampoline+0xc ddb{2}> mach ddbcpu 3 Stopped at aplintc_fiq_handler+0x70: b ffffff8000c48724 <aplin tc_fiq_handler+0x7c> ddb{3}> trace db_enter() at aplintc_fiq_handler+0x6c aplintc_fiq_handler() at arm_cpu_fiq+0x34 arm_cpu_fiq() at handle_el1h_fiq+0x68 handle_el1h_fiq() at mtx_enter+0xec mtx_enter() at sched_idle+0x12c sched_idle() at proc_trampoline+0xc ddb{3}> mach ddbcpu 4 Stopped at aplintc_fiq_handler+0x70: b ffffff8000c48724 <aplin tc_fiq_handler+0x7c> ddb{4}> trace db_enter() at aplintc_fiq_handler+0x6c aplintc_fiq_handler() at arm_cpu_fiq+0x34 arm_cpu_fiq() at handle_el1h_fiq+0x68 handle_el1h_fiq() at mtx_enter+0xac mtx_enter() at sched_idle+0x12c sched_idle() at proc_trampoline+0xc ddb{4}> mach ddbcpu 5 Stopped at aplintc_fiq_handler+0x70: b ffffff8000c48724 <aplin tc_fiq_handler+0x7c> ddb{5}> trace db_enter() at aplintc_fiq_handler+0x6c aplintc_fiq_handler() at arm_cpu_fiq+0x34 arm_cpu_fiq() at handle_el1h_fiq+0x68 handle_el1h_fiq() at mtx_enter+0x114 mtx_enter() at sched_idle+0x12c sched_idle() at proc_trampoline+0xc ddb{5}> mach ddbcpu 6 Stopped at aplintc_fiq_handler+0x70: b ffffff8000c48724 <aplin tc_fiq_handler+0x7c> ddb{6}> trace db_enter() at aplintc_fiq_handler+0x6c aplintc_fiq_handler() at arm_cpu_fiq+0x34 arm_cpu_fiq() at handle_el1h_fiq+0x68 handle_el1h_fiq() at mtx_enter+0xec mtx_enter() at sched_idle+0x12c sched_idle() at proc_trampoline+0xc ddb{6}> mach ddbcpu 7 Stopped at aplintc_fiq_handler+0x70: b ffffff8000c48724 <aplin tc_fiq_handler+0x7c> ddb{7}> trace db_enter() at aplintc_fiq_handler+0x6c aplintc_fiq_handler() at arm_cpu_fiq+0x34 arm_cpu_fiq() at handle_el1h_fiq+0x68 handle_el1h_fiq() at mtx_enter+0xec mtx_enter() at sched_idle+0x12c sched_idle() at proc_trampoline+0xc ddb{7}> mach ddbcpu 8 Stopped at aplintc_fiq_handler+0x70: b ffffff8000c48724 <aplin tc_fiq_handler+0x7c> ddb{8}> trace db_enter() at aplintc_fiq_handler+0x6c aplintc_fiq_handler() at arm_cpu_fiq+0x34 arm_cpu_fiq() at handle_el1h_fiq+0x68 handle_el1h_fiq() at handle_el0_sync+0x74 handle_el0_sync() at 0x44b6465f4 --- trap --- end of kernel ddb{8}> mach ddbcpu 9 mtx_enter: 0xffffff8001251028 lock spun out Stopped at aplintc_fiq_handler+0x70: b ffffff8000c48724 <aplin tc_fiq_handler+0x7c> ddb{9}> trace db_enter() at aplintc_fiq_handler+0x6c aplintc_fiq_handler() at arm_cpu_fiq+0x34 arm_cpu_fiq() at handle_el1h_fiq+0x68 handle_el1h_fiq() at cpu_idle_cycle+0x28 cpu_idle_cycle() at sched_idle+0x218 sched_idle() at proc_trampoline+0xc ddb{9}> mach ddbcpu 0xa Stopped at aplintc_fiq_handler+0x70: b ffffff8000c48724 <aplin tc_fiq_handler+0x7c> ddb{10}> trace db_enter() at aplintc_fiq_handler+0x6c aplintc_fiq_handler() at arm_cpu_fiq+0x34 arm_cpu_fiq() at handle_el1h_fiq+0x68 handle_el1h_fiq() at aplintc_splx+0x4c aplintc_splx() at sys_sched_yield+0x94 sys_sched_yield() at svc_handler+0x478 svc_handler() at do_el0_sync+0xc8 do_el0_sync() at handle_el0_sync+0x70 handle_el0_sync() at 0x44b6465f4 --- trap --- end of kernel ddb{10}> mach ddbcpu 0xb Stopped at aplintc_fiq_handler+0x70: b ffffff8000c48724 <aplin tc_fiq_handler+0x7c> ddb{11}> trace db_enter() at aplintc_fiq_handler+0x6c aplintc_fiq_handler() at arm_cpu_fiq+0x34 arm_cpu_fiq() at handle_el1h_fiq+0x68 handle_el1h_fiq() at aplintc_splx+0x4c aplintc_splx() at sys_sched_yield+0x94 sys_sched_yield() at svc_handler+0x478 svc_handler() at do_el0_sync+0xc8 do_el0_sync() at handle_el0_sync+0x70 handle_el0_sync() at 0x44b6465f4 --- trap --- end of kernel ddb{11}> mach ddbcpu 0xc Stopped at aplintc_fiq_handler+0x70: b ffffff8000c48724 <aplin tc_fiq_handler+0x7c> ddb{12}> trace db_enter() at aplintc_fiq_handler+0x6c aplintc_fiq_handler() at arm_cpu_fiq+0x34 arm_cpu_fiq() at handle_el1h_fiq+0x68 handle_el1h_fiq() at cpu_idle_cycle+0x28 cpu_idle_cycle() at sched_idle+0x218 sched_idle() at proc_trampoline+0xc ddb{12}> mach ddbcpu 0xd Stopped at aplintc_fiq_handler+0x70: b ffffff8000c48724 <aplin tc_fiq_handler+0x7c> ddb{13}> trace db_enter() at aplintc_fiq_handler+0x6c aplintc_fiq_handler() at arm_cpu_fiq+0x34 arm_cpu_fiq() at handle_el1h_fiq+0x68 handle_el1h_fiq() at cpu_idle_cycle+0x28 cpu_idle_cycle() at sched_idle+0x218 sched_idle() at proc_trampoline+0xc ddb{13}> mach ddbcpu 0xe Stopped at aplintc_fiq_handler+0x70: b ffffff8000c48724 <aplin tc_fiq_handler+0x7c> ddb{14}> trace db_enter() at aplintc_fiq_handler+0x6c aplintc_fiq_handler() at arm_cpu_fiq+0x34 arm_cpu_fiq() at handle_el1h_fiq+0x68 handle_el1h_fiq() at cpu_idle_cycle+0x28 cpu_idle_cycle() at sched_idle+0x218 sched_idle() at proc_trampoline+0xc ddb{14}> mach ddbcpu 0xf Stopped at aplintc_fiq_handler+0x70: b ffffff8000c48724 <aplin tc_fiq_handler+0x7c> ddb{15}> trace db_enter() at aplintc_fiq_handler+0x6c aplintc_fiq_handler() at arm_cpu_fiq+0x34 arm_cpu_fiq() at handle_el1h_fiq+0x68 handle_el1h_fiq() at mtx_enter+0x114 mtx_enter() at sched_idle+0x12c sched_idle() at proc_trampoline+0xc ddb{15}> mach ddbcpu 0x10 Stopped at aplintc_fiq_handler+0x70: b ffffff8000c48724 <aplin tc_fiq_handler+0x7c> ddb{16}> trace db_enter() at aplintc_fiq_handler+0x6c aplintc_fiq_handler() at arm_cpu_fiq+0x34 arm_cpu_fiq() at handle_el1h_fiq+0x68 handle_el1h_fiq() at cpu_idle_cycle+0x28 cpu_idle_cycle() at sched_idle+0x218 sched_idle() at proc_trampoline+0xc ddb{16}> mach ddbcpu 0x11 Stopped at aplintc_fiq_handler+0x70: b ffffff8000c48724 <aplin tc_fiq_handler+0x7c> ddb{17}> trace db_enter() at aplintc_fiq_handler+0x6c aplintc_fiq_handler() at arm_cpu_fiq+0x34 arm_cpu_fiq() at handle_el1h_fiq+0x68 handle_el1h_fiq() at agtimer_get_timecount_default+0x18 agtimer_get_timecount_default() at nanouptime+0x44 nanouptime() at mi_switch+0x9c mi_switch() at sched_idle+0x134 sched_idle() at proc_trampoline+0xc ddb{17}> mach ddbcpu 0x12 Stopped at aplintc_fiq_handler+0x70: b ffffff8000c48724 <aplin tc_fiq_handler+0x7c> ddb{18}> trace db_enter() at aplintc_fiq_handler+0x6c aplintc_fiq_handler() at arm_cpu_fiq+0x34 arm_cpu_fiq() at handle_el1h_fiq+0x68 handle_el1h_fiq() at cpu_idle_cycle+0x28 cpu_idle_cycle() at sched_idle+0x218 sched_idle() at proc_trampoline+0xc ddb{18}> mach ddbcpu 0x13 Stopped at __mp_lock+0x138: ldr w8, [x23,#592] ddb{19}> trace db_enter() at __mp_lock+0x134 __mp_lock() at reaper+0x108 reaper() at proc_trampoline+0xc ddb{19}> mach ddbcpu 0x14 Stopped at aplintc_fiq_handler+0x70: b ffffff8000c48724 <aplin tc_fiq_handler+0x7c> ddb{20}> trace db_enter() at aplintc_fiq_handler+0x6c aplintc_fiq_handler() at arm_cpu_fiq+0x34 arm_cpu_fiq() at handle_el1h_fiq+0x68 handle_el1h_fiq() at cpu_idle_cycle+0x28 cpu_idle_cycle() at sched_idle+0x218 sched_idle() at proc_trampoline+0xc ddb{20}> mach ddbcpu 0x15 Stopped at aplintc_fiq_handler+0x70: b ffffff8000c48724 <aplin tc_fiq_handler+0x7c> ddb{21}> trace db_enter() at aplintc_fiq_handler+0x6c aplintc_fiq_handler() at arm_cpu_fiq+0x34 arm_cpu_fiq() at handle_el1h_fiq+0x68 handle_el1h_fiq() at cpu_idle_cycle+0x28 cpu_idle_cycle() at sched_idle+0x218 sched_idle() at proc_trampoline+0xc ddb{21}> mach ddbcpu 0x16 Stopped at aplintc_fiq_handler+0x70: b ffffff8000c48724 <aplin tc_fiq_handler+0x7c> ddb{22}> trace db_enter() at aplintc_fiq_handler+0x6c aplintc_fiq_handler() at arm_cpu_fiq+0x34 arm_cpu_fiq() at handle_el1h_fiq+0x68 handle_el1h_fiq() at cpu_idle_cycle+0x28 cpu_idle_cycle() at sched_idle+0x218 sched_idle() at proc_trampoline+0xc ddb{22}> mach ddbcpu 0x17 Stopped at aplintc_fiq_handler+0x70: b ffffff8000c48724 <aplin tc_fiq_handler+0x7c> ddb{23}> trace db_enter() at aplintc_fiq_handler+0x6c aplintc_fiq_handler() at arm_cpu_fiq+0x34 arm_cpu_fiq() at handle_el1h_fiq+0x68 handle_el1h_fiq() at mtx_enter+0xec mtx_enter() at sched_toidle+0x90 sched_toidle() at sched_exit+0x5c sched_toidle() at exit1+0x5f0 exit1() at sys___threxit+0x4c process_untrace() at svc_handler+0x44c svc_handler() at do_el0_sync+0xc8 do_el0_sync() at handle_el0_sync+0x70 handle_el0_sync() at 0x44b5e61d0 --- trap --- end of kernel