On Thu, Jun 20, 2024 at 3:33 PM Martin Pieuchot <m...@openbsd.org> wrote:
>
> Hello Dana,
>
> Thanks again for your report.
>
> On 19/06/24(Wed) 09:37, Dana Koch wrote:
> > On Wed, Jun 19, 2024 at 6:58 AM Martin Pieuchot <m...@openbsd.org> wrote:
> > > This is a lock order reversal reported by WITNESS.  Thankfully claudio@
> > > already committed a fix for this on the 16th.  So please, try with
> > > up-to-date sources
> >
> > Just to be paranoid, I built a kernel with recent sources and
> > MP_LOCKDEBUG and WITNESS. I experienced both the "lock spun out" error
> > after "starting network" -- but not on serial console, unfortunately
> > -- and from `make -j24` as mentioned which I did capture.
>
> The problem is exposed by the many threads of lld(1).  While "starting
> network" the boot process relinks a kernel.  More details below.
>
> Since when do you experience this issue?

Since I got the device earlier in June and put 7.5-current on it.

I have been trying fresh kernels each time; the most recent ones
haven't been tripping at boot as frequently (perhaps the lock order
reversal fix has solved part but not all of the underlying problem).

> The issue is related to the SCHED_LOCK(). Could you please next time use
> "ps /o" in ddb, this will help me figure out which CPU trace correspond
> to the process holding the KERNEL_LOCK().

Done. Here is a repro from today with `ps /o` output. (Perhaps worth
noting the "lock spun out" message happening during the ddb session
after `mach ddbcpu 9`, too.)

__mp_lock_spin: 0xffffff80012960f0 lock spun out
Stopped at      __mp_lock+0x138:        ldr     w8, [x23,#592]
ddb{19}> __mp_lock_spin: 0xffffff80012960f0 lock spun out

ddb{19}> show panic__mp_lock_spin: 0xffffff80012960f0 lock spun out

the kernel did not panic
ddb{19}> trace
db_enter() at __mp_lock+0x134
__mp_lock() at reaper+0x108
reaper() at proc_trampoline+0xc
ddb{19}> ps /o
    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
 490781  81712   1000  0x18000003  0x4000000   11  ld
 198600  81712   1000  0x18000003  0x4000000    8  ld
 309127  81712   1000  0x18000003  0x4000000   10  ld
* 61265  85684      0     0x14000      0x200   19K reaper
ddb{19}> show all locks
CPU 17:
exclusive mutex &sched_lock r = 0 (0xffffff8001251038)
ddb{19}> mach ddbcpu 0
Stopped at      __mp_lock+0x138:        ldr     w8, [x23,#592]
ddb{0}> trace
db_enter() at __mp_lock+0x134
__mp_lock() at aplintc_irq_handler+0x158
aplintc_irq_handler() at arm_cpu_irq+0x34
arm_cpu_irq() at handle_el1h_irq+0x68
handle_el1h_irq() at db_enter_ddb+0x25c
db_enter_ddb() at kdb_trap+0x64
kdb_trap() at db_trapper+0x30
db_trapper() at handle_el1h_sync+0x68
handle_el1h_sync() at db_enter+0x14
db_enter() at aplintc_fiq_handler+0x6c
aplintc_fiq_handler() at arm_cpu_fiq+0x34
arm_cpu_fiq() at handle_el1h_fiq+0x68
handle_el1h_fiq() at aq_lookup+0x80
aq_intr() at arm_cpu_irq+0x34
arm_cpu_irq() at handle_el1h_irq+0x68
handle_el1h_irq() at do_el1h_sync+0x40
do_el1h_sync() at handle_el1h_sync+0x68
handle_el1h_sync() at db_enter+0x14
db_enter() at __mp_lock+0x134
__mp_lock() at softintr_biglock_wrap+0x14
softintr_biglock_wrap() at softintr_dispatch+0x84
softintr_dispatch() at arm_do_pending_intr+0xfc
arm_do_pending_intr() at arm_cpu_fiq+0x34
arm_cpu_fiq() at handle_el1h_fiq+0x68
handle_el1h_fiq() at mtx_enter+0xec
mtx_enter() at sched_idle+0x12c
sched_idle() at proc_trampoline+0xc
ddb{0}> mach ddbcpu 1
Stopped at      aplintc_fiq_handler+0x70:       b       ffffff8000c48724 <aplin
tc_fiq_handler+0x7c>
ddb{1}> trace
db_enter() at aplintc_fiq_handler+0x6c
aplintc_fiq_handler() at arm_cpu_fiq+0x34
arm_cpu_fiq() at handle_el1h_fiq+0x68
handle_el1h_fiq() at mtx_enter+0xec
mtx_enter() at sched_idle+0x12c
sched_idle() at proc_trampoline+0xc
ddb{1}> mach ddbcpu 2
Stopped at      aplintc_fiq_handler+0x70:       b       ffffff8000c48724 <aplin
tc_fiq_handler+0x7c>
ddb{2}> trace
db_enter() at aplintc_fiq_handler+0x6c
aplintc_fiq_handler() at arm_cpu_fiq+0x34
arm_cpu_fiq() at handle_el1h_fiq+0x68
handle_el1h_fiq() at mtx_enter+0x114
mtx_enter() at sched_idle+0x12c
sched_idle() at proc_trampoline+0xc
ddb{2}> mach ddbcpu 3
Stopped at      aplintc_fiq_handler+0x70:       b       ffffff8000c48724 <aplin
tc_fiq_handler+0x7c>
ddb{3}> trace
db_enter() at aplintc_fiq_handler+0x6c
aplintc_fiq_handler() at arm_cpu_fiq+0x34
arm_cpu_fiq() at handle_el1h_fiq+0x68
handle_el1h_fiq() at mtx_enter+0xec
mtx_enter() at sched_idle+0x12c
sched_idle() at proc_trampoline+0xc
ddb{3}> mach ddbcpu 4
Stopped at      aplintc_fiq_handler+0x70:       b       ffffff8000c48724 <aplin
tc_fiq_handler+0x7c>
ddb{4}> trace
db_enter() at aplintc_fiq_handler+0x6c
aplintc_fiq_handler() at arm_cpu_fiq+0x34
arm_cpu_fiq() at handle_el1h_fiq+0x68
handle_el1h_fiq() at mtx_enter+0xac
mtx_enter() at sched_idle+0x12c
sched_idle() at proc_trampoline+0xc
ddb{4}> mach ddbcpu 5
Stopped at      aplintc_fiq_handler+0x70:       b       ffffff8000c48724 <aplin
tc_fiq_handler+0x7c>
ddb{5}> trace
db_enter() at aplintc_fiq_handler+0x6c
aplintc_fiq_handler() at arm_cpu_fiq+0x34
arm_cpu_fiq() at handle_el1h_fiq+0x68
handle_el1h_fiq() at mtx_enter+0x114
mtx_enter() at sched_idle+0x12c
sched_idle() at proc_trampoline+0xc
ddb{5}> mach ddbcpu 6
Stopped at      aplintc_fiq_handler+0x70:       b       ffffff8000c48724 <aplin
tc_fiq_handler+0x7c>
ddb{6}> trace
db_enter() at aplintc_fiq_handler+0x6c
aplintc_fiq_handler() at arm_cpu_fiq+0x34
arm_cpu_fiq() at handle_el1h_fiq+0x68
handle_el1h_fiq() at mtx_enter+0xec
mtx_enter() at sched_idle+0x12c
sched_idle() at proc_trampoline+0xc
ddb{6}> mach ddbcpu 7
Stopped at      aplintc_fiq_handler+0x70:       b       ffffff8000c48724 <aplin
tc_fiq_handler+0x7c>
ddb{7}> trace
db_enter() at aplintc_fiq_handler+0x6c
aplintc_fiq_handler() at arm_cpu_fiq+0x34
arm_cpu_fiq() at handle_el1h_fiq+0x68
handle_el1h_fiq() at mtx_enter+0xec
mtx_enter() at sched_idle+0x12c
sched_idle() at proc_trampoline+0xc
ddb{7}> mach ddbcpu 8
Stopped at      aplintc_fiq_handler+0x70:       b       ffffff8000c48724 <aplin
tc_fiq_handler+0x7c>
ddb{8}> trace
db_enter() at aplintc_fiq_handler+0x6c
aplintc_fiq_handler() at arm_cpu_fiq+0x34
arm_cpu_fiq() at handle_el1h_fiq+0x68
handle_el1h_fiq() at handle_el0_sync+0x74
handle_el0_sync() at 0x44b6465f4
--- trap ---
end of kernel
ddb{8}> mach ddbcpu 9
mtx_enter: 0xffffff8001251028 lock spun out
Stopped at      aplintc_fiq_handler+0x70:       b       ffffff8000c48724 <aplin
tc_fiq_handler+0x7c>
ddb{9}> trace
db_enter() at aplintc_fiq_handler+0x6c
aplintc_fiq_handler() at arm_cpu_fiq+0x34
arm_cpu_fiq() at handle_el1h_fiq+0x68
handle_el1h_fiq() at cpu_idle_cycle+0x28
cpu_idle_cycle() at sched_idle+0x218
sched_idle() at proc_trampoline+0xc
ddb{9}> mach ddbcpu 0xa
Stopped at      aplintc_fiq_handler+0x70:       b       ffffff8000c48724 <aplin
tc_fiq_handler+0x7c>
ddb{10}> trace
db_enter() at aplintc_fiq_handler+0x6c
aplintc_fiq_handler() at arm_cpu_fiq+0x34
arm_cpu_fiq() at handle_el1h_fiq+0x68
handle_el1h_fiq() at aplintc_splx+0x4c
aplintc_splx() at sys_sched_yield+0x94
sys_sched_yield() at svc_handler+0x478
svc_handler() at do_el0_sync+0xc8
do_el0_sync() at handle_el0_sync+0x70
handle_el0_sync() at 0x44b6465f4
--- trap ---
end of kernel
ddb{10}> mach ddbcpu 0xb
Stopped at      aplintc_fiq_handler+0x70:       b       ffffff8000c48724 <aplin
tc_fiq_handler+0x7c>
ddb{11}> trace
db_enter() at aplintc_fiq_handler+0x6c
aplintc_fiq_handler() at arm_cpu_fiq+0x34
arm_cpu_fiq() at handle_el1h_fiq+0x68
handle_el1h_fiq() at aplintc_splx+0x4c
aplintc_splx() at sys_sched_yield+0x94
sys_sched_yield() at svc_handler+0x478
svc_handler() at do_el0_sync+0xc8
do_el0_sync() at handle_el0_sync+0x70
handle_el0_sync() at 0x44b6465f4
--- trap ---
end of kernel
ddb{11}> mach ddbcpu 0xc
Stopped at      aplintc_fiq_handler+0x70:       b       ffffff8000c48724 <aplin
tc_fiq_handler+0x7c>
ddb{12}> trace
db_enter() at aplintc_fiq_handler+0x6c
aplintc_fiq_handler() at arm_cpu_fiq+0x34
arm_cpu_fiq() at handle_el1h_fiq+0x68
handle_el1h_fiq() at cpu_idle_cycle+0x28
cpu_idle_cycle() at sched_idle+0x218
sched_idle() at proc_trampoline+0xc
ddb{12}> mach ddbcpu 0xd
Stopped at      aplintc_fiq_handler+0x70:       b       ffffff8000c48724 <aplin
tc_fiq_handler+0x7c>
ddb{13}> trace
db_enter() at aplintc_fiq_handler+0x6c
aplintc_fiq_handler() at arm_cpu_fiq+0x34
arm_cpu_fiq() at handle_el1h_fiq+0x68
handle_el1h_fiq() at cpu_idle_cycle+0x28
cpu_idle_cycle() at sched_idle+0x218
sched_idle() at proc_trampoline+0xc
ddb{13}> mach ddbcpu 0xe
Stopped at      aplintc_fiq_handler+0x70:       b       ffffff8000c48724 <aplin
tc_fiq_handler+0x7c>
ddb{14}> trace
db_enter() at aplintc_fiq_handler+0x6c
aplintc_fiq_handler() at arm_cpu_fiq+0x34
arm_cpu_fiq() at handle_el1h_fiq+0x68
handle_el1h_fiq() at cpu_idle_cycle+0x28
cpu_idle_cycle() at sched_idle+0x218
sched_idle() at proc_trampoline+0xc
ddb{14}> mach ddbcpu 0xf
Stopped at      aplintc_fiq_handler+0x70:       b       ffffff8000c48724 <aplin
tc_fiq_handler+0x7c>
ddb{15}> trace
db_enter() at aplintc_fiq_handler+0x6c
aplintc_fiq_handler() at arm_cpu_fiq+0x34
arm_cpu_fiq() at handle_el1h_fiq+0x68
handle_el1h_fiq() at mtx_enter+0x114
mtx_enter() at sched_idle+0x12c
sched_idle() at proc_trampoline+0xc
ddb{15}> mach ddbcpu 0x10
Stopped at      aplintc_fiq_handler+0x70:       b       ffffff8000c48724 <aplin
tc_fiq_handler+0x7c>
ddb{16}> trace
db_enter() at aplintc_fiq_handler+0x6c
aplintc_fiq_handler() at arm_cpu_fiq+0x34
arm_cpu_fiq() at handle_el1h_fiq+0x68
handle_el1h_fiq() at cpu_idle_cycle+0x28
cpu_idle_cycle() at sched_idle+0x218
sched_idle() at proc_trampoline+0xc
ddb{16}> mach ddbcpu 0x11
Stopped at      aplintc_fiq_handler+0x70:       b       ffffff8000c48724 <aplin
tc_fiq_handler+0x7c>
ddb{17}> trace
db_enter() at aplintc_fiq_handler+0x6c
aplintc_fiq_handler() at arm_cpu_fiq+0x34
arm_cpu_fiq() at handle_el1h_fiq+0x68
handle_el1h_fiq() at agtimer_get_timecount_default+0x18
agtimer_get_timecount_default() at nanouptime+0x44
nanouptime() at mi_switch+0x9c
mi_switch() at sched_idle+0x134
sched_idle() at proc_trampoline+0xc
ddb{17}> mach ddbcpu 0x12
Stopped at      aplintc_fiq_handler+0x70:       b       ffffff8000c48724 <aplin
tc_fiq_handler+0x7c>
ddb{18}> trace
db_enter() at aplintc_fiq_handler+0x6c
aplintc_fiq_handler() at arm_cpu_fiq+0x34
arm_cpu_fiq() at handle_el1h_fiq+0x68
handle_el1h_fiq() at cpu_idle_cycle+0x28
cpu_idle_cycle() at sched_idle+0x218
sched_idle() at proc_trampoline+0xc
ddb{18}> mach ddbcpu 0x13
Stopped at      __mp_lock+0x138:        ldr     w8, [x23,#592]
ddb{19}> trace
db_enter() at __mp_lock+0x134
__mp_lock() at reaper+0x108
reaper() at proc_trampoline+0xc
ddb{19}> mach ddbcpu 0x14
Stopped at      aplintc_fiq_handler+0x70:       b       ffffff8000c48724 <aplin
tc_fiq_handler+0x7c>
ddb{20}> trace
db_enter() at aplintc_fiq_handler+0x6c
aplintc_fiq_handler() at arm_cpu_fiq+0x34
arm_cpu_fiq() at handle_el1h_fiq+0x68
handle_el1h_fiq() at cpu_idle_cycle+0x28
cpu_idle_cycle() at sched_idle+0x218
sched_idle() at proc_trampoline+0xc
ddb{20}> mach ddbcpu 0x15
Stopped at      aplintc_fiq_handler+0x70:       b       ffffff8000c48724 <aplin
tc_fiq_handler+0x7c>
ddb{21}> trace
db_enter() at aplintc_fiq_handler+0x6c
aplintc_fiq_handler() at arm_cpu_fiq+0x34
arm_cpu_fiq() at handle_el1h_fiq+0x68
handle_el1h_fiq() at cpu_idle_cycle+0x28
cpu_idle_cycle() at sched_idle+0x218
sched_idle() at proc_trampoline+0xc
ddb{21}> mach ddbcpu 0x16
Stopped at      aplintc_fiq_handler+0x70:       b       ffffff8000c48724 <aplin
tc_fiq_handler+0x7c>
ddb{22}> trace
db_enter() at aplintc_fiq_handler+0x6c
aplintc_fiq_handler() at arm_cpu_fiq+0x34
arm_cpu_fiq() at handle_el1h_fiq+0x68
handle_el1h_fiq() at cpu_idle_cycle+0x28
cpu_idle_cycle() at sched_idle+0x218
sched_idle() at proc_trampoline+0xc
ddb{22}> mach ddbcpu 0x17
Stopped at      aplintc_fiq_handler+0x70:       b       ffffff8000c48724 <aplin
tc_fiq_handler+0x7c>
ddb{23}> trace
db_enter() at aplintc_fiq_handler+0x6c
aplintc_fiq_handler() at arm_cpu_fiq+0x34
arm_cpu_fiq() at handle_el1h_fiq+0x68
handle_el1h_fiq() at mtx_enter+0xec
mtx_enter() at sched_toidle+0x90
sched_toidle() at sched_exit+0x5c
sched_toidle() at exit1+0x5f0
exit1() at sys___threxit+0x4c
process_untrace() at svc_handler+0x44c
svc_handler() at do_el0_sync+0xc8
do_el0_sync() at handle_el0_sync+0x70
handle_el0_sync() at 0x44b5e61d0

--- trap ---
end of kernel

Reply via email to