Hello Dana,

On 20/06/24(Thu) 17:16, Dana Koch wrote:
> On Thu, Jun 20, 2024 at 3:33 PM Martin Pieuchot <m...@openbsd.org> wrote:
> >
> > Hello Dana,
> >
> > Thanks again for your report.
> >
> > On 19/06/24(Wed) 09:37, Dana Koch wrote:
> > > On Wed, Jun 19, 2024 at 6:58 AM Martin Pieuchot <m...@openbsd.org> wrote:
> > > > This is a lock order reversal reported by WITNESS. Thankfully claudio@
> > > > already committed a fix for this on the 16th. So please, try with
> > > > up-to-date sources
> > >
> > > Just to be paranoid, I built a kernel with recent sources and
> > > MP_LOCKDEBUG and WITNESS. I experienced both the "lock spun out" error
> > > after "starting network" -- but not on serial console, unfortunately
> > > -- and from `make -j24` as mentioned which I did capture.
> >
> > The problem is exposed by the many threads of lld(1). While "starting
> > network" the boot process relinks a kernel. More details below.
> >
> > Since when do you experience this issue?
>
> Since I got the device earlier in June and put 7.5-current on it.
>
> I have been trying fresh kernels each time; the most recent ones
> haven't been tripping at boot as frequently (perhaps the lock order
> reversal fix has solved part but not all of the underlying problem).
>
> > The issue is related to the SCHED_LOCK(). Could you please next time use
> > "ps /o" in ddb, this will help me figure out which CPU trace correspond
> > to the process holding the KERNEL_LOCK().
>
> Done. Here is a repro from today with `ps /o` output. (Perhaps worth
> noting the "lock spun out" message happening during the ddb session
> after `mach ddbcpu 9`, too.)

Could you try the diff below? Stuart confirmed it prevents the hang on
his machine.

Index: kern/kern_synch.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_synch.c,v
diff -u -p -r1.205 kern_synch.c
--- kern/kern_synch.c	3 Jun 2024 12:48:25 -0000	1.205
+++ kern/kern_synch.c	22 Jun 2024 12:57:37 -0000
@@ -576,25 +576,8 @@ wakeup(const volatile void *chan)
 int
 sys_sched_yield(struct proc *p, void *v, register_t *retval)
 {
-	struct proc *q;
-	uint8_t newprio;
-
-	/*
-	 * If one of the threads of a multi-threaded process called
-	 * sched_yield(2), drop its priority to ensure its siblings
-	 * can make some progress.
-	 */
-	mtx_enter(&p->p_p->ps_mtx);
-	newprio = p->p_usrpri;
-	TAILQ_FOREACH(q, &p->p_p->ps_threads, p_thr_link)
-		newprio = max(newprio, q->p_runpri);
-	mtx_leave(&p->p_p->ps_mtx);
-
-	SCHED_LOCK();
-	setrunqueue(p->p_cpu, p, newprio);
-	p->p_ru.ru_nvcsw++;
-	mi_switch();
-	SCHED_UNLOCK();
+	/* Force a sleep cycle to prevent contending on the SCHED_LOCK(). */
+	tsleep(&nowake, PUSER, "yield", 1);
 
 	return (0);
 }
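
For anyone who wants to exercise this path from userland, here is a rough
sketch (not part of the diff) of the kind of workload that drives many
threads into sys_sched_yield() at once. It assumes the contention comes from
sibling threads calling sched_yield(2) in a tight loop, as the diff suggests.
The names `run_yielders`, `yielder`, `struct yield_arg` and `MAX_THREADS` are
made up for illustration, and the thread and iteration counts are arbitrary.

```c
/*
 * Hypothetical stress sketch, not part of the diff: several threads of
 * one process call sched_yield(2) in a loop, so each call goes through
 * sys_sched_yield() in the kernel.
 */
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>

#define MAX_THREADS	64	/* arbitrary cap, chosen for the sketch */

static atomic_long yield_count;

struct yield_arg {
	long iterations;	/* sched_yield(2) calls per thread */
};

static void *
yielder(void *v)
{
	struct yield_arg *arg = v;
	long i;

	for (i = 0; i < arg->iterations; i++) {
		sched_yield();
		atomic_fetch_add(&yield_count, 1);
	}
	return (NULL);
}

/* Spawn nthreads yielders, wait for them, return the total yield calls. */
long
run_yielders(int nthreads, long iterations)
{
	pthread_t threads[MAX_THREADS];
	struct yield_arg arg = { iterations };
	int i;

	assert(nthreads > 0 && nthreads <= MAX_THREADS);
	atomic_store(&yield_count, 0);

	for (i = 0; i < nthreads; i++) {
		if (pthread_create(&threads[i], NULL, yielder, &arg) != 0)
			abort();
	}
	for (i = 0; i < nthreads; i++) {
		if (pthread_join(threads[i], NULL) != 0)
			abort();
	}
	return (atomic_load(&yield_count));
}
```

Calling run_yielders() from a small main() with a thread count close to the
number of CPUs and a large iteration count may be enough to compare kernels
with and without the diff. It is only an approximation of what lld(1) does
during a relink.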