On Tue, Feb 20, 2024 at 09:34:12PM +0100, Martin Pieuchot wrote:
> On 28/10/21(Thu) 05:45, Visa Hankala wrote:
> > On Wed, Oct 27, 2021 at 09:02:08PM -0400, Dave Voutila wrote:
> > > 
> > > Dave Voutila <d...@sisu.io> writes:
> > > 
> > > > Was tinkering on a bt(5) script for trying to debug an issue in
> > > > vmm(4) when I managed to start hitting a panic "wakeup: p_stat is
> > > > 2" triggered by kqueue from the softnet kernel task.
> > > > 
> > > > I'm running an amd64 kernel built from the tree today (latest CVS
> > > > commit id UynQo1r7kLKA0Q2p) with the VMM_DEBUG option set and the
> > > > defaults from GENERIC.MP. Userland is from the latest amd64 snap.
> > > > 
> > > > To reproduce, I'm running a single OpenBSD-current guest under
> > > > vmd(8), which I'm targeting with the following trivial btrace
> > > > script I was writing to debug something in vmm(4):
> > > > 
> > > > tracepoint:sched:sleep / pid == $1 && tid == $2 /{
> > > > 	printf("pid %d, tid %d slept %d!\n", pid, tid, nsecs);
> > > > }
> > > > 
> > > > tracepoint:sched:wakeup / pid == $1 && tid == $2 /{
> > > > 	printf("pid %d, tid %d awoke %d!\n", pid, tid, nsecs);
> > > > }
> > > 
> > > Even easier reproduction: if you have two machines and can run
> > > tcpbench(1) between them, target the running tcpbench with the
> > > above btrace script. I've found that running the script, killing
> > > it with ctrl-c, and re-running it 2-3 times triggers the panic on
> > > my laptop.
> > > 
> > > > Both times this happened I was trying to sysupgrade the vmd(8)
> > > > guest while running the above btrace script. When I don't run
> > > > the script, there is no panic.
> > > > 
> > > > Image of the full backtrace is here: https://imgur.com/a/swW1qoj
> > > > 
> > > > A simple transcript of the call stack after the panic() call
> > > > looks like:
> > > > 
> > > > wakeup_n
> > > > kqueue_wakeup
> > > > knote
> > > > selwakeup
> > > > tun_enqueue
> > > > ether_output
> > > > ip_output
> > > > ip_forward
> > > > ip_input_if
> > > > ipv4_input
> > > > ether_input
> > > > if_input_process
> > > > 
> > > > The other three CPU cores appeared to be in IPI handlers (images
> > > > in that imgur link).
> > 
> > I think the problem is recursion within the scheduler. Trace points
> > invoke wakeup() directly, which is unsafe when done from within the
> > run queue routines.
> > 
> > One approach to avoid the recursion is to defer the wakeup() with
> > a soft interrupt. The scheduler runs at an elevated system priority
> > level that blocks soft interrupts; the deferred wakeup() is issued
> > once the system priority level drops low enough.
> > 
> > Unfortunately, the patch will break as soon as someone adds trace
> > points to soft interrupt scheduling...
> 
> Thanks for the report, and sorry for the delay. I'm now really
> interested in fixing this bug because I'm heavily using btrace(8) to
> analyse the scheduler, and I have hit this panic a couple of times.
> 
> The problem is that `pnext' might no longer be on the sleep queue
> after a tracepoint inside setrunnable() fires. The diff below fixes
> that by making wakeup_n() re-entrant.
> 
> I'm not keen on committing this diff because it relies on a recursive
> SCHED_LOCK(). Instead I'd prefer to split wakeup_n() in two stages:
> first unlink the threads, then call setrunnable(). This approach will
> help us untangle the sleep queue, but it needs a bit more shuffling,
> like moving unsleep() out of setrunnable()...
> 
> Claudio, Visa, do you agree?
> 
> Index: kern/kern_synch.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_synch.c,v
> retrieving revision 1.200
> diff -u -p -r1.200 kern_synch.c
> --- kern/kern_synch.c	13 Sep 2023 14:25:49 -0000	1.200
> +++ kern/kern_synch.c	20 Feb 2024 19:51:53 -0000
> @@ -484,7 +484,7 @@ wakeup_proc(struct proc *p, const volati
>  		unsleep(p);
>  #ifdef DIAGNOSTIC
>  	else
> -		panic("wakeup: p_stat is %d", (int)p->p_stat);
> +		panic("thread %d p_stat is %d", p->p_tid, p->p_stat);
>  #endif
>  }
>  
> @@ -532,21 +532,35 @@ void
>  wakeup_n(const volatile void *ident, int n)
>  {
>  	struct slpque *qp;
> -	struct proc *p;
> -	struct proc *pnext;
> +	struct proc *p, piter;
>  	int s;
>  
> +	memset(&piter, 0, sizeof(piter));
> +	piter.p_stat = SMARKER;
> +
>  	SCHED_LOCK(s);
>  	qp = &slpque[LOOKUP(ident)];
> -	for (p = TAILQ_FIRST(qp); p != NULL && n != 0; p = pnext) {
> -		pnext = TAILQ_NEXT(p, p_runq);
> +	TAILQ_INSERT_HEAD(qp, &piter, p_runq);
> +	while (n != 0) {
> +		p = TAILQ_NEXT(&piter, p_runq);
> +		if (p == NULL)
> +			break;
> +
> +		/* Move marker forward */
> +		TAILQ_REMOVE(qp, &piter, p_runq);
> +		TAILQ_INSERT_AFTER(qp, p, &piter, p_runq);
> +
> +		if (p->p_stat == SMARKER)
> +			continue;
> +
>  #ifdef DIAGNOSTIC
>  		if (p->p_stat != SSLEEP && p->p_stat != SSTOP)
> -			panic("wakeup: p_stat is %d", (int)p->p_stat);
> +			panic("thread %d p_stat is %d", p->p_tid, p->p_stat);
>  #endif
>  		if (wakeup_proc(p, ident, 0))
>  			--n;
>  	}
> +	TAILQ_REMOVE(qp, &piter, p_runq);
>  	SCHED_UNLOCK(s);
>  }
>  
> Index: sys/proc.h
> ===================================================================
> RCS file: /cvs/src/sys/sys/proc.h,v
> retrieving revision 1.356
> diff -u -p -r1.356 proc.h
> --- sys/proc.h	3 Feb 2024 18:51:58 -0000	1.356
> +++ sys/proc.h	20 Feb 2024 19:54:43 -0000
> @@ -414,6 +414,7 @@ struct proc {
>  #define SZOMB		5		/* unused */
>  #define SDEAD		6		/* Thread is almost gone */
>  #define SONPROC		7		/* Thread is currently on a CPU. */
> +#define SMARKER		127		/* Used by markers in reentrant iterator */
>  
>  #define P_ZOMBIE(p)	((p)->p_stat == SDEAD)
>  #define P_HASSIBLING(p)	((p)->p_p->ps_threadcnt > 1)
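To unpack the marker trick in the diff: `piter' is a dummy struct proc
threaded onto the sleep queue that always records the current iteration
position, so the walk survives real threads being unlinked while a
tracepoint fires. Below is a minimal userland sketch of the same
pattern using <sys/queue.h>; the node type, the is_marker flag standing
in for p_stat == SMARKER, and printf() standing in for wakeup_proc()
are all illustrative, not kernel API.

#include <sys/queue.h>
#include <stdio.h>

/*
 * Sketch of marker-based reentrant iteration: only the marker records
 * the position, so removing any real node cannot invalidate the walk.
 */
struct node {
	TAILQ_ENTRY(node) link;
	int is_marker;		/* stands in for p_stat == SMARKER */
	int val;
};
TAILQ_HEAD(nodelist, node);

static void
visit_all(struct nodelist *q)
{
	struct node marker = { .is_marker = 1 }, *n;

	TAILQ_INSERT_HEAD(q, &marker, link);
	while ((n = TAILQ_NEXT(&marker, link)) != NULL) {
		/* Move the marker past n; n may now be unlinked safely. */
		TAILQ_REMOVE(q, &marker, link);
		TAILQ_INSERT_AFTER(q, n, &marker, link);
		if (n->is_marker)	/* skip other iterators' markers */
			continue;
		printf("visiting %d\n", n->val);	/* the "wakeup" */
	}
	TAILQ_REMOVE(q, &marker, link);
}

int
main(void)
{
	struct nodelist q = TAILQ_HEAD_INITIALIZER(q);
	struct node a = { .val = 1 }, b = { .val = 2 }, c = { .val = 3 };

	TAILQ_INSERT_TAIL(&q, &a, link);
	TAILQ_INSERT_TAIL(&q, &b, link);
	TAILQ_INSERT_TAIL(&q, &c, link);
	visit_all(&q);
	return 0;
}

The invariant is that everything before the marker has already been
visited; a nested wakeup can only unlink real nodes, never the marker
itself, which is why the iteration becomes re-entrant.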
Ugh, this is too much magic for my little brain. I would prefer to
unqueue the procs onto a new local list and have a special
wakeup_proc() for them. The code around wakeup() and unsleep() will
need some changes to unlock it properly.

-- 
:wq Claudio
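For comparison, here is a rough userland sketch of the two-stage shape
Martin and Claudio describe: unlink the chosen threads onto a local
list first (under SCHED_LOCK() in the kernel), then make each one
runnable once the queue walk is over. Everything here apart from the
<sys/queue.h> macros is made up for illustration; the real change
would still need the unsleep()/setrunnable() shuffling mentioned
above.

#include <sys/queue.h>
#include <stdio.h>

struct node {
	TAILQ_ENTRY(node) link;
	int val;
};
TAILQ_HEAD(nodelist, node);

static void
wake_n(struct nodelist *q, int n)
{
	struct nodelist woken = TAILQ_HEAD_INITIALIZER(woken);
	struct node *p;

	/* Stage 1: unlink candidates (SCHED_LOCK() held in the kernel). */
	while (n-- > 0 && (p = TAILQ_FIRST(q)) != NULL) {
		TAILQ_REMOVE(q, p, link);
		TAILQ_INSERT_TAIL(&woken, p, link);
	}

	/* Stage 2: act on each thread; stands in for setrunnable(). */
	while ((p = TAILQ_FIRST(&woken)) != NULL) {
		TAILQ_REMOVE(&woken, p, link);
		printf("waking %d\n", p->val);
	}
}

int
main(void)
{
	struct nodelist q = TAILQ_HEAD_INITIALIZER(q);
	struct node a = { .val = 1 }, b = { .val = 2 };

	TAILQ_INSERT_TAIL(&q, &a, link);
	TAILQ_INSERT_TAIL(&q, &b, link);
	wake_n(&q, 2);
	return 0;
}

Because stage two never touches the shared sleep queue, a tracepoint
firing inside the per-thread wakeup can no longer corrupt the walk,
and neither a marker nor a recursive SCHED_LOCK() is needed.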