On 28/10/21(Thu) 05:45, Visa Hankala wrote:
> On Wed, Oct 27, 2021 at 09:02:08PM -0400, Dave Voutila wrote:
> >
> > Dave Voutila <d...@sisu.io> writes:
> >
> > > Was tinkering on a bt(5) script for trying to debug an issue in vmm(4)
> > > when I managed to start hitting a panic "wakeup: p_stat is 2" being
> > > triggered by kqueue coming from the softnet kernel task.
> > >
> > > I'm running an amd64 kernel built from the tree today (latest CVS commit
> > > id UynQo1r7kLKA0Q2p) with the VMM_DEBUG option set and the defaults from
> > > GENERIC.MP. Userland is from the latest amd64 snap.
> > >
> > > To reproduce, I'm running a single OpenBSD-current guest under vmd(8)
> > > which I'm targeting with the following trivial btrace script I was
> > > working on to use for debugging something in vmm(4):
> > >
> > > tracepoint:sched:sleep / pid == $1 && tid == $2 / {
> > > 	printf("pid %d, tid %d slept %d!\n", pid, tid, nsecs);
> > > }
> > >
> > > tracepoint:sched:wakeup / pid == $1 && tid == $2 / {
> > > 	printf("pid %d, tid %d awoke %d!\n", pid, tid, nsecs);
> > > }
> >
> > Even easier reproduction: if you have 2 machines and can use tcpbench(1)
> > between them, then while tcpbench is running target it with the above
> > btrace script. I've found that running the script, killing it with
> > ctrl-c, and re-running it 2-3 times triggers the panic on my laptop.
> >
> > >
> > > Both times this happened I was trying to sysupgrade the vmd(8) guest
> > > while running the above btrace script. When I don't run the script,
> > > there is no panic.
> > >
> > > Image of the full backtrace is here: https://imgur.com/a/swW1qoj
> > >
> > > A simple transcript of the call stack after the panic() call looks like:
> > >
> > > wakeup_n
> > > kqueue_wakeup
> > > knote
> > > selwakeup
> > > tun_enqueue
> > > ether_output
> > > ip_output
> > > ip_forward
> > > ip_input_if
> > > ipv4_input
> > > ether_input
> > > if_input_process
> > >
> > > The other 3 cpu cores appeared to be in IPI handlers. (Image in that
> > > imgur link)
>
> I think the problem is recursion within the scheduler. Trace points
> invoke wakeup() directly, which is unsafe when done from within the
> run queue routines.
>
> One approach to avoid the recursion is to defer the wakeup() with
> a soft interrupt. The scheduler runs at an elevated system priority
> level that blocks soft interrupts. The deferred wakeup() is issued once
> the system priority level turns low enough.
>
> Unfortunately, the patch will get broken when someone adds trace points
> to soft interrupt scheduling...
Thanks for the report. Sorry for the delay. I'm now really interested in
fixing this bug because I'm heavily using btrace(8) to analyse the
scheduler and I hit this panic a couple of times.

The problem is that `pnext' might no longer be on the sleepqueue after a
tracepoint inside setrunnable() fired.

Diff below fixes that by making wakeup_n() re-entrant. I'm not very
interested in committing this diff because it relies on a recursive
SCHED_LOCK(). Instead I'd prefer to split wakeup_n() in two stages:
first unlink the threads, then call setrunnable(). This approach will
help us untangle the sleepqueue but needs a bit more shuffling, like
moving unsleep() out of setrunnable()...

Claudio, Visa, do you agree?

Index: kern/kern_synch.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_synch.c,v
retrieving revision 1.200
diff -u -p -r1.200 kern_synch.c
--- kern/kern_synch.c	13 Sep 2023 14:25:49 -0000	1.200
+++ kern/kern_synch.c	20 Feb 2024 19:51:53 -0000
@@ -484,7 +484,7 @@ wakeup_proc(struct proc *p, const volati
 		unsleep(p);
 #ifdef DIAGNOSTIC
 	else
-		panic("wakeup: p_stat is %d", (int)p->p_stat);
+		panic("thread %d p_stat is %d", p->p_tid, p->p_stat);
 #endif
 }
 
@@ -532,21 +532,35 @@ void
 wakeup_n(const volatile void *ident, int n)
 {
 	struct slpque *qp;
-	struct proc *p;
-	struct proc *pnext;
+	struct proc *p, piter;
 	int s;
 
+	memset(&piter, 0, sizeof(piter));
+	piter.p_stat = SMARKER;
+
 	SCHED_LOCK(s);
 	qp = &slpque[LOOKUP(ident)];
-	for (p = TAILQ_FIRST(qp); p != NULL && n != 0; p = pnext) {
-		pnext = TAILQ_NEXT(p, p_runq);
+	TAILQ_INSERT_HEAD(qp, &piter, p_runq);
+	while (n != 0) {
+		p = TAILQ_NEXT(&piter, p_runq);
+		if (p == NULL)
+			break;
+
+		/* Move marker forward */
+		TAILQ_REMOVE(qp, &piter, p_runq);
+		TAILQ_INSERT_AFTER(qp, p, &piter, p_runq);
+
+		if (p->p_stat == SMARKER)
+			continue;
+
 #ifdef DIAGNOSTIC
 		if (p->p_stat != SSLEEP && p->p_stat != SSTOP)
-			panic("wakeup: p_stat is %d", (int)p->p_stat);
+			panic("thread %d p_stat is %d", p->p_tid, p->p_stat);
 #endif
 		if (wakeup_proc(p, ident, 0))
 			--n;
 	}
+	TAILQ_REMOVE(qp, &piter, p_runq);
 	SCHED_UNLOCK(s);
 }

Index: sys/proc.h
===================================================================
RCS file: /cvs/src/sys/sys/proc.h,v
retrieving revision 1.356
diff -u -p -r1.356 proc.h
--- sys/proc.h	3 Feb 2024 18:51:58 -0000	1.356
+++ sys/proc.h	20 Feb 2024 19:54:43 -0000
@@ -414,6 +414,7 @@ struct proc {
 #define SZOMB	5	/* unused */
 #define SDEAD	6	/* Thread is almost gone */
 #define SONPROC	7	/* Thread is currently on a CPU. */
+#define SMARKER	127	/* Used by markers in reentrant iterator */
 
 #define P_ZOMBIE(p)	((p)->p_stat == SDEAD)
 #define P_HASSIBLING(p)	((p)->p_p->ps_threadcnt > 1)