On 28/10/21(Thu) 05:45, Visa Hankala wrote:
> On Wed, Oct 27, 2021 at 09:02:08PM -0400, Dave Voutila wrote:
> > 
> > Dave Voutila <d...@sisu.io> writes:
> > 
> > > Was tinkering on a bt(5) script for trying to debug an issue in vmm(4)
> > > when I managed to start hitting a panic "wakeup: p_stat is 2" being
> > > triggered by kqueue coming from the softnet kernel task.
> > >
> > > I'm running an amd64 kernel built from the tree today (latest CVS commit
> > > id UynQo1r7kLKA0Q2p) with VMM_DEBUG option set and the defaults from
> > > GENERIC.MP. Userland is from the latest amd snap.
> > >
> > > To reproduce, I'm running a single OpenBSD-current guest under vmd(8)
> > > which I'm targeting with the following trivial btrace script I was
> > > working on to use for debugging something in vmm(4):
> > >
> > > tracepoint:sched:sleep / pid == $1 && tid == $2 /{
> > >   printf("pid %d, tid %d slept %d!\n", pid, tid, nsecs);
> > > }
> > >
> > > tracepoint:sched:wakeup / pid == $1 && tid == $2 /{
> > >   printf("pid %d, tid %d awoke %d!\n", pid, tid, nsecs);
> > > }
> > 
> > Even easier reproduction: if you have 2 machines and can use tcpbench(1)
> > between them, then while tcpbench is running target it with the above
> > btrace script. I've found running the script, killing it with ctrl-c,
> > and re-running it 2-3 times triggers the panic on my laptop.
> > 
> > >
> > > Both times this happened I was trying to sysupgrade the vmd(8) guest
> > > while running the above btrace script. When I don't run the script,
> > > there is no panic.
> > >
> > > Image of the full backtrace is here: https://imgur.com/a/swW1qoj
> > >
> > > Simple transcript of the call stack after the panic() call looks like:
> > >
> > > wakeup_n
> > > kqueue_wakeup
> > > knote
> > > selwakeup
> > > tun_enqueue
> > > ether_output
> > > ip_output
> > > ip_forward
> > > ip_input_if
> > > ipv4_input
> > > ether_input
> > > if_input_process
> > >
> > > The other 3 cpu cores appeared to be in ipi handlers. (Image in that
> > > imgur link)
> 
> I think the problem is recursion within the scheduler. Trace points
> invoke wakeup() directly, which is unsafe when done from within the
> run queue routines.
> 
> One approach to avoid the recursion is to defer the wakeup() with
> a soft interrupt. The scheduler runs at an elevated system priority
> level that blocks soft interrupts. The deferred wakeup() is issued once
> the system priority level turns low enough.
> 
> Unfortunately, the patch will get broken when someone adds trace points
> to soft interrupt scheduling...

Thanks for the report.  Sorry for the delay.  I'm now really interested
in fixing this bug because I'm heavily using btrace(8) to analyse the
scheduler and I hit this panic a couple of times.

The problem is that `pnext' might no longer be on the sleepqueue after a
tracepoint inside setrunnable() fired.  Diff below fixes that by making
wakeup_n() re-entrant.

I'm not very interested in committing this diff because it relies on a
recursive SCHED_LOCK().  Instead I'd prefer to split wakeup_n() in two
stages: first unlink the threads then call setrunnable().  This approach
will help us untangle the sleepqueue but needs a bit more shuffling,
like moving unsleep() out of setrunnable()...

Claudio, Visa, do you agree?

Index: kern/kern_synch.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_synch.c,v
retrieving revision 1.200
diff -u -p -r1.200 kern_synch.c
--- kern/kern_synch.c   13 Sep 2023 14:25:49 -0000      1.200
+++ kern/kern_synch.c   20 Feb 2024 19:51:53 -0000
@@ -484,7 +484,7 @@ wakeup_proc(struct proc *p, const volati
                        unsleep(p);
 #ifdef DIAGNOSTIC
                else
-                       panic("wakeup: p_stat is %d", (int)p->p_stat);
+                       panic("thread %d p_stat is %d", p->p_tid, p->p_stat);
 #endif
        }
 
@@ -532,21 +532,35 @@ void
 wakeup_n(const volatile void *ident, int n)
 {
        struct slpque *qp;
-       struct proc *p;
-       struct proc *pnext;
+       struct proc *p, piter;
        int s;
 
+       memset(&piter, 0, sizeof(piter));
+       piter.p_stat = SMARKER;
+
        SCHED_LOCK(s);
        qp = &slpque[LOOKUP(ident)];
-       for (p = TAILQ_FIRST(qp); p != NULL && n != 0; p = pnext) {
-               pnext = TAILQ_NEXT(p, p_runq);
+       TAILQ_INSERT_HEAD(qp, &piter, p_runq);
+       while (n != 0) {
+               p = TAILQ_NEXT(&piter, p_runq);
+               if (p == NULL)
+                       break;
+
+               /* Move marker forward */
+               TAILQ_REMOVE(qp, &piter, p_runq);
+               TAILQ_INSERT_AFTER(qp, p, &piter, p_runq);
+
+               if (p->p_stat == SMARKER)
+                       continue;
+
 #ifdef DIAGNOSTIC
                if (p->p_stat != SSLEEP && p->p_stat != SSTOP)
-                       panic("wakeup: p_stat is %d", (int)p->p_stat);
+                       panic("thread %d p_stat is %d", p->p_tid, p->p_stat);
 #endif
                if (wakeup_proc(p, ident, 0))
                        --n;
        }
+       TAILQ_REMOVE(qp, &piter, p_runq);
        SCHED_UNLOCK(s);
 }
 
Index: sys/proc.h
===================================================================
RCS file: /cvs/src/sys/sys/proc.h,v
retrieving revision 1.356
diff -u -p -r1.356 proc.h
--- sys/proc.h  3 Feb 2024 18:51:58 -0000       1.356
+++ sys/proc.h  20 Feb 2024 19:54:43 -0000
@@ -414,6 +414,7 @@ struct proc {
 #define        SZOMB   5               /* unused */
 #define        SDEAD   6               /* Thread is almost gone */
 #define        SONPROC 7               /* Thread is currently on a CPU. */
+#define SMARKER        127             /* Used by markers in reentrant iterator */
 
 #define        P_ZOMBIE(p)     ((p)->p_stat == SDEAD)
 #define        P_HASSIBLING(p) ((p)->p_p->ps_threadcnt > 1)
