On Tue, Feb 20, 2024 at 09:34:12PM +0100, Martin Pieuchot wrote:
> On 28/10/21(Thu) 05:45, Visa Hankala wrote:
> > On Wed, Oct 27, 2021 at 09:02:08PM -0400, Dave Voutila wrote:
> > > 
> > > Dave Voutila <d...@sisu.io> writes:
> > > 
> > > > Was tinkering on a bt(5) script for trying to debug an issue in vmm(4)
> > > > when I managed to start hitting a panic "wakeup: p_stat is 2" being
> > > > triggered by kqueue coming from the softnet kernel task.
> > > >
> > > > I'm running an amd64 kernel built from the tree today (latest CVS commit
> > > > id UynQo1r7kLKA0Q2p) with VMM_DEBUG option set and the defaults from
> > > > GENERIC.MP. Userland is from the latest amd snap.
> > > >
> > > > To reproduce, I'm running a single OpenBSD-current guest under vmd(8)
> > > > which I'm targeting with the following trivial btrace script I was
> > > > working on to use for debugging something in vmm(4):
> > > >
> > > > tracepoint:sched:sleep / pid == $1 && tid == $2 /{
> > > >   printf("pid %d, tid %d slept %d!\n", pid, tid, nsecs);
> > > > }
> > > >
> > > > tracepoint:sched:wakeup / pid == $1 && tid == $2 /{
> > > >   printf("pid %d, tid %d awoke %d!\n", pid, tid, nsecs);
> > > > }
> > > 
> > > Even easier reproduction: if you have 2 machines and can use tcpbench(1)
> > > between them, then while tcpbench is running target it with the above
> > > btrace script. I've found running the script, killing it with ctrl-c,
> > > and re-running it 2-3 times triggers the panic on my laptop.
> > > 
> > > >
> > > > Both times this happened I was trying to sysupgrade the vmd(8) guest
> > > > while running the above btrace script. When I don't run the script,
> > > > there is no panic.
> > > >
> > > > Image of the full backtrace is here: https://imgur.com/a/swW1qoj
> > > >
> > > > Simple transcript of the call stack after the panic() call looks like:
> > > >
> > > > wakeup_n
> > > > kqueue_wakeup
> > > > knote
> > > > selwakekup
> > > > tun_enqueue
> > > > ether_output
> > > > ip_output
> > > > ip_forward
> > > > ip_input_if
> > > > ipv4_input
> > > > ether_input
> > > > if_input_process
> > > >
> > > > The other 3 cpu cores appeared to be in ipi handlers. (Image in that
> > > > imgur link)
> > 
> > I think the problem is recursion within the scheduler. Trace points
> > invoke wakeup() directly, which is unsafe when done from within the
> > run queue routines.
> > 
> > One approach to avoid the recursion is to defer the wakeup() with
> > a soft interrupt. The scheduler runs at an elevated system priority
> > level that blocks soft interrupts. The deferred wakeup() is issued once
> > the system priority level turns low enough.
> > 
> > Unfortunately, the patch will get broken when someone adds trace points
> > to soft interrupt scheduling...
> 
> Thanks for the report.  Sorry for the delay.  I'm now really interested
> in fixing this bug because I'm heavily using btrace(8) to analyse the
> scheduler and I hit this panic a couple of times.
> 
> The problem is that `pnext' might no longer be on the sleepqueue after a
> tracepoint inside setrunnable() fired.  Diff below fixes that by making
> wakeup_n() re-entrant.
> 
> I'm not very interested in committing this diff because it relies on a
> recursive SCHED_LOCK().  Instead I'd prefer to split wakeup_n() in two
> stages: first unlink the threads then call setrunnable().  This approach
> will help us untangle the sleepqueue but needs a bit more shuffling,
> like moving unsleep() out of setrunnable()...
> 
> Claudio, Visa do you agree?
> 
> Index: kern/kern_synch.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_synch.c,v
> retrieving revision 1.200
> diff -u -p -r1.200 kern_synch.c
> --- kern/kern_synch.c 13 Sep 2023 14:25:49 -0000      1.200
> +++ kern/kern_synch.c 20 Feb 2024 19:51:53 -0000
> @@ -484,7 +484,7 @@ wakeup_proc(struct proc *p, const volati
>                       unsleep(p);
>  #ifdef DIAGNOSTIC
>               else
> -                     panic("wakeup: p_stat is %d", (int)p->p_stat);
> +                     panic("thread %d p_stat is %d", p->p_tid, p->p_stat);
>  #endif
>       }
>  
> @@ -532,21 +532,35 @@ void
>  wakeup_n(const volatile void *ident, int n)
>  {
>       struct slpque *qp;
> -     struct proc *p;
> -     struct proc *pnext;
> +     struct proc *p, piter;
>       int s;
>  
> +     memset(&piter, 0, sizeof(piter));
> +     piter.p_stat = SMARKER;
> +
>       SCHED_LOCK(s);
>       qp = &slpque[LOOKUP(ident)];
> -     for (p = TAILQ_FIRST(qp); p != NULL && n != 0; p = pnext) {
> -             pnext = TAILQ_NEXT(p, p_runq);
> +     TAILQ_INSERT_HEAD(qp, &piter, p_runq);
> +     while (n != 0) {
> +             p = TAILQ_NEXT(&piter, p_runq);
> +             if (p == NULL)
> +                     break;
> +
> +             /* Move marker forward */
> +             TAILQ_REMOVE(qp, &piter, p_runq);
> +             TAILQ_INSERT_AFTER(qp, p, &piter, p_runq);
> +
> +             if (p->p_stat == SMARKER)
> +                     continue;
> +
>  #ifdef DIAGNOSTIC
>               if (p->p_stat != SSLEEP && p->p_stat != SSTOP)
> -                     panic("wakeup: p_stat is %d", (int)p->p_stat);
> +                     panic("thread %d p_stat is %d", p->p_tid, p->p_stat);
>  #endif
>               if (wakeup_proc(p, ident, 0))
>                       --n;
>       }
> +     TAILQ_REMOVE(qp, &piter, p_runq);
>       SCHED_UNLOCK(s);
>  }
>  
> Index: sys/proc.h
> ===================================================================
> RCS file: /cvs/src/sys/sys/proc.h,v
> retrieving revision 1.356
> diff -u -p -r1.356 proc.h
> --- sys/proc.h        3 Feb 2024 18:51:58 -0000       1.356
> +++ sys/proc.h        20 Feb 2024 19:54:43 -0000
> @@ -414,6 +414,7 @@ struct proc {
>  #define      SZOMB   5               /* unused */
>  #define      SDEAD   6               /* Thread is almost gone */
>  #define      SONPROC 7               /* Thread is currently on a CPU. */
> +#define SMARKER      127             /* Used by markers in reentrant 
> iterator */
>  
>  #define      P_ZOMBIE(p)     ((p)->p_stat == SDEAD)
>  #define      P_HASSIBLING(p) ((p)->p_p->ps_threadcnt > 1)

Ugh, this is too much magic for my little brain.
I would prefer to unqueue the procs onto a new local list and have a
special wakeup_proc for them. The code around wakeup and unsleep will need
some changes to unlock it properly.

-- 
:wq Claudio

Reply via email to