On 25/07/24(Thu) 17:33, Claudio Jeker wrote:
> On Thu, Jul 25, 2024 at 05:15:32PM +0200, Martin Pieuchot wrote:
> > On 25/07/24(Thu) 14:51, Claudio Jeker wrote:
> > > On Thu, Jul 25, 2024 at 11:09:44AM +0200, Martin Pieuchot wrote:
> > > [...] 
> > > > > Index: kern/kern_synch.c
> > > > > ===================================================================
> > > > > RCS file: /cvs/src/sys/kern/kern_synch.c,v
> > > > > diff -u -p -r1.206 kern_synch.c
> > > > > --- kern/kern_synch.c 23 Jul 2024 08:38:02 -0000      1.206
> > > > > +++ kern/kern_synch.c 24 Jul 2024 14:14:06 -0000
> > > > > @@ -399,15 +399,18 @@ sleep_finish(int timo, int do_sleep)
> > > > >        */
> > > > >       if (p->p_wchan == NULL)
> > > > >               do_sleep = 0;
> > > > > +     KASSERT((p->p_flag & P_SINGLESLEEP) == 0);
> > > > >       atomic_clearbits_int(&p->p_flag, P_WSLEEP);
> > > > >  
> > > > > +     /* If requested to stop always force a stop even if do_sleep == 
> > > > > 0 */
> > > > > +     if (p->p_stat == SSTOP)
> > > > > +             do_sleep = 1;
> > > > 
> > > > This is also scary.  The problem with the current scheme is that we 
> > > > don't
> > > > know who changed `p_stat' and if we already did our context switch or
> > > > not. 
> > > 
> > > It is scary indeed. I would prefer if sleep_signal_check() would not
> > > randomly mi_switch() away behind our back. I first tried that but it is
> > > harder then you think.
> > 
> > So maybe let's start by adding:
> > 
> >     KASSERT(!(p->p_stat == SSTOP && do_sleep == 0))
> > 
> > And see if something blows.
> 
> I can assure you single_thread_set() is able to put p->p_stat to SSTOP
> between these lines:
>                 if ((error = sleep_signal_check(p)) != 0) {
>                         catch = 0;
>                         do_sleep = 0;
>                 }
>         }
> 
>         SCHED_LOCK();
> 
> So it can happen but the window is reasonably small (mainly the call to
> cursig() and some minimal other fluff) that the KASSERT will probably
> never hit.

This whole discussion makes me believe that SINGLE_SUSPEND is the
incorrect solution for this and should die.

Instead of trying to change the state of siblings in single_thread_set()
and context switching in single_thread_check() all threads should stop
inside cursig().

I really appreciate all the efforts you've put into debugging this.
However I cannot believe that adding more hacks and checking for p_stat
is the way to go.

Reply via email to