On Sat, Sep 07, 2024 at 02:42:21PM +0200, Claudio Jeker wrote:
> On Fri, Sep 06, 2024 at 10:19:56AM +0200, Martin Pieuchot wrote:
> > On 26/07/24(Fri) 08:36, Claudio Jeker wrote:
> > > On Thu, Jul 25, 2024 at 08:20:32PM +0200, Martin Pieuchot wrote:
> > > > On 25/07/24(Thu) 17:33, Claudio Jeker wrote:
> > > > > On Thu, Jul 25, 2024 at 05:15:32PM +0200, Martin Pieuchot wrote:
> > > > > > On 25/07/24(Thu) 14:51, Claudio Jeker wrote:
> > > > > > > On Thu, Jul 25, 2024 at 11:09:44AM +0200, Martin Pieuchot wrote:
> > > > > > > [...] 
> > > > > > > > > Index: kern/kern_synch.c
> > > > > > > > > ===================================================================
> > > > > > > > > RCS file: /cvs/src/sys/kern/kern_synch.c,v
> > > > > > > > > diff -u -p -r1.206 kern_synch.c
> > > > > > > > > --- kern/kern_synch.c 23 Jul 2024 08:38:02 -0000      1.206
> > > > > > > > > +++ kern/kern_synch.c 24 Jul 2024 14:14:06 -0000
> > > > > > > > > @@ -399,15 +399,18 @@ sleep_finish(int timo, int do_sleep)
> > > > > > > > >        */
> > > > > > > > >       if (p->p_wchan == NULL)
> > > > > > > > >               do_sleep = 0;
> > > > > > > > > +     KASSERT((p->p_flag & P_SINGLESLEEP) == 0);
> > > > > > > > >       atomic_clearbits_int(&p->p_flag, P_WSLEEP);
> > > > > > > > >  
> > > > > > > > > +     /* If requested to stop always force a stop even if 
> > > > > > > > > do_sleep == 0 */
> > > > > > > > > +     if (p->p_stat == SSTOP)
> > > > > > > > > +             do_sleep = 1;
> > > > > > > > 
> > > > > > > > This is also scary.  The problem with the current scheme is 
> > > > > > > > that we don't
> > > > > > > > know who changed `p_stat' and if we already did our context 
> > > > > > > > switch or
> > > > > > > > not. 
> > > > > > > 
> > > > > > > It is scary indeed. I would prefer if sleep_signal_check() would 
> > > > > > > not
> > > > > > > randomly mi_switch() away behind our back. I first tried that but 
> > > > > > > it is
> > > > > > > harder then you think.
> > > > > > 
> > > > > > So maybe let's start by adding:
> > > > > > 
> > > > > >     KASSERT(!(p->p_stat == SSTOP && do_sleep == 0))
> > > > > > 
> > > > > > And see if something blows.
> > > > > 
> > > > > I can assure you single_thread_set() is able to put p->p_stat to SSTOP
> > > > > between these lines:
> > > > >                 if ((error = sleep_signal_check(p)) != 0) {
> > > > >                         catch = 0;
> > > > >                         do_sleep = 0;
> > > > >                 }
> > > > >         }
> > > > > 
> > > > >         SCHED_LOCK();
> > > > > 
> > > > > So it can happen but the window is reasonably small (mainly the call 
> > > > > to
> > > > > cursig() and some minimal other fluff) that the KASSERT will probably
> > > > > never hit.
> > > > 
> > > > This whole discussion makes me believe that SINGLE_SUSPEND is the
> > > > incorrect solution for this and should die.
> > > > 
> > > > Instead of trying to change the state of siblings in single_thread_set()
> > > > and context switching in single_thread_check() all threads should stop
> > > > inside cursig().
> > > > 
> > > > I really appreciate all the efforts you've put into debugging this.
> > > > However I cannot believe that adding more hacks and checking for p_stat
> > > > is the way to go.
> > > 
> > > It is not a hack. There will always be something like this unless you want
> > > to revert back to a giant recursive scheduler lock that can block the full
> > > sleep machinery from start to finish and accept the fact that OpenBSD will
> > > never scale to more than 8CPUs.
> > > 
> > > You have to accept that putting a thread to sleep is dirty business and
> > > that sleep finish must check that the transaction was actually successful.
> > > 
> > > I tried various approaches to fix this problem. All of them where
> > > horrible. I spent a week on this bug and I would prefer to spend my time
> > > on actually working on fixing SIGSTOP, dowait6() and finally replace the
> > > sched_lock in sleep. All of this will happen in and around the same place
> > > and maybe once we're done my "hack" is gone as well.
> > > 
> > > I will cycle back into this mess at some point but I will not rewrite the
> > > ptrace interaction now. I'm happy to drop this and just accept the fact
> > > that debugging multithreaded applications on OpenBSD is simply broken.
> > 
> > I believe you should commit this before release.  Having functional MT
> > debugging is more than valuable.  So ok mpi@ for the diff.
> > 
> > That said, I'm still concerned about the growing number of per-thread
> > flags used by the sleep_setup/sleep_finish() machinery to work around
> > under-defined SSLEEP & SSTOP transitions:
> > 
> > - P_WSLEEP to work around setting SSLEEP to early
> > - P_SINTR to indicate that SSLEEP isn't enough
> > - P_SUSPSINGLE to indicate that a thread is parked and not SSTOP
> > - P_SUSPSIG to indicate which thread handled the STOP signal
> > - P_TIMEOUT to indicate the sleep has a timer
> > - P_SINGLESLEEP to work around a race single_thread & signal APIs.
> > 
> > I believe we can and should do better in the future.
> > 
> 
> Lets split this up into small pieces.
> First lets add the P_TRACESINGLE bits.

This is the 2nd part which adds the P_SINGLESLEEP bits.

The 3rd diff will be the ps_xsig fixes for signal delivery via ptrace.
After that I think ptrace should be fit for release
-- 
:wq Claudio

? x2diff
Index: kern/kern_sig.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_sig.c,v
diff -u -p -r1.338 kern_sig.c
--- kern/kern_sig.c     10 Aug 2024 09:18:09 -0000      1.338
+++ kern/kern_sig.c     7 Sep 2024 13:10:28 -0000
@@ -2017,6 +2017,9 @@ single_thread_check_locked(struct proc *
 
        MUTEX_ASSERT_LOCKED(&pr->ps_mtx);
 
+       if (deep)
+               atomic_clearbits_int(&p->p_flag, P_SINGLESLEEP);
+
        if (pr->ps_single == NULL || pr->ps_single == p)
                return (0);
 
@@ -2126,8 +2129,14 @@ single_thread_set(struct proc *p, int fl
                        if (q->p_flag & P_SINTR) {
                                /* merely need to suspend?  just stop it */
                                if (mode == SINGLE_SUSPEND) {
+                                       /*
+                                        * if between sleep_setup and
+                                        * sleep_signal_check don't count us
+                                        * out.
+                                        */
+                                       if ((q->p_flag & P_SINGLESLEEP) == 0)
+                                               --pr->ps_singlecnt;
                                        q->p_stat = SSTOP;
-                                       --pr->ps_singlecnt;
                                        break;
                                }
                                /* need to unwind or exit, so wake it */
Index: kern/kern_synch.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_synch.c,v
diff -u -p -r1.206 kern_synch.c
--- kern/kern_synch.c   23 Jul 2024 08:38:02 -0000      1.206
+++ kern/kern_synch.c   7 Sep 2024 13:10:28 -0000
@@ -356,7 +356,7 @@ sleep_setup(const volatile void *ident, 
        atomic_setbits_int(&p->p_flag, P_WSLEEP);
        TAILQ_INSERT_TAIL(&slpque[LOOKUP(ident)], p, p_runq);
        if (prio & PCATCH)
-               atomic_setbits_int(&p->p_flag, P_SINTR);
+               atomic_setbits_int(&p->p_flag, P_SINTR | P_SINGLESLEEP);
        p->p_stat = SSLEEP;
 
        SCHED_UNLOCK();
@@ -399,6 +399,7 @@ sleep_finish(int timo, int do_sleep)
         */
        if (p->p_wchan == NULL)
                do_sleep = 0;
+       KASSERT((p->p_flag & P_SINGLESLEEP) == 0);
        atomic_clearbits_int(&p->p_flag, P_WSLEEP);
 
        if (do_sleep) {
Index: sys/proc.h
===================================================================
RCS file: /cvs/src/sys/sys/proc.h,v
diff -u -p -r1.371 proc.h
--- sys/proc.h  1 Sep 2024 03:09:00 -0000       1.371
+++ sys/proc.h  7 Sep 2024 13:10:28 -0000
@@ -433,6 +433,7 @@ struct proc {
 #define        P_SIGSUSPEND    0x00000008      /* Need to restore 
before-suspend mask*/
 #define        P_CANTSLEEP     0x00000010      /* insomniac thread */
 #define        P_WSLEEP        0x00000020      /* Working on going to sleep. */
+#define        P_SINGLESLEEP   0x00000040      /* Like P_WSLEEP for single 
thread api */
 #define        P_SINTR         0x00000080      /* Sleep is interruptible. */
 #define        P_SYSTEM        0x00000200      /* No sigs, stats or swapping. 
*/
 #define        P_TIMEOUT       0x00000400      /* Timing out during sleep. */
@@ -445,9 +446,9 @@ struct proc {
 
 #define        P_BITS \
     ("\20" "\01INKTR" "\02PROFPEND" "\03ALRMPEND" "\04SIGSUSPEND" \
-     "\05CANTSLEEP" "\06WSLEEP" "\010SINTR" "\012SYSTEM" "\013TIMEOUT" \
-     "\016WEXIT" "\020OWEUPC" "\024SUSPSINGLE" "\033THREAD" \
-     "\034SUSPSIG" "\037CPUPEG")
+     "\05CANTSLEEP" "\06WSLEEP" "\07SINGLESLEEP" "\010SINTR" "\012SYSTEM" \
+     "\013TIMEOUT" "\016WEXIT" "\020OWEUPC" "\024SUSPSINGLE" \
+     "\033THREAD" "\034SUSPSIG" "\037CPUPEG")
 
 #define        THREAD_PID_OFFSET       100000
 

Reply via email to