On Mon, Jul 10, 2023 at 05:19:35PM +0200, Mark Kettenis wrote:
> > Date: Mon, 10 Jul 2023 09:57:39 -0500
> > From: Scott Cheloha <scottchel...@gmail.com>
> > 
> > On Mon, Jul 10, 2023 at 07:42:55AM -0600, Theo de Raadt wrote:
> > > I dare you to write the simplest fix for this, instead of a diff that
> > > scrolls by.
> > 
> > This patch seems to work.  Will need to bang on it for a few more days.
> > 
> > 1. Disable gmoninit after sched_stop_scondary_cpus().  The secondary
> >    CPUs have halted, so we aren't racing sysctl(2) on a secondary CPU.
> > 
> > 2. Restore gmoninit between resume_mp() and sched_start_secondary_cpus().
> >    The secondary CPUs are out of cpu_hatch(), which is probably where we
> >    are crashing during resume.  The secondary CPUs haven't started
> >    scheduling yet, so we aren't racing sysctl(2).
> 
> It is still a bit scary to have cpu_hatch() call _mcount() but I guess
> adding __attribute__((no_profile)) to all of the functions called by
> cpu_hatch() isn't really workable either.
> 
> That said...
> 
> > Index: subr_suspend.c
> > ===================================================================
> > RCS file: /cvs/src/sys/kern/subr_suspend.c,v
> > retrieving revision 1.15
> > diff -u -p -r1.15 subr_suspend.c
> > --- subr_suspend.c  2 Jul 2023 19:02:27 -0000       1.15
> > +++ subr_suspend.c  10 Jul 2023 14:51:01 -0000
> > @@ -18,6 +18,7 @@
> >  
> >  #include <sys/param.h>
> >  #include <sys/systm.h>
> > +#include <sys/atomic.h>
> >  #include <sys/buf.h>
> >  #include <sys/clockintr.h>
> >  #include <sys/reboot.h>
> > @@ -100,6 +101,14 @@ top:
> >  #ifdef MULTIPROCESSOR
> >     sched_stop_secondary_cpus();
> >     KASSERT(CPU_IS_PRIMARY(curcpu()));
> > +#endif
> > +#ifdef GPROF
> > +   extern int gmoninit;
> > +   int gmon_state = gmoninit;
> 
> No variable declarations in the middle of functions please.

Yep, moved up.

> > +   gmoninit = 0;
> > +   membar_producer();
> 
> Why are you messing with memory barriers here?
> 
> > +#endif
> > +#ifdef MULTIPROCESSOR
> >     sleep_mp();
> >  #endif
> >  
> > @@ -172,6 +181,12 @@ fail_suspend:
> >     resume_randomness(rndbuf, rndbuflen);
> >  #ifdef MULTIPROCESSOR
> >     resume_mp();
> > +#endif
> > +#ifdef GPROF
> > +   gmoninit = gmon_state;
> > +   membar_producer();
> 
> And here?

gmoninit is not volatile.  I had this idea that I wanted the store to
be globally visible before proceeding.  But you and Theo are telling
me I don't need the barrier.  I'll take your word for it, dropped.

--

suspend/resume and hibernate/unhibernate still work with the revised
patch below.

Index: subr_suspend.c
===================================================================
RCS file: /cvs/src/sys/kern/subr_suspend.c,v
retrieving revision 1.15
diff -u -p -r1.15 subr_suspend.c
--- subr_suspend.c      2 Jul 2023 19:02:27 -0000       1.15
+++ subr_suspend.c      10 Jul 2023 15:30:22 -0000
@@ -26,6 +26,9 @@
 #include <sys/mount.h>
 #include <sys/syscallargs.h>
 #include <dev/wscons/wsdisplayvar.h>
+#ifdef GPROF
+#include <sys/gmon.h>
+#endif
 #ifdef HIBERNATE
 #include <sys/hibernate.h>
 #endif
@@ -49,6 +52,9 @@ sleep_state(void *v, int sleepmode)
        extern int perflevel;
        size_t rndbuflen;
        char *rndbuf;
+#ifdef GPROF
+       int gmon_state;
+#endif
 #if NSOFTRAID > 0
        extern void sr_quiesce(void);
 #endif
@@ -100,6 +106,12 @@ top:
 #ifdef MULTIPROCESSOR
        sched_stop_secondary_cpus();
        KASSERT(CPU_IS_PRIMARY(curcpu()));
+#endif
+#ifdef GPROF
+       gmon_state = gmoninit;
+       gmoninit = 0;
+#endif
+#ifdef MULTIPROCESSOR
        sleep_mp();
 #endif
 
@@ -172,6 +184,11 @@ fail_suspend:
        resume_randomness(rndbuf, rndbuflen);
 #ifdef MULTIPROCESSOR
        resume_mp();
+#endif
+#ifdef GPROF
+       gmoninit = gmon_state;
+#endif
+#ifdef MULTIPROCESSOR
        sched_start_secondary_cpus();
 #endif
        vfs_stall(curproc, 0);

Reply via email to