On Sun, Jul 09, 2023 at 08:11:43PM +0200, Claudio Jeker wrote:
> On Sun, Jul 09, 2023 at 12:52:20PM -0500, Scott Cheloha wrote:
> > This patch fixes resume/unhibernate on GPROF kernels where kgmon(8)
> > has activated kernel profiling.
> > 
> > I think the problem is that code called from cpu_hatch() does not play
> > nicely with _mcount(), so GPROF kernels crash during resume.  I can't
> > point you to which code in particular, but keeping all CPUs out of
> > _mcount() until the primary CPU has completed resume/unhibernate fixes
> > the crash.
> > 
> > ok?
> 
> To be honest, I'm not sure we need something like this.  GPROF is already
> a special case and people running a GPROF kernel should probably stop the
> collection of profile data before suspend/hibernate.
Sorry, I was a little unclear in my original mail.

When I say "has activated kernel profiling" I mean "has *ever* activated
kernel profiling".  Regardless of whether or not profiling is active at
the moment we reach sleep_state(), if kernel profiling has *ever* been
activated in the past, the resume crashes.

When sysctl(2) reaches sysctl_doprof(), gmoninit is set: "Assume that if
we're here it is safe to execute profiling."

/*
 * Return kernel profiling information.
 */
int
sysctl_doprof(int *name, u_int namelen, void *oldp, size_t *oldlenp, void *newp,
    size_t newlen)
{
	CPU_INFO_ITERATOR cii;
	struct cpu_info *ci;
	struct gmonparam *gp = NULL;
	int error, cpuid, op, state;

	/* all sysctl names at this level are name and field */
	if (namelen != 2)
		return (ENOTDIR);		/* overloaded */

	op = name[0];
	cpuid = name[1];

	CPU_INFO_FOREACH(cii, ci) {
		if (cpuid == CPU_INFO_UNIT(ci)) {
			gp = ci->ci_gmon;
			break;
		}
	}

	if (gp == NULL)
		return (EOPNOTSUPP);

	/* Assume that if we're here it is safe to execute profiling. */
	gmoninit = 1;

After that first sysctl(2), all CPUs stop bouncing out of _mcount() at
the gmoninit check and start checking the per-CPU data structures to
decide whether or not to record the arc: "Do not profile execution...".

_MCOUNT_DECL(u_long frompc, u_long selfpc) __used;
/* _mcount; may be static, inline, etc */
_MCOUNT_DECL(u_long frompc, u_long selfpc)
{
	u_short *frompcindex;
	struct tostruct *top, *prevtop;
	struct gmonparam *p;
	long toindex;
#ifdef _KERNEL
	int s;

	/*
	 * Do not profile execution if memory for the current CPU
	 * descriptor and profiling buffers has not yet been allocated
	 * or if the CPU we are running on has not yet set its trap
	 * handler.
	 */
	if (gmoninit == 0)
		return;
#ifdef SUSPEND
	if (mcount_disabled)
		return;
#endif
	if ((p = curcpu()->ci_gmon) == NULL)
		return;
#else
	p = &_gmonparam;
#endif
	/*
	 * check that we are profiling
	 * and that we aren't recursively invoked.
	 */
	if (p->state != GMON_PROF_ON)
		return;
#ifdef _KERNEL
	MCOUNT_ENTER;

This patch adds a second check to _mcount(), mcount_disabled, which has a
distinct meaning.

gmoninit means: the boot initialization is now done and it is safe to
touch the per-CPU GPROF data structures.

mcount_disabled says: keep out.

There may be a clever way to merge the two variables, but the simplest
thing I could think of was to just add a second boolean.  The current
patch is idiot-proof.

> Unless someone wants to gprof suspend/hibernate but then doing this will
> result in bad data.

That's way out of scope; I'm not advocating for that.  If you want to
profile sleep_state() you need to do it some other way.

> Another option is to abort zzz/ZZZ if kernel profiling is running.

I don't think this would be a good user experience.  Performing the
suspend without question and possibly returning bad profiling results is
better than failing the suspend.

IMHO, if suspend/resume interferes with kgmon(8) yielding accurate
results, we just need to document that in kgmon.8.  I would argue that
suspend/resume is an edge case one needs to keep in mind.  You can still
profile large parts of the kernel's execution path.

> In your diff I would remove these asserts:
> 	KASSERT(CPU_IS_PRIMARY(curcpu()));
> The code does not require the primary cpu there and KASSERT are not for free.

Sure, dropped.
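For completeness, the userland trigger is nothing exotic.  The sketch
below is a hypothetical illustration, roughly what kgmon(8) does under
the hood rather than a copy of it, and it assumes the four-level MIB
{CTL_KERN, KERN_PROF, op, cpuid} implied by the sysctl_doprof() excerpt
above.  Note that merely reaching sysctl_doprof(), even for a read-only
query of the state, is enough to set gmoninit:

#include <sys/types.h>
#include <sys/sysctl.h>
#include <sys/gmon.h>

#include <stdio.h>

int
main(void)
{
	/* kern.profiling state for cpu0: {CTL_KERN, KERN_PROF, GPROF_STATE, cpuid} */
	int mib[4] = { CTL_KERN, KERN_PROF, GPROF_STATE, 0 };
	int state;
	size_t len = sizeof(state);

	/* A read-only query still lands in sysctl_doprof() and sets gmoninit. */
	if (sysctl(mib, 4, &state, &len, NULL, 0) == -1) {
		perror("sysctl");
		return 1;
	}
	printf("cpu0 profiling: %s\n",
	    state == GMON_PROF_ON ? "on" : "off");
	return 0;
}

On a non-GPROF kernel, or for a CPU whose ci_gmon buffers were never
allocated, the call fails with EOPNOTSUPP, matching the gp == NULL check
above.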
Index: sys/lib/libkern/mcount.c
===================================================================
RCS file: /cvs/src/sys/lib/libkern/mcount.c,v
retrieving revision 1.14
diff -u -p -r1.14 mcount.c
--- sys/lib/libkern/mcount.c	11 Jan 2022 09:21:34 -0000	1.14
+++ sys/lib/libkern/mcount.c	9 Jul 2023 22:03:22 -0000
@@ -33,6 +33,28 @@
 #include <sys/param.h>
 #include <sys/gmon.h>
 
+#ifdef _KERNEL
+#ifdef SUSPEND
+#include <sys/atomic.h>
+
+volatile int mcount_disabled;
+
+void
+mcount_disable(void)
+{
+	mcount_disabled = 1;
+	membar_producer();
+}
+
+void
+mcount_enable(void)
+{
+	mcount_disabled = 0;
+	membar_producer();
+}
+#endif /* SUSPEND */
+#endif /* _KERNEL */
+
 /*
  * mcount is called on entry to each function compiled with the profiling
  * switch set. _mcount(), which is declared in a machine-dependent way
@@ -63,7 +85,10 @@ _MCOUNT_DECL(u_long frompc, u_long selfp
 	 */
 	if (gmoninit == 0)
 		return;
-
+#ifdef SUSPEND
+	if (mcount_disabled)
+		return;
+#endif
 	if ((p = curcpu()->ci_gmon) == NULL)
 		return;
 #else
Index: sys/kern/subr_suspend.c
===================================================================
RCS file: /cvs/src/sys/kern/subr_suspend.c,v
retrieving revision 1.15
diff -u -p -r1.15 subr_suspend.c
--- sys/kern/subr_suspend.c	2 Jul 2023 19:02:27 -0000	1.15
+++ sys/kern/subr_suspend.c	9 Jul 2023 22:03:22 -0000
@@ -26,6 +26,9 @@
 #include <sys/mount.h>
 #include <sys/syscallargs.h>
 #include <dev/wscons/wsdisplayvar.h>
+#ifdef GPROF
+#include <sys/gmon.h>
+#endif
 #ifdef HIBERNATE
 #include <sys/hibernate.h>
 #endif
@@ -63,6 +66,9 @@ top:
 
 	if (sleep_showstate(v, sleepmode))
 		return EOPNOTSUPP;
+#ifdef GPROF
+	mcount_disable();
+#endif
 #if NWSDISPLAY > 0
 	wsdisplay_suspend();
 #endif
@@ -192,6 +198,9 @@ fail_hiballoc:
 	start_periodic_resettodr();
 #if NWSDISPLAY > 0
 	wsdisplay_resume();
+#endif
+#ifdef GPROF
+	mcount_enable();
 #endif
 	sys_sync(curproc, NULL, NULL);
 	if (cpu_setperf != NULL)
Index: sys/sys/gmon.h
===================================================================
RCS file: /cvs/src/sys/sys/gmon.h,v
retrieving revision 1.9
diff -u -p -r1.9 gmon.h
--- sys/sys/gmon.h	11 Jan 2022 23:59:55 -0000	1.9
+++ sys/sys/gmon.h	9 Jul 2023 22:03:22 -0000
@@ -158,6 +158,10 @@ struct gmonparam {
 
 #ifdef _KERNEL
 extern int gmoninit;	/* Is the kernel ready for being profiled? */
+#ifdef SUSPEND
+void	mcount_disable(void);
+void	mcount_enable(void);
+#endif
 #else /* !_KERNEL */
 
 #include <sys/cdefs.h>
Index: lib/libc/gmon/mcount.c
===================================================================
RCS file: /cvs/src/lib/libc/gmon/mcount.c,v
retrieving revision 1.16
diff -u -p -r1.16 mcount.c
--- lib/libc/gmon/mcount.c	11 Jan 2022 09:21:34 -0000	1.16
+++ lib/libc/gmon/mcount.c	9 Jul 2023 22:03:23 -0000
@@ -31,6 +31,28 @@
 #include <sys/types.h>
 #include <sys/gmon.h>
 
+#ifdef _KERNEL
+#ifdef SUSPEND
+#include <sys/atomic.h>
+
+volatile int mcount_disabled;
+
+void
+mcount_disable(void)
+{
+	mcount_disabled = 1;
+	membar_producer();
+}
+
+void
+mcount_enable(void)
+{
+	mcount_disabled = 0;
+	membar_producer();
+}
+#endif /* SUSPEND */
+#endif /* _KERNEL */
+
 /*
  * mcount is called on entry to each function compiled with the profiling
  * switch set. _mcount(), which is declared in a machine-dependent way
@@ -61,7 +83,10 @@ _MCOUNT_DECL(u_long frompc, u_long selfp
 	 */
 	if (gmoninit == 0)
 		return;
-
+#ifdef SUSPEND
+	if (mcount_disabled)
+		return;
+#endif
 	if ((p = curcpu()->ci_gmon) == NULL)
 		return;
 #else