On Tue, 2010-07-06 at 17:40 -0700, Bruce Evans wrote: > On Tue, 6 Jul 2010, Sean Bruno wrote: > > > On Thu, 2010-07-01 at 16:46 -0700, Sean Bruno wrote: > >> Found this lying around the Yahoo tree this week. Basically it allows > >> you to activate, reset and deactivate profiling with the '-f" flag. > > I want something like this, but this implementation has too many style > bugs and obscure or missing locking. > > >> Kind of nice to have if you want the ability to turn on profiling for > >> debugging live systems. > > We already have this via `kgmon -hBb', but this patch implements > something different. Normal profiling requires configuring the kernel > with "config -p[p]", but that adds a lot of overhead, even when profiling > is not turned on. The patch implements dynamic configuration of flat > profiling only (since dynamic call graphs are too hard to implement, > but this restriction is not mentioned anywhere in the patch or the > accompanying mail. > > Userland profiling has the same lossage -- you have to configure > programs specially by compiling with -pg, but that adds a lot of > overhead, even when profiling is not turned on. Moreover, in userland > there is no library support for turning profiling on and off or for > dumping and clearing the statistics except on program start and > completion, so the much larger overhead of always maintaining the call > graph is normally always present. > > In FreeBSD-1, I (ab)used a bug in the implementation to turn on flat > profiling (only) in the kernel. This avoids the large call graph > overhead (it still costs a function call and some other branches on > every call and maybe on every return, but the branches for this are > predictable so the overhead is not very large. This should be made > a standard feature, and perhaps this patch implements it as a bug > without really trying, as in FreeBSD-1 (implementing it just takes > setting gp->state to a value that gives flat profiling while keeping > mcount() null if full profiling is configured. > > % Index: usr.sbin/kgmon/kgmon.8 > % =================================================================== > % --- usr.sbin/kgmon/kgmon.8 (revision 209745) > % +++ usr.sbin/kgmon/kgmon.8 (working copy) > % @@ -70,6 +70,9 @@ > % Dump the contents of the profile buffers into a > % .Pa gmon.out > % file. > % +.It Fl f > % +Free profiling buffer. > % +Also stops profiling. > % .It Fl r > % Reset all the profile buffers. > % If the > > Freeing the profiling buffer alone isn't very useful. The memory wastage > from always allocating it at boot time and never freeing it would be > rarely noticed, and would fix some races or simplify avoidance of races. > However, apart from the race problems, ordinary statically configured > profiling should free things too, since unlike in userland it is normal > to have profiling turned off most of the time even when it is statically > configured. > > The above doesn't really describe what -f does. In normal profiling, > there are multiple profiling buffers and freeing just the flat profiling > one makes no sense. -f in fact frees the profiling buffer only if the > kernel is _not_ configured for profiling (as is required to not corrupt > the set of profiling buffers if the kernel is configured for profiling). > The profiling buffer is automatically allocated on first use iff it > is not already allocated, and of course -f has no effect if no buffer > is allocated. Perhaps -r should automatically deallocate, so -f would > not be needed. Kernel profiling has this feature of allowing multiple > dumps of the buffer(s) before reset, so reset is a good time to deallocate. > > Style bugs in the above: f is unsorted between p and r. > > % Index: usr.sbin/kgmon/kgmon.c > % =================================================================== > % --- usr.sbin/kgmon/kgmon.c (revision 209745) > % +++ usr.sbin/kgmon/kgmon.c (working copy) > % @@ -72,7 +72,7 @@ > % struct gmonparam gpm; > % }; > % > % -int Bflag, bflag, hflag, kflag, rflag, pflag; > % +int Bflag, bflag, hflag, kflag, rflag, pflag, fflag; > > Style bugs: now f is unsorted after r and p. p was already unsorted after r. > > % int debug = 0; > % int getprof(struct kvmvars *); > % int getprofhz(struct kvmvars *); > % @@ -82,6 +82,7 @@ > % void dumpstate(struct kvmvars *kvp); > % void reset(struct kvmvars *kvp); > % static void usage(void); > % +void freebuf(struct kvmvars *kvp); > > Style bugs: f is unsorted after u; all the old declarations are indented > with 1 tab while the new one is indented with spaces. > > % > % int > % main(int argc, char **argv) > % @@ -93,7 +94,7 @@ > % seteuid(getuid()); > % kmemf = NULL; > % system = NULL; > % - while ((ch = getopt(argc, argv, "M:N:Bbhpr")) != -1) { > % + while ((ch = getopt(argc, argv, "M:N:Bbfhpr")) != -1) { > % switch((char)ch) { > % > % case 'M': > % @@ -113,6 +114,10 @@ > % bflag = 1; > % break; > % > % + case 'f': > % + fflag = 1; > % + break; > % + > % case 'h': > % hflag = 1; > % break; > > Now f is correctly sorted in the above 2 places. But it is missing from > the usage message. > > % @@ -158,6 +163,10 @@ > % dumpstate(&kvmvars); > % if (rflag) > % reset(&kvmvars); > % + if (fflag) { > % + freebuf(&kvmvars); > % + disp = GMON_PROF_OFF; > % + } > % if (accessmode == O_RDWR) > % setprof(&kvmvars, disp); > % (void)fprintf(stdout, "kgmon: kernel profiling is %s.\n", > % @@ -403,36 +412,44 @@ > % /* > % * Write out the arc info. > % */ > % - if ((froms = (u_short *)malloc(kvp->gpm.fromssize)) == NULL) > % - errx(8, "cannot allocate froms space"); > % - if (kflag) { > % - i = kvm_read(kvp->kd, (u_long)kvp->gpm.froms, (void *)froms, > % - kvp->gpm.fromssize); > % - } else { > % - mib[2] = GPROF_FROMS; > % - i = kvp->gpm.fromssize; > % - if (sysctl(mib, 3, froms, &i, NULL, 0) < 0) > % - i = 0; > % - } > % - if (i != kvp->gpm.fromssize) > % - errx(9, "read froms: read %lu, got %ld: %s", > % - kvp->gpm.fromssize, (long)i, > % - kflag ? kvm_geterr(kvp->kd) : strerror(errno)); > % - if ((tos = (struct tostruct *)malloc(kvp->gpm.tossize)) == NULL) > % - errx(10, "cannot allocate tos space"); > % - if (kflag) { > % - i = kvm_read(kvp->kd, (u_long)kvp->gpm.tos, (void *)tos, > % - kvp->gpm.tossize); > % - } else { > % - mib[2] = GPROF_TOS; > % - i = kvp->gpm.tossize; > % - if (sysctl(mib, 3, tos, &i, NULL, 0) < 0) > % - i = 0; > % - } > % - if (i != kvp->gpm.tossize) > % - errx(11, "read tos: read %lu, got %ld: %s", > % - kvp->gpm.tossize, (long)i, > % - kflag ? kvm_geterr(kvp->kd) : strerror(errno)); > % + if (kvp->gpm.fromssize > 0) { > % + if ((froms = (u_short *)malloc(kvp->gpm.fromssize)) == > NULL) > % + errx(8, "cannot allocate froms space"); > % + if (kflag) { > % + i = kvm_read(kvp->kd, (u_long)kvp->gpm.froms, > % + (void *)froms, > % + kvp->gpm.fromssize); > % + } else { > % + mib[2] = GPROF_FROMS; > % + i = kvp->gpm.fromssize; > % + if (sysctl(mib, 3, froms, &i, NULL, 0) < 0) > % + i = 0; > % + } > % + if (i != kvp->gpm.fromssize) > % + errx(9, "read froms: read %lu, got %ld: %s", > % + kvp->gpm.fromssize, (long)i, > % + kflag ? kvm_geterr(kvp->kd) : > strerror(errno)); > % + } > % + if (kvp->gpm.tossize > 0) { > % + if ((tos = (struct tostruct *)malloc(kvp->gpm.tossize)) == > % + NULL) > % + errx(10, "cannot allocate tos space"); > % + if (kflag) { > % + i = kvm_read(kvp->kd, (u_long)kvp->gpm.tos, > % + (void *)tos, > % + kvp->gpm.tossize); > % + } else { > % + mib[2] = GPROF_TOS; > % + i = kvp->gpm.tossize; > % + if (sysctl(mib, 3, tos, &i, NULL, 0) < 0) > % + i = 0; > % + } > % + if (i != kvp->gpm.tossize) > % + errx(11, "read tos: read %lu, got %ld: %s", > % + kvp->gpm.tossize, (long)i, > % + kflag ? kvm_geterr(kvp->kd) : > strerror(errno)); > % + } > % + > > The above is unreadable due to extra indentation and/or massive tab lossage. > Most but not all tabs are corrupted to spaces. > > I think it requires the extra indentation as written, since `to' and `from' > sizes of 0 now become non-errors, so they are skipped over by putting them > in a conditional clause instead of aborted on. These cases would also > happen with full profiling configured but only flat profiling enabled. I'm > not sure how my FreeBSD-1 hack handled this. > > % if (debug) > % warnx("lowpc 0x%lx, textsize 0x%lx", > % (unsigned long)kvp->gpm.lowpc, kvp->gpm.textsize); > % @@ -509,11 +526,13 @@ > % if (kvm_write(kvp->kd, (u_long)kvp->gpm.kcount, zbuf, > % kvp->gpm.kcountsize) != kvp->gpm.kcountsize) > % errx(13, "tickbuf zero: %s", kvm_geterr(kvp->kd)); > % - if (kvm_write(kvp->kd, (u_long)kvp->gpm.froms, zbuf, > % - kvp->gpm.fromssize) != kvp->gpm.fromssize) > % + if (kvp->gpm.fromssize > 0 && > % + kvm_write(kvp->kd, (u_long)kvp->gpm.froms, zbuf, > % + kvp->gpm.fromssize) != kvp->gpm.fromssize) > % errx(14, "froms zero: %s", kvm_geterr(kvp->kd)); > % - if (kvm_write(kvp->kd, (u_long)kvp->gpm.tos, zbuf, > % - kvp->gpm.tossize) != kvp->gpm.tossize) > % + if (kvp->gpm.tossize > 0 && > % + kvm_write(kvp->kd, (u_long)kvp->gpm.tos, zbuf, > % + kvp->gpm.tossize) != kvp->gpm.tossize) > % errx(15, "tos zero: %s", kvm_geterr(kvp->kd)); > % return; > % } > > Now the indentation is not extra but tab lossage is still massive. > > % @@ -524,11 +543,33 @@ > % if (sysctl(mib, 3, NULL, NULL, zbuf, kvp->gpm.kcountsize) < 0) > % err(13, "tickbuf zero"); > % mib[2] = GPROF_FROMS; > % - if (sysctl(mib, 3, NULL, NULL, zbuf, kvp->gpm.fromssize) < 0) > % + if (kvp->gpm.fromssize > 0 && > % + sysctl(mib, 3, NULL, NULL, zbuf, kvp->gpm.fromssize) < 0) > % err(14, "froms zero"); > % mib[2] = GPROF_TOS; > % - if (sysctl(mib, 3, NULL, NULL, zbuf, kvp->gpm.tossize) < 0) > % - err(15, "tos zero"); > % + if (kvp->gpm.tossize > 0 && > % + sysctl(mib, 3, NULL, NULL, zbuf, kvp->gpm.tossize) < 0) > > Tab lossage on every changed line, as usual. > > % (void)seteuid(getuid()); > % free(zbuf); > % } > % + > % +/* > % + * Free the kernel profiling date structures. > % + */ > % +void > % +freebuf(kvp) > % + struct kvmvars *kvp; > % +{ > % + int mib[3]; > % + > % + setprof(kvp, GMON_PROF_OFF); > % + if (kflag) > % + return; > % + (void)seteuid(0); > % + mib[0] = CTL_KERN; > % + mib[1] = KERN_PROF; > % + mib[2] = GPROF_FREEBUF; > % + if (sysctl(mib, 3, NULL, 0, NULL, 0) < 0) > % + err(16, "Freeing profiling buffer failed"); > % + (void)seteuid(getuid()); > % +} > > No tab in sight in new code. > > Error message are not capitalized in KNF. This rule was followed in all > the old error messages in this program. > > % Index: sys/kern/kern_clock.c > % =================================================================== > % --- sys/kern/kern_clock.c (revision 209745) > % +++ sys/kern/kern_clock.c (working copy) > % @@ -68,9 +68,7 @@ > % #include <sys/limits.h> > % #include <sys/timetc.h> > % > % -#ifdef GPROF > % #include <sys/gmon.h> > % -#endif > % > % #ifdef HWPMC_HOOKS > % #include <sys/pmckern.h> > % @@ -714,10 +712,8 @@ > % profclock(int usermode, uintfptr_t pc) > % { > % struct thread *td; > % -#ifdef GPROF > % struct gmonparam *g; > % uintfptr_t i; > % -#endif > % > % td = curthread; > % if (usermode) { > % @@ -730,7 +726,6 @@ > % if (td->td_proc->p_flag & P_PROFIL) > % addupc_intr(td, pc, 1); > % } > % -#ifdef GPROF > % else { > % /* > % * Kernel statistics are just like addupc_intr, only easier. > % @@ -743,7 +738,6 @@ > % } > % } > % } > % -#endif > % } > % > % /* > > Seems to be correct. Everything except initialization is here, and the > changes for it are remarkably simple and low overhead. > > % Index: sys/kern/subr_prof.c > % =================================================================== > % --- sys/kern/subr_prof.c (revision 209745) > % +++ sys/kern/subr_prof.c (working copy) > % @@ -44,7 +44,6 @@ > % > % #include <machine/cpu.h> > % > % -#ifdef GPROF > % #include <sys/malloc.h> > % #include <sys/gmon.h> > % #undef MCOUNT > % @@ -136,7 +135,46 @@ > % free(cp, M_GPROF); > % } > % > % +#ifndef GPROF > % +/* Allocate kcount buffer and initialize state */ > > Style bugs: > - missing "." > - no description (except the ifdef) that this only applies to the unusual > case of non-static configuration. > > % +static int > % +prof_alloc_kcountbuf(void) > > The prof_ namespace is not used by profiling code. > > % +{ > % + char *cp; > % + struct gmonparam *p = &_gmonparam; > % + int error = 0; > > Style bug: initialization in declaration. Old profiling code has this > for p but not for errno. > > % + > % + mtx_lock(&Giant); > > There are lots of race problems in old and new profiling initialization > and reinitialization code, and this has little effect on them. Higher > level profiling code in kern_clock.c is mostly locked by time_lock, > and the parts that use time_lock are correct AFAIK. However, this > locking is not used by profclock(). profclock() used to use sched_lock, > but now uses no explicit locking whatsoever. It is just called from > the low-level timer interrupt handler, and that is required to be a > fast interrupt handler so that it has interrupts disabled on the current > CPU, and there is the pseudo-locking that profclock() is only called > on 1 CPU at intervals of 1/profhz, so the calls shouldn't overlap. > time_lock is a spinlock, so acquiring it locks out profclock() due to > the bugfeature that spinlocks lock out all fast interrupt handlers. > > % + if (p->kcount == NULL) { > % + cp = (char *)malloc(p->kcountsize, M_GPROF, M_NOWAIT); > > Style bug: useless cast copied from old code. This is not C++. > > The locking shouldn't be to use a spinlock, since calling malloc() while > holding a spinlock is or should be a panic. For more ideas on correct > locking, see below. > > % + if (cp == 0) { > % + printf("No memory for profiling.\n"); > > Style bugs: > - null pointer constants are not spelled "0" in the kernel > - kernel error messages are not capitalized > - kernel error messages are not terminated by a ".". > > % + error = ENOMEM; > % + goto out; > % + } > % + bzero(cp, p->kcountsize); > % + p->kcount = (HISTCOUNTER *)cp; > > More excessive casting, consistent with old code (*blush*). > > % + } > % +out: > % + mtx_unlock(&Giant); > % + return error; > > Style bug: missing parentheses. > > % +} > > Locking is not so critical for allocation. You can simply rely on the > profiling buffer not being used while it is an in-between state for the > same reason that it wasn't used before it existed at all -- that the > state variable prevents its use. You just need locking to prevent the > state variable changing underneath you. Locking in the above is neither > necessary nor sufficient for this, no matter what it is (without further > changes to access the state variable in the above). No locking at all > is required in the above. > > % + > % static void > % +prof_free_kcountbuf(void) > % +{ > % + struct gmonparam *p = &_gmonparam; > % + > % + mtx_lock(&Giant); > % + if (p->kcount != NULL) { > % + free(p->kcount, M_GPROF); > % + p->kcount = (HISTCOUNTER *)NULL; > % + } > % + mtx_unlock(&Giant); > % +} > > As above for locking and casts, etc. Now it is critical to change the > state variable to "off" before calling the above (this is done) and to > keep it off (this is not done). > > % +#endif > > The above has massive tab lossage, as usual. > > Old locking bugs: > - note that the boot-time initialization (kmstartup()) uses no locking > except for a couple of critical_enter()s, and that these are broken. > This depends on the state being "off" initially and nothing external > turning it on, which still works as intended AFAIK.. The critical_exit()s > were correct when I wrote them initially as disable_intr()'s. They > are supposed to prevent _all_ interrupts so as to get accurate and > no interference from interrupt routines. critical_enter() only stops > preemption -- it allows all fast interrupt handlers and all low-level > code for normal interrupts to run, so using it here is very broken. > Even if the initialization is so early as to prevent any interrupts, > this is unclear. > > - dynamic reinitialization to support modules (kmupetext()) uses Giant > and critical_enter() locking. This depends on things that aren't > true. Giant locking has no effect on profiling (though it may have > had a small effect when it was committed, since IIRC the places that > use time_lock used to use Giant locking; anyway, this effect is almost > useless here). It used to use disable_intr() instead of critical_enter(). > This originally worked for UP systems, but for SMP systems it didn't > lock out profclock(). Now, critical_enter() doesn't even lock out > profclock() on UP systems, so changing to using it is even more broken > here. > > It is necessary to set the state variable to "off" and keep it off, > similarly to this change. > > kmupetext() also uses M_WAITOK malloc()s. This is simpler and > probably good enough, but using M_NOWAIT as in this patch is safer. > Now I wonder what is the size of the largest M_WAITOK malloc() that > can be done by a privileged user. If it is larger than the flat > profiling buffer size in this patch then there is no point in not > waiting in this patch. > > % + > % +static void > % kmstartup(dummy) > % void *dummy; > % { > % @@ -152,6 +190,7 @@ > % int nullfunc_loop_profiled_time; > % uintfptr_t tmp_addr; > % #endif > % + int bufsize; > % > % /* > % * Round lowpc and highpc to multiples of the density we're using > % @@ -164,6 +203,7 @@ > % p->textsize, (uintmax_t)p->lowpc, (uintmax_t)p->highpc); > % p->kcountsize = p->textsize / HISTFRACTION; > % p->hashfraction = HASHFRACTION; > % +#ifdef GPROF > % p->fromssize = p->textsize / HASHFRACTION; > % p->tolimit = p->textsize * ARCDENSITY / 100; > % if (p->tolimit < MINARCS) > % @@ -171,13 +211,30 @@ > % else if (p->tolimit > MAXARCS) > % p->tolimit = MAXARCS; > % p->tossize = p->tolimit * sizeof(struct tostruct); > % - cp = (char *)malloc(p->kcountsize + p->fromssize + p->tossize, > % - M_GPROF, M_WAITOK | M_ZERO); > % + bufsize = p->kcountsize + p->fromssize + p->tossize; > > Hmm, no tab lossage in small sections. > > % +#else > % + p->fromssize = 0; > % + p->tolimit = 0; > % + p->tossize = 0; > % + bufsize = 0; > % + p->kcount = (HISTCOUNTER *)NULL; > > Massive tab lossage resumes. > > I think p is statically initialized to 0 and thus all of the code in thos > `#else' clause is dead. > > % +#endif > % + cp = NULL; > % + if (bufsize > 0) > % + cp = (char *)malloc(bufsize, M_GPROF, M_WAITOK | M_ZERO); > % +#ifdef GPROF > % p->tos = (struct tostruct *)cp; > % cp += p->tossize; > % +#else > % + p->tos = (struct tostruct *)NULL; > % +#endif > % p->kcount = (HISTCOUNTER *)cp; > % cp += p->kcountsize; > % +#ifdef GPROF > % p->froms = (u_short *)cp; > % +#else > % + p->froms = (u_short *)NULL; > % +#endif > % p->histcounter_type = FUNCTION_ALIGNMENT / HISTFRACTION * NBBY; > % > % #ifdef GUPROF > % @@ -351,6 +408,12 @@ > % } else if (state == GMON_PROF_ON) { > % gp->state = GMON_PROF_OFF; > % stopguprof(gp); > > I don't see any locking here. Are all sysctls still locked by Giant? > Then we have almost enough locking here, and the Giant locking in the > alloc/free code is null. We are missing something to propagate the > state change to other CPUs, even for the current "working" version, > except possibly in the GUPROF case, if stopguprof() does it accidentally > (but it doesn't). This doesn't matter much if all the buffers have > been allocated and won't be freed or moved, but for the patch and for > kmupetext() we are freeing or moving (kmupetext() actually doesn't > even change the state to off on the current CPU like this does). > > Giant or almost any locking is enough once we have propagated the change > here: only code near here is allowed to change the state, so Giant or > whatever locking prevents it changing. Later when the state is changed > to on, we should be careful to propagate the change of all the pointers > in *gp to other CPUs before setting the state to on. > > % +#ifndef GUPROF > % + /* Allocate kcount buffer and initialize state */ > > Comment with style bug repeated ad nauseum. > > % + error = prof_alloc_kcountbuf(); > % + if (error) > % + return error; > > Style bug: missing parentheses. > > Sorry I made the code and critical locking changes unreadable by pointing > out the style bugs. > > % +#endif > % gp->profrate = profhz; > % PROC_LOCK(&proc0); > % startprofclock(&proc0); > % @@ -368,15 +431,24 @@ > % } else if (state != gp->state) > % return (EINVAL); > % return (0); > > Many examples of KNF with non-missing parentheses in nearby code. > > % +#ifndef GPROF > % + if (gp->state != GMON_PROF_OFF) > % + return EINVAL; > > Style bug: missing parentheses. > > Now we depend on a previous sysctl having set the state to off, and need > locking to keep it off while we are changing things. > > % + /* Free kcount buffer */ > > Now the comment with style bug is not repeated, but it is still of > negative use since the function name is too verbose and says more than > the comment. > > % + prof_free_kcountbuf(); > % +#endif > % + return 0; > > Style bug: missing parentheses. > > After return, and hopefully not before, another sysctl may change the > state to on, and we just need something to propagate the pointers in > *gp to other CPUs. This is more critical for alloc. Any locking for > the whole sysctl guarantees this. So does the Giant locking in the > alloc/free functions (and similarly for propagating the state change), > but I think it is an obfuscation of using nested Giant locking for its > side affect of memory ordering (locking of the whole operation is still > needed for exclusive access). > > % case GPROF_COUNT: > % return (sysctl_handle_opaque(oidp, > % gp->kcount, gp->kcountsize, req)); > % +#ifdef GPROF > % case GPROF_FROMS: > % return (sysctl_handle_opaque(oidp, > % gp->froms, gp->fromssize, req)); > % case GPROF_TOS: > % return (sysctl_handle_opaque(oidp, > % gp->tos, gp->tossize, req)); > % +#endif > % case GPROF_GMONPARAM: > % return (sysctl_handle_opaque(oidp, gp, sizeof *gp, req)); > % default: > % @@ -386,7 +458,6 @@ > % } > % > % SYSCTL_NODE(_kern, KERN_PROF, prof, CTLFLAG_RW, sysctl_kern_prof, ""); > % -#endif /* GPROF */ > % > % /* > % * Profiling system call. > % Index: sys/sys/gmon.h > % =================================================================== > % --- sys/sys/gmon.h (revision 209745) > % +++ sys/sys/gmon.h (working copy) > % @@ -197,6 +197,7 @@ > % #define GPROF_FROMS 2 /* struct: from location hash bucket > */ > % #define GPROF_TOS 3 /* struct: destination/count > structure */ > % #define GPROF_GMONPARAM 4 /* struct: profiling parameters (see > above) */ > > Examples of normal style in nearby code: tab after #define, and tabs to > indent. > > % +#define GPROF_FREEBUF 5 /* int: free flat profiling buffer */ > > Style bugs: > - space after #define > - massive tab lossage to indent, as usual. > > % > % #ifdef _KERNEL > % > > Bruce
Thank you for spending time on this patch review. I will need some time to digest your comments on this code. Sean _______________________________________________ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"