The 'magic' here was that MD code could choose to implement statclock (and
set stathz appropriately), or it could ignore the separate
statclock/hardclock interfaces into the scheduler entirely. Some clock
drivers on a platform could also enable a split hardclock/statclock while
others did not.
Back near the beginning of OpenBSD, platforms existed that had no
higher-precision clock than that timer interrupt, and nothing existed to even
approximate higher precision (e.g. cycle counters or instruction counters).
Some clock drivers have a separate timer/interrupt, or a separate 'event'
tracked, to schedule the statclock() event. These platforms may also
(dynamically) change the stathz clock rate when profiling is enabled/disabled.
This is implemented on arm64 in agtimer_setstatclockrate().
Any clock driver that directly calls statclock() should make certain to set
stathz (and profhz) appropriately, as no assumptions about its rate/frequency
should be made.
This isn't to say that removing the stathz == 0 magic should not be done,
but if it is done, make certain that stathz and profhz are properly
updated/configured.
Dale
On Sat, Jan 9, 2021 at 5:57 AM Visa Hankala <v...@hankala.org> wrote:
> On Fri, Jan 08, 2021 at 10:18:27AM -0600, Scott Cheloha wrote:
> > On Thu, Jan 07, 2021 at 08:12:10PM -0600, Scott Cheloha wrote:
> > > On Thu, Jan 07, 2021 at 09:37:58PM +0100, Mark Kettenis wrote:
> > > > > Date: Thu, 7 Jan 2021 11:15:41 -0600
> > > > > From: Scott Cheloha <scottchel...@gmail.com>
> > > > >
> > > > > Hi,
> > > > >
> > > > > I want to isolate statclock() from hardclock(9). This will simplify
> > > > > the logic in my WIP dynamic clock interrupt framework.
> > > > >
> > > > > Currently, if stathz is zero, we call statclock() from within
> > > > > hardclock(9). It looks like this (see sys/kern/kern_clock.c):
> > > > >
> > > > > void
> > > > > hardclock(struct clockframe *frame)
> > > > > {
> > > > > /* [...] */
> > > > >
> > > > > if (stathz == 0)
> > > > > statclock(frame);
> > > > >
> > > > > /* [...] */
> > > > >
> > > > > This is the case on alpha, amd64 (w/ lapic), hppa, i386 (w/ lapic),
> > > > > loongson, luna88k, mips64, and sh.
> > > > >
> > > > > (We seem to do it on sgi, too. I was under the impression that sgi
> > > > > *was* a mips64 platform, yet sgi seems to have its own clock
> > > > > interrupt code distinct from mips64's general clock interrupt code
> > > > > which is used by e.g. octeon).
> > > > >
> > > > > However, if stathz is not zero we call statclock() separately. This
> > > > > is the case on armv7, arm, arm64, macppc, powerpc64, and sparc64.
> > > > >
> > > > > (The situation for the general powerpc code and socppc in particular
> > > > > is a mystery to me.)
> > > > >
> > > > > If we could remove this MD distinction it would make my MI framework
> > > > > simpler. Instead of checking stathz and conditionally starting a
> > > > > statclock event I will be able to unconditionally start a statclock
> > > > > event on all platforms on every CPU.
> > > > >
> > > > > In general I don't think the "is stathz zero?" variance between
> > > > > platforms is useful:
> > > > >
> > > > > - The difference is invisible to userspace, as we hide the fact that
> > > > > stathz is zero from e.g. the kern.clockrate sysctl.
> > > > >
> > > > > - We run statclock() at some point, regardless of whether stathz is
> > > > > zero. If statclock() is run from hardclock(9) then isn't stathz
> > > > > effectively equal to hz?
> > > > >
> > > > > - Because stathz might be zero we need to add a bunch of safety checks
> > > > > to our MI code to ensure we don't accidentally divide by zero.
> > > > >
> > > > > Maybe we can ensure stathz is non-zero in a later diff...
> > > > >
> > > > > --
> > > > >
> > > > > Anyway, I don't think I have missed any platforms. However, if
> > > > > platform experts could weigh in here to verify my changes (and test
> > > > > them!) I'd really appreciate it.
> > > > >
> > > > > In particular, I'm confused about how clock interrupts work on
> > > > > powerpc, socppc, and sgi.
> > > > >
> > > > > --
> > > > >
> > > > > Thoughts? Platform-specific OKs?
> > > >
> > > > I wouldn't be opposed to doing this. It is less magic!
> > > >
> > > > But yes, this needs to be tested on the platforms that you change.
> > >
> > > I guess I'll CC all the platform-specific people I'm aware of.
> > >
> > > > Note that many platforms don't have a separate schedclock and
> > > > statclock. But on many platforms where we use a one-shot timer as the
> > > > clock we have a randomized statclock. I'm sure Theo would love to
> > > > tell you about the cpuhog story...
> > >
> > > I am familiar with cpuhog. It's the one thing everybody mentions when
> > > I talk about clock interrupts and/or statclock().
> > >
> > > Related:
> > >
> > > I wonder if we could avoid the cpuhog problem entirely by implementing
> > > some kind of MI cycle counting clock API that we use to timestamp
> > > whenever we cross the syscall boundary, or enter an interrupt, etc.,
> > > to determine the time a thread spends using the CPU without any
> > > sampling error.
> > >
> > > Instead of a process accumulating ticks from a sampling clock
> > > interrupt you would accumulate, say, a 64-bit count of cycles, or
> > > something like that.
> > >
> > > Sampling with a regular clock interrupt is prone to error and trickery
> > > like cpuhog. The classic BSD solution to the cpuhog exploit was to
> > > randomize the statclock/schedclock to make it harder to fool the
> > > sampler. But if we used cycle counts or instruction counts at each
> > > state transition it would be impossible to fool because we wouldn't be
> > > sampling at all.
>
> I think statclock is run at a different frequency than hardclock for a good
> reason. If their frequency is the same and there is no variable offset,
> the sampling is prone to bias. For example, let's assume that each
> hardclock tick makes the system spend the first half of tick interval
> in kernel, say in soft interrupt code. Does that kernel time get noticed?
>
> > > Unsure what the performance implications would be, but in general I
> > > would guess that most platforms have a way to count instructions or
> > > cycles and that reading this data is fast enough for us to use it in
> > > e.g. syscall() or the interrupt handler without a huge performance
> > > hit.
> > >
> > > > Anyway, we probably want that on amd64 as well.
> > >
> > > My WIP dynamic clock interrupt system can run a randomized statclock()
> > > on amd64 boxes with a lapic. I imagine we will be able to do the same
> > > on i386 systems that have a lapic, too, though it will be slower
> > > because all the i386 timecounters are glacial compared to the TSC.
> > >
> > > Eventually I want to isolate schedclock() from statclock() and run it
> > > as an independent event. But that's a "later on" goal. For now I'm
> > > just trying to get every platform as similar as possible to make
> > > merging the dynamic clock interrupt work less painful.
> >
> > Whoops, some garbage snuck into amd64/lapic.c.
> >
> > Here's the patch without it.
> >
> > Also, sorry if I've CC'd you and you're not the right person for one
> > of these platforms/architectures. My thinking is:
> >
> > miod: loongson (?), sh
> > aoyama: luna88k
> > visa: mips64, sgi
> > deraadt: alpha
> > kettenis: hppa
> > sthen: i386
> >
> > Index: sys/kern/kern_clock.c
> > ===================================================================
> > RCS file: /cvs/src/sys/kern/kern_clock.c,v
> > retrieving revision 1.101
> > diff -u -p -r1.101 kern_clock.c
> > --- sys/kern/kern_clock.c 21 Jan 2020 16:16:23 -0000 1.101
> > +++ sys/kern/kern_clock.c 8 Jan 2021 15:56:24 -0000
> > @@ -164,12 +164,6 @@ hardclock(struct clockframe *frame)
> > }
> > }
> >
> > - /*
> > - * If no separate statistics clock is available, run it from here.
> > - */
> > - if (stathz == 0)
> > - statclock(frame);
> > -
> > if (--ci->ci_schedstate.spc_rrticks <= 0)
> > roundrobin(ci);
> >
> > Index: sys/arch/alpha/alpha/clock.c
> > ===================================================================
> > RCS file: /cvs/src/sys/arch/alpha/alpha/clock.c,v
> > retrieving revision 1.24
> > diff -u -p -r1.24 clock.c
> > --- sys/arch/alpha/alpha/clock.c 6 Jul 2020 13:33:06 -0000 1.24
> > +++ sys/arch/alpha/alpha/clock.c 8 Jan 2021 15:56:24 -0000
> > @@ -136,6 +136,13 @@ clockattach(dev, fns)
> > * Machine-dependent clock routines.
> > */
> >
> > +void
> > +clockintr(struct clockframe *frame)
> > +{
> > + hardclock(frame);
> > + statclock(frame);
> > +}
> > +
> > /*
> > * Start the real-time and statistics clocks. Leave stathz 0 since there
> > * are no other timers available.
> > @@ -165,7 +172,7 @@ cpu_initclocks(void)
> > * hardclock, which would then fall over because the pointer
> > * to the virtual timers wasn't set at that time.
> > */
> > - platform.clockintr = hardclock;
> > + platform.clockintr = clockintr;
> > schedhz = 16;
> >
> > evcount_attach(&clk_count, "clock", &clk_irq);
> > Index: sys/arch/amd64/amd64/lapic.c
> > ===================================================================
> > RCS file: /cvs/src/sys/arch/amd64/amd64/lapic.c,v
> > retrieving revision 1.57
> > diff -u -p -r1.57 lapic.c
> > --- sys/arch/amd64/amd64/lapic.c 6 Sep 2020 20:50:00 -0000 1.57
> > +++ sys/arch/amd64/amd64/lapic.c 8 Jan 2021 15:56:25 -0000
> > @@ -452,6 +452,7 @@ lapic_clockintr(void *arg, struct intrfr
> > floor = ci->ci_handled_intr_level;
> > ci->ci_handled_intr_level = ci->ci_ilevel;
> > hardclock((struct clockframe *)&frame);
> > + statclock((struct clockframe *)&frame);
> > ci->ci_handled_intr_level = floor;
> >
> > clk_count.ec_count++;
> > Index: sys/arch/hppa/dev/clock.c
> > ===================================================================
> > RCS file: /cvs/src/sys/arch/hppa/dev/clock.c,v
> > retrieving revision 1.31
> > diff -u -p -r1.31 clock.c
> > --- sys/arch/hppa/dev/clock.c 6 Jul 2020 13:33:07 -0000 1.31
> > +++ sys/arch/hppa/dev/clock.c 8 Jan 2021 15:56:25 -0000
> > @@ -43,7 +43,7 @@
> >
> > u_long cpu_hzticks;
> >
> > -int cpu_hardclock(void *);
> > +int cpu_clockintr(void *);
> > u_int itmr_get_timecount(struct timecounter *);
> >
> > struct timecounter itmr_timecounter = {
> > @@ -106,7 +106,7 @@ cpu_initclocks(void)
> > }
> >
> > int
> > -cpu_hardclock(void *v)
> > +cpu_clockintr(void *v)
> > {
> > struct cpu_info *ci = curcpu();
> > u_long __itmr, delta, eta;
> > @@ -114,14 +114,15 @@ cpu_hardclock(void *v)
> > register_t eiem;
> >
> > /*
> > - * Invoke hardclock as many times as there has been cpu_hzticks
> > - * ticks since the last interrupt.
> > + * Invoke hardclock and statclock as many times as there has been
> > + * cpu_hzticks ticks since the last interrupt.
> > */
> > for (;;) {
> > mfctl(CR_ITMR, __itmr);
> > delta = __itmr - ci->ci_itmr;
> > if (delta >= cpu_hzticks) {
> > hardclock(v);
> > + statclock(v);
> > ci->ci_itmr += cpu_hzticks;
> > } else
> > break;
> > Index: sys/arch/hppa/dev/cpu.c
> > ===================================================================
> > RCS file: /cvs/src/sys/arch/hppa/dev/cpu.c,v
> > retrieving revision 1.42
> > diff -u -p -r1.42 cpu.c
> > --- sys/arch/hppa/dev/cpu.c 29 May 2020 04:42:23 -0000 1.42
> > +++ sys/arch/hppa/dev/cpu.c 8 Jan 2021 15:56:25 -0000
> > @@ -89,7 +89,7 @@ cpuattach(struct device *parent, struct
> > extern u_int cpu_ticksnum, cpu_ticksdenom;
> > extern u_int fpu_enable;
> > /* clock.c */
> > - extern int cpu_hardclock(void *);
> > + extern int cpu_clockintr(void *);
> > /* ipi.c */
> > extern int hppa_ipi_intr(void *);
> >
> > @@ -173,7 +173,7 @@ cpuattach(struct device *parent, struct
> > printf(", %u/%u D/I BTLBs",
> > pdc_btlb.finfo.num_i, pdc_btlb.finfo.num_d);
> >
> > - cpu_intr_establish(IPL_CLOCK, 31, cpu_hardclock, NULL, "clock");
> > + cpu_intr_establish(IPL_CLOCK, 31, cpu_clockintr, NULL, "clock");
> > #ifdef MULTIPROCESSOR
> > cpu_intr_establish(IPL_IPI, 30, hppa_ipi_intr, NULL, "ipi");
> > #endif
> > Index: sys/arch/i386/i386/lapic.c
> > ===================================================================
> > RCS file: /cvs/src/sys/arch/i386/i386/lapic.c,v
> > retrieving revision 1.47
> > diff -u -p -r1.47 lapic.c
> > --- sys/arch/i386/i386/lapic.c 30 Jul 2018 14:19:12 -0000 1.47
> > +++ sys/arch/i386/i386/lapic.c 8 Jan 2021 15:56:25 -0000
> > @@ -257,6 +257,7 @@ lapic_clockintr(void *arg)
> > struct clockframe *frame = arg;
> >
> > hardclock(frame);
> > + statclock(frame);
> >
> > clk_count.ec_count++;
> > }
> > Index: sys/arch/loongson/dev/glxclk.c
> > ===================================================================
> > RCS file: /cvs/src/sys/arch/loongson/dev/glxclk.c,v
> > retrieving revision 1.5
> > diff -u -p -r1.5 glxclk.c
> > --- sys/arch/loongson/dev/glxclk.c 19 Jul 2015 21:11:47 -0000 1.5
> > +++ sys/arch/loongson/dev/glxclk.c 8 Jan 2021 15:56:25 -0000
> > @@ -286,6 +286,7 @@ glxclk_intr(void *arg)
> > return 1;
> >
> > hardclock(frame);
> > + statclock(frame);
> >
> > return 1;
> > }
>
> This does not seem correct because statclock() is already called from
> glxclk_stat_intr().
>
> > Index: sys/arch/luna88k/luna88k/clock.c
> > ===================================================================
> > RCS file: /cvs/src/sys/arch/luna88k/luna88k/clock.c,v
> > retrieving revision 1.15
> > diff -u -p -r1.15 clock.c
> > --- sys/arch/luna88k/luna88k/clock.c 6 Jul 2020 13:33:07 -0000 1.15
> > +++ sys/arch/luna88k/luna88k/clock.c 8 Jan 2021 15:56:25 -0000
> > @@ -165,8 +165,10 @@ clockintr(void *eframe)
> > clockevc->ec_count++;
> >
> > *(volatile uint32_t *)(ci->ci_clock_ack) = ~0;
> > - if (clockinitted)
> > + if (clockinitted) {
> > hardclock(eframe);
> > + statclock(eframe);
> > + }
> > return 1;
> > }
> >
> > Index: sys/arch/mips64/mips64/clock.c
> > ===================================================================
> > RCS file: /cvs/src/sys/arch/mips64/mips64/clock.c,v
> > retrieving revision 1.42
> > diff -u -p -r1.42 clock.c
> > --- sys/arch/mips64/mips64/clock.c 30 Jun 2020 14:56:10 -0000 1.42
> > +++ sys/arch/mips64/mips64/clock.c 8 Jan 2021 15:56:25 -0000
> > @@ -151,6 +151,7 @@ cp0_int5(uint32_t mask, struct trapframe
> > while (ci->ci_pendingticks) {
> > cp0_clock_count.ec_count++;
> > hardclock(tf);
> > + statclock(tf);
> > ci->ci_pendingticks--;
> > }
> > #ifdef MULTIPROCESSOR
> > Index: sys/arch/sgi/localbus/int.c
> > ===================================================================
> > RCS file: /cvs/src/sys/arch/sgi/localbus/int.c,v
> > retrieving revision 1.15
> > diff -u -p -r1.15 int.c
> > --- sys/arch/sgi/localbus/int.c 24 Feb 2018 11:42:31 -0000 1.15
> > +++ sys/arch/sgi/localbus/int.c 8 Jan 2021 15:56:25 -0000
> > @@ -524,6 +524,7 @@ int_8254_intr0(uint32_t hwpend, struct t
> > while (ci->ci_pendingticks) {
> > int_clock_count.ec_count++;
> > hardclock(tf);
> > + statclock(tf);
> > ci->ci_pendingticks--;
> > }
> > }
> > Index: sys/arch/sh/sh/clock.c
> > ===================================================================
> > RCS file: /cvs/src/sys/arch/sh/sh/clock.c,v
> > retrieving revision 1.11
> > diff -u -p -r1.11 clock.c
> > --- sys/arch/sh/sh/clock.c 20 Oct 2020 15:59:17 -0000 1.11
> > +++ sys/arch/sh/sh/clock.c 8 Jan 2021 15:56:25 -0000
> > @@ -333,6 +333,7 @@ sh3_clock_intr(void *arg) /* trap frame
> > _reg_bclr_2(SH3_TCR0, TCR_UNF);
> >
> > hardclock(arg);
> > + statclock(arg);
> >
> > return (1);
> > }
> > @@ -354,6 +355,7 @@ sh4_clock_intr(void *arg) /* trap frame
> > _reg_bclr_2(SH4_TCR0, TCR_UNF);
> >
> > hardclock(arg);
> > + statclock(arg);
> >
> > return (1);
> > }
> >
>
>