On Tue, Aug 06, 2019 at 12:38:51AM +0200, Mark Kettenis wrote:
> > Date: Mon, 5 Aug 2019 16:58:27 +0300
> > From: Paul Irofti <[email protected]>
> > 
> > On Fri, Aug 02, 2019 at 01:29:37PM +0300, Paul Irofti wrote:
> > > On Mon, Jul 01, 2019 at 10:32:51AM +0200, Mark Kettenis wrote:
> > > > > Date: Thu, 27 Jun 2019 15:08:00 +0300
> > > > > From: Paul Irofti <[email protected]>
> > > > > 
> > > > > Hi,
> > > > > 
> > > > > Here is an initial diff, adapted from NetBSD, that synchronizes TSC
> > > > > clocks across cores.
> > > > > 
> > > > > CPU0 is the reference clock and all others are skewed. During CPU
> > > > > initialization the clocks are synchronized by keeping a record of
> > > > > each CPU's clock skew and adapting the TSC read routine accordingly.
> > > > > 
> > > > > I chose this implementation over what FreeBSD is doing (which is
> > > > > really just copying Linux) because it is clean and elegant.
> > > > > 
> > > > > I would love to hear reports from machines that were broken by this.
> > > > > Mine, which never exhibited the problem in the first place, run just
> > > > > fine with the following diff. In fact I am writing this message on
> > > > > one such machine.
> > > > > 
> > > > > Also, constructive comments are more than welcome!
> > > > > 
> > > > > Notes:
> > > > > 
> > > > > - cpu_counter_serializing() could probably have a better name
> > > > >   (tsc_read, for example)
> > > > > - the PAUSE instruction is probably not needed
> > > > > - acpi(4) suspend and resume bits are left out on purpose, but should
> > > > >   be trivial to add once the current diff settles
> > > > > 
> > > > > Paul Irofti
> > > > 
> > > > I don't think we want to introduce a <machine/tsc.h> header file.
> > > > 
> > > > The code suffers from some NetBSD-isms, so that'll need to be fixed.
> > > > I pointed some of them out below.
> > > > 
> > > > Also, how accurate is your skew detection?  What skew is detected on a
> > > > machine that (supposedly) has the TSCs in sync?  The result will be
> > > > that you actually slightly desync the counters on different CPUs.
> > > > 
> > > > I think Linux uses the TSC_ADJUST MSR and compares its value across
> > > > cores.  If the skew is small and the TSC_ADJUST values are the same
> > > > across cores it skips the TSC adjustments.
> > > 
> > > Hi,
> > > 
> > > Here is an updated diff with a few bugs from the previous version fixed
> > > and with most of the concerns I received in private and from Mark
> > > addressed.
> > > 
> > > I will do the TSC_ADJUST MSR dance in another iteration if the current
> > > incarnation turns out to be correct for machines suffering from TSCs not
> > > in sync.
> > > 
> > > The thing I am mostly worried about now is the following sum:
> > > 
> > >  uint
> > >  tsc_get_timecount(struct timecounter *tc)
> > >  {
> > >   return rdtsc() + curcpu()->cpu_cc_skew;
> > >  }
> > >  
> > > Can one term be executed on one CPU and the other on another? Is there
> > > a way to prevent this from happening other than locking?
> > > 
> > > I see NetBSD is checking for a change in the number of context switches 
> > > of the current process.
> > > 
> > > My plan is to have a fix in the tree before 6.6 is released, so I would
> > > love to hear your thoughts and reports on this.
> > > 
> > > Thanks,
> > > Paul
> > 
> > Hi,
> > 
> > Here is a third version of the TSC diff that also takes into
> > consideration the suspend-resume path, which was ignored by the
> > previous version, thus rendering resume broken.
> 
> Hmm, what is this diff supposed to do upon suspend-resume?  I'm fairly
> certain that you'll need to recalibrate the skew in the resume path.
> But it doesn't seem to do that.  Or at least it doesn't print any
> messages.

I agree with kettenis. You definitely will need to re-calculate and apply
the skews on resume.

-ml

> 
> Further comments/questions/requests below...
> 
> 
> Anyway, here is what gets reported on my x1c3:
> 
> cpu0: Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz, 2494.66 MHz, 06-3d-04
> 
> tsc_timecounter_init: TSC skew=0 observed drift=0
> tsc_timecounter_init: TSC skew=-344 observed drift=0
> tsc_timecounter_init: TSC skew=-8 observed drift=0
> tsc_timecounter_init: TSC skew=-26 observed drift=0
> 
> > Index: arch/amd64/amd64/acpi_machdep.c
> > ===================================================================
> > RCS file: /cvs/src/sys/arch/amd64/amd64/acpi_machdep.c,v
> > retrieving revision 1.86
> > diff -u -p -u -p -r1.86 acpi_machdep.c
> > --- arch/amd64/amd64/acpi_machdep.c 23 Oct 2018 17:51:32 -0000      1.86
> > +++ arch/amd64/amd64/acpi_machdep.c 5 Aug 2019 13:54:33 -0000
> > @@ -60,6 +60,8 @@ extern paddr_t tramp_pdirpa;
> >  
> >  extern int acpi_savecpu(void) __returns_twice;
> >  
> > +extern int64_t tsc_drift_observed;
> > +
> >  #define ACPI_BIOS_RSDP_WINDOW_BASE        0xe0000
> >  #define ACPI_BIOS_RSDP_WINDOW_SIZE        0x20000
> >  
> > @@ -481,6 +483,8 @@ acpi_resume_cpu(struct acpi_softc *sc)
> >  {
> >     fpuinit(&cpu_info_primary);
> >  
> > +   cpu_info_primary.cpu_cc_skew = 0;       /* futile */
> 
> So you'll drop it from the final version?
> 
> > +   tsc_drift_observed = 0;                 /* reset tsc drift on resume */
> 
> What is the point of this?  Do you think that after a suspend/resume
> cycle the TSCs are suddenly not drifting anymore?
> 
> >     cpu_init(&cpu_info_primary);
> >     cpu_ucode_apply(&cpu_info_primary);
> >  
> > Index: arch/amd64/amd64/cpu.c
> > ===================================================================
> > RCS file: /cvs/src/sys/arch/amd64/amd64/cpu.c,v
> > retrieving revision 1.137
> > diff -u -p -u -p -r1.137 cpu.c
> > --- arch/amd64/amd64/cpu.c  28 May 2019 18:17:01 -0000      1.137
> > +++ arch/amd64/amd64/cpu.c  5 Aug 2019 13:54:34 -0000
> > @@ -754,6 +754,10 @@ cpu_init(struct cpu_info *ci)
> >     cr4 = rcr4();
> >     lcr4(cr4 & ~CR4_PGE);
> >     lcr4(cr4);
> > +
> > +   /* Synchronize TSC */
> > +   if (cold && !CPU_IS_PRIMARY(ci))
> > +         tsc_sync_ap(ci);
> >  #endif
> >  }
> >  
> > @@ -808,6 +812,7 @@ void
> >  cpu_start_secondary(struct cpu_info *ci)
> >  {
> >     int i;
> > +   u_long s;
> >  
> >     ci->ci_flags |= CPUF_AP;
> >  
> > @@ -828,6 +833,17 @@ cpu_start_secondary(struct cpu_info *ci)
> >             printf("dropping into debugger; continue from here to resume 
> > boot\n");
> >             db_enter();
> >  #endif
> > +   } else {
> > +           /*
> > +            * Synchronize time stamp counters. Invalidate cache and do
> > +            * twice (in tsc_sync_bp) to minimize possible cache effects.
> 
> Do what twice?
> 
> > +            * Disable interrupts to try and rule out any external
> > +            * interference.
> > +            */
> > +           s = intr_disable();
> > +           wbinvd();
> > +           tsc_sync_bp(ci);
> > +           intr_restore(s);
> >     }
> >  
> >     if ((ci->ci_flags & CPUF_IDENTIFIED) == 0) {
> > @@ -852,6 +868,8 @@ void
> >  cpu_boot_secondary(struct cpu_info *ci)
> >  {
> >     int i;
> > +   int64_t drift;
> > +   u_long s;
> >  
> >     atomic_setbits_int(&ci->ci_flags, CPUF_GO);
> >  
> > @@ -864,6 +882,17 @@ cpu_boot_secondary(struct cpu_info *ci)
> >             printf("dropping into debugger; continue from here to resume 
> > boot\n");
> >             db_enter();
> >  #endif
> > +   } else if (cold) {
> > +           /* Synchronize TSC again, check for drift. */
> > +           drift = ci->cpu_cc_skew;
> > +           s = intr_disable();
> > +           wbinvd();
> > +           tsc_sync_bp(ci);
> > +           intr_restore(s);
> > +           drift -= ci->cpu_cc_skew;
> > +           printf("TSC skew=%lld drift=%lld\n",
> > +               (long long)ci->cpu_cc_skew, (long long)drift);
> > +           tsc_sync_drift(drift);
> >     }
> >  }
> >  
> > @@ -888,7 +917,14 @@ cpu_hatch(void *v)
> >             panic("%s: already running!?", ci->ci_dev->dv_xname);
> >  #endif
> >  
> > +   /*
> > +    * Synchronize the TSC for the first time. Note that interrupts are
> > +    * off at this point.
> > +    */
> > +   wbinvd();
> >     ci->ci_flags |= CPUF_PRESENT;
> > +   ci->cpu_cc_skew = 0;    /* reset on resume */
> > +   tsc_sync_ap(ci);
> >  
> >     lapic_enable();
> >     lapic_startclock();
> > Index: arch/amd64/amd64/tsc.c
> > ===================================================================
> > RCS file: /cvs/src/sys/arch/amd64/amd64/tsc.c,v
> > retrieving revision 1.11
> > diff -u -p -u -p -r1.11 tsc.c
> > --- arch/amd64/amd64/tsc.c  6 Jun 2019 19:43:35 -0000       1.11
> > +++ arch/amd64/amd64/tsc.c  5 Aug 2019 13:54:34 -0000
> > @@ -1,8 +1,10 @@
> >  /* $OpenBSD: tsc.c,v 1.11 2019/06/06 19:43:35 kettenis Exp $       */
> >  /*
> > + * Copyright (c) 2008 The NetBSD Foundation, Inc.
> >   * Copyright (c) 2016,2017 Reyk Floeter <[email protected]>
> >   * Copyright (c) 2017 Adam Steen <[email protected]>
> >   * Copyright (c) 2017 Mike Belopuhov <[email protected]>
> > + * Copyright (c) 2019 Paul Irofti <[email protected]>
> >   *
> >   * Permission to use, copy, modify, and distribute this software for any
> >   * purpose with or without fee is hereby granted, provided that the above
> > @@ -20,6 +22,7 @@
> >  #include <sys/param.h>
> >  #include <sys/systm.h>
> >  #include <sys/timetc.h>
> > +#include <sys/atomic.h>
> >  
> >  #include <machine/cpu.h>
> >  #include <machine/cpufunc.h>
> > @@ -33,6 +36,13 @@ int              tsc_recalibrate;
> >  uint64_t   tsc_frequency;
> >  int                tsc_is_invariant;
> >  
> > +int64_t    tsc_drift_max = 250;    /* max cycles */
> > +int64_t    tsc_drift_observed;
> > +bool       tsc_good;
> 
> We don't use bool in the kernel.  And this variable is probably better
> named...
> 
> > +
> > +volatile int64_t   tsc_sync_val;
> > +volatile struct cpu_info   *tsc_sync_cpu;
> > +
> >  uint               tsc_get_timecount(struct timecounter *tc);
> >  
> >  struct timecounter tsc_timecounter = {
> > @@ -172,10 +182,8 @@ calibrate_tsc_freq(void)
> >             return;
> >     tsc_frequency = freq;
> >     tsc_timecounter.tc_frequency = freq;
> > -#ifndef MULTIPROCESSOR
> >     if (tsc_is_invariant)
> >             tsc_timecounter.tc_quality = 2000;
> > -#endif
> >  }
> >  
> >  void
> > @@ -194,26 +202,25 @@ cpu_recalibrate_tsc(struct timecounter *
> >  uint
> >  tsc_get_timecount(struct timecounter *tc)
> >  {
> > -   return rdtsc();
> > +   return rdtsc() + curcpu()->cpu_cc_skew;
> >  }
> >  
> >  void
> >  tsc_timecounter_init(struct cpu_info *ci, uint64_t cpufreq)
> >  {
> > -   if (!(ci->ci_flags & CPUF_PRIMARY) ||
> > -       !(ci->ci_flags & CPUF_CONST_TSC) ||
> > +   if (!(ci->ci_flags & CPUF_CONST_TSC) ||
> >         !(ci->ci_flags & CPUF_INVAR_TSC))
> >             return;
> >  
> >     tsc_frequency = tsc_freq_cpuid(ci);
> >     tsc_is_invariant = 1;
> >  
> > +   tsc_good = (rdmsr(MSR_TSC) != 0 || rdmsr(MSR_TSC) != 0);
> > +
> 
> ... something like tsc_is_running, since that is what you seem to be
> testing here.
> 
> >     /* Newer CPUs don't require recalibration */
> >     if (tsc_frequency > 0) {
> >             tsc_timecounter.tc_frequency = tsc_frequency;
> > -#ifndef MULTIPROCESSOR
> >             tsc_timecounter.tc_quality = 2000;
> > -#endif
> >     } else {
> >             tsc_recalibrate = 1;
> >             tsc_frequency = cpufreq;
> > @@ -221,5 +228,112 @@ tsc_timecounter_init(struct cpu_info *ci
> >             calibrate_tsc_freq();
> >     }
> >  
> > -   tc_init(&tsc_timecounter);
> > +   if (tsc_drift_observed > tsc_drift_max) {
> > +           printf("ERROR: %lld cycle TSC drift observed\n",
> > +               (long long)tsc_drift_observed);
> > +           tsc_timecounter.tc_quality = -100;
> > +           tsc_is_invariant = 0;
> > +   }
> > +
> > +   printf("%s: TSC skew=%lld observed drift=%lld\n", __func__,
> > +       (long long)ci->cpu_cc_skew, (long long)tsc_drift_observed);
> > +
> > +   if (ci->ci_flags & CPUF_PRIMARY)
> > +           tc_init(&tsc_timecounter);
> > +}
> > +
> > +static uint64_t
> > +cpu_counter_serializing(struct cpu_info *ci)
> > +{
> > +   if (tsc_good)
> > +           return rdmsr(MSR_TSC);
> > +   else
> > +           return (rdtsc() + ci->cpu_cc_skew);
> 
> Why are you adding the skew here?
> 
> > +}
> > +
> > +/*
> > + * Record drift (in clock cycles).  Called during AP startup.
> > + */
> > +void
> > +tsc_sync_drift(int64_t drift)
> > +{
> > +   if (drift < 0)
> > +           drift = -drift;
> > +   if (drift > tsc_drift_observed)
> > +           tsc_drift_observed = drift;
> > +}
> > +
> > +/*
> > + * Called during startup of APs, by the boot processor.  Interrupts
> > + * are disabled on entry.
> > + */
> > +static void
> > +tsc_read_bp(struct cpu_info *ci, uint64_t *bptscp, uint64_t *aptscp)
> > +{
> > +   uint64_t bptsc;
> > +
> > +   if (atomic_swap_ptr(&tsc_sync_cpu, ci) != NULL)
> > +           panic("tsc_sync_bp: 1");
> > +
> > +   /* Flag it and read our TSC. */
> > +   atomic_setbits_int(&ci->ci_flags, CPUF_SYNCTSC);
> > +   bptsc = cpu_counter_serializing(ci) >> 1;
> > +
> > +   /* Wait for remote to complete, and read ours again. */
> > +   while ((ci->ci_flags & CPUF_SYNCTSC) != 0)
> > +           membar_consumer();
> > +   bptsc += (cpu_counter_serializing(ci) >> 1);
> > +
> > +   /* Wait for the results to come in. */
> > +   while (tsc_sync_cpu == ci)
> > +           CPU_BUSY_CYCLE();
> > +   if (tsc_sync_cpu != NULL)
> > +           panic("tsc_sync_bp: 2");
> > +
> > +   *bptscp = bptsc;
> > +   *aptscp = tsc_sync_val;
> > +}
> > +
> > +void
> > +tsc_sync_bp(struct cpu_info *ci)
> > +{
> > +   uint64_t bptsc, aptsc;
> > +
> > +   tsc_read_bp(ci, &bptsc, &aptsc); /* discarded - cache effects */
> > +   tsc_read_bp(ci, &bptsc, &aptsc);
> > +
> > +   /* Compute final value to adjust for skew. */
> > +   ci->cpu_cc_skew = bptsc - aptsc;
> > +}
> > +
> > +/*
> > + * Called during startup of AP, by the AP itself.  Interrupts are
> > + * disabled on entry.
> > + */
> > +static void
> > +tsc_post_ap(struct cpu_info *ci)
> > +{
> > +   uint64_t tsc;
> > +
> > +   /* Wait for go-ahead from primary. */
> > +   while ((ci->ci_flags & CPUF_SYNCTSC) == 0)
> > +           membar_consumer();
> > +   tsc = (cpu_counter_serializing(ci) >> 1);
> > +
> > +   /* Instruct primary to read its counter. */
> > +   atomic_clearbits_int(&ci->ci_flags, CPUF_SYNCTSC);
> > +   tsc += (cpu_counter_serializing(ci) >> 1);
> > +
> > +   /* Post result.  Ensure the whole value goes out atomically. */
> > +   (void)atomic_swap_64(&tsc_sync_val, tsc);
> > +
> > +   if (atomic_swap_ptr(&tsc_sync_cpu, NULL) != ci)
> > +           panic("tsc_sync_ap");
> > +}
> > +
> > +void
> > +tsc_sync_ap(struct cpu_info *ci)
> > +{
> > +   tsc_post_ap(ci);
> > +   tsc_post_ap(ci);
> >  }
> > Index: arch/amd64/include/cpu.h
> > ===================================================================
> > RCS file: /cvs/src/sys/arch/amd64/include/cpu.h,v
> > retrieving revision 1.131
> > diff -u -p -u -p -r1.131 cpu.h
> > --- arch/amd64/include/cpu.h        17 May 2019 19:07:16 -0000      1.131
> > +++ arch/amd64/include/cpu.h        5 Aug 2019 13:54:34 -0000
> > @@ -206,6 +206,8 @@ struct cpu_info {
> >     union           vmm_cpu_cap ci_vmm_cap;
> >     paddr_t         ci_vmxon_region_pa;
> >     struct vmxon_region *ci_vmxon_region;
> > +
> > +   int64_t         cpu_cc_skew;            /* counter skew vs cpu0 */
> 
> Rename this to ci_tsc_skew?
> 
> >  };
> >  
> >  #define CPUF_BSP   0x0001          /* CPU is the original BSP */
> > @@ -221,6 +223,7 @@ struct cpu_info {
> >  #define CPUF_INVAR_TSC     0x0100          /* CPU has invariant TSC */
> >  #define CPUF_USERXSTATE    0x0200          /* CPU has curproc's xsave 
> > state */
> >  
> > +#define    CPUF_SYNCTSC    0x0800          /* Synchronize TSC */
> 
> tab vs. spaces
> 
> >  #define CPUF_PRESENT       0x1000          /* CPU is present */
> >  #define CPUF_RUNNING       0x2000          /* CPU is running */
> >  #define CPUF_PAUSE 0x4000          /* CPU is paused in DDB */
> > Index: arch/amd64/include/cpuvar.h
> > ===================================================================
> > RCS file: /cvs/src/sys/arch/amd64/include/cpuvar.h,v
> > retrieving revision 1.9
> > diff -u -p -u -p -r1.9 cpuvar.h
> > --- arch/amd64/include/cpuvar.h     6 Oct 2017 13:33:53 -0000       1.9
> > +++ arch/amd64/include/cpuvar.h     5 Aug 2019 13:54:34 -0000
> > @@ -98,4 +98,8 @@ void cpu_init(struct cpu_info *);
> >  void cpu_init_first(void);
> >  void cpu_adjust_tsc_freq(uint64_t (*)());
> >  
> > +void tsc_sync_drift(int64_t);
> > +void tsc_sync_bp(struct cpu_info *);
> > +void tsc_sync_ap(struct cpu_info *);
> > +
> >  #endif
> > 
> 
