On Thu, Jun 30, 2016 at 05:47:39AM +1000, Bruce Evans wrote:
> On Wed, 29 Jun 2016, Konstantin Belousov wrote:
> 
> > On Wed, Jun 29, 2016 at 05:54:43PM +0300, Konstantin Belousov wrote:
> >> This is a reply to two mails in one, both for r302251 and r302252 threads.
> 

> After removing the foot-shooting tz_minuteswest variable from utc_offset(),
> all of the variables in it are ones set by adjkerntz sysctls.  The settings
> are normally ordered by not running more than 1 instance of adjkerntz.
> When adjkerntz starts, this instance shouldn't have any contention with
> other initialization doing resettodr().  There is contention every 6
> months, if wall_cmos_clock is set, when adjkerntz changes the offset for
> daylight saving time.  The chances of this used to be tiny since resettodr() was
> almost never called, but now there is the periodic resettodr() in
> kern_ntptime.c.
I do not see how your text above changes anything I said about consistency.
Each sysctl sets or fetches an integer variable, so individually it is atomic.
Inter-dependencies between the wall_cmos_clock and adjkerntz settings are
completely up to the user; no locking can change that while we keep separate
sysctls to set and fetch each variable individually.
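
As a sketch of the shape of these knobs (paraphrasing the kernel
declarations from memory, not verbatim): they are two independent
integer sysctls, so each individual read or write is atomic, but
nothing ties the pair together.

	int wall_cmos_clock;	/* RTC runs on local (wall) time */
	int adjkerntz;		/* local time offset from UTC, in seconds */

	SYSCTL_INT(_machdep, OID_AUTO, wall_cmos_clock, CTLFLAG_RW,
	    &wall_cmos_clock, 0, "Is the RTC kept on wall time");
	/* adjkerntz sits behind a SYSCTL_PROC handler, sketched below. */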

The only side-effects in resettodr() are possible in CLOCK_SETTIME(),
which is protected against parallel invocation by resettodr_lock.

> 
> 
> >>> ...
> >>> The change in the other part of the diff is related to this, and doesn't
> >>> seem quite right.  utc_offset() uses the variables tz_minuteswest,
> >>> wall_cmos_clock and adjkerntz:
> >>> - tz_minuteswest is set with no locking in kern_settimeofday(), just
> >>>    after the settime() call.  This last had a chance of being correct
> >>>    when the kernel was UP and not preemptible.  But it is broken in
> >>>    a more fundamental way back to at least FreeBSD-1: resettodr() is
> >>>    called _before_ updating the timezone, so it uses an out-of-date
> >>>    tz_minuteswest.
> >> First, the man page is not correct, saying that 'this information is
> >> kept outside the kernel'.
> 
> We deprecate the timezone support in settimeofday(), but use the wrong
> wording for it here.  The info set by this syscall is of course kept in the kernel,
> and I think the man page is trying to say that this info is never used
> outside of the kernel.
How could the latter statement be true, since that information is
returned by gettimeofday(), and we do not control what programs a user
might run on his machine.
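
A minimal userland illustration of the point (plain gettimeofday(2);
nothing here is hypothetical except the toy program itself):

	#include <sys/time.h>
	#include <stdio.h>

	int
	main(void)
	{
		struct timeval tv;
		struct timezone tz;

		/* The second argument hands the kernel timezone info out. */
		if (gettimeofday(&tv, &tz) == 0)
			printf("tz_minuteswest = %d, tz_dsttime = %d\n",
			    tz.tz_minuteswest, tz.tz_dsttime);
		return (0);
	}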

> 
> The man page doesn't say enough about deprecation, or that we only use this
> info for foot-shooting in the kernel, only (?) in utc_offset().  utc_offset()
> does:
> 
>       return (tz_minuteswest * 60 + (wall_cmos_clock ? adjkerntz : 0));
> 
> If !wall_cmos_clock, then the offset should be 0, so any setting of
> tz_minuteswest using the timezone info breaks the offset, and this breaks
> the case of the rtc _not_ being on wall time.
> 
> If wall_cmos_clock, then adjkerntz(8) normally maintains the offset in
> the adjkerntz variable, so again tz_minuteswest must be 0 to work and
> any setting of it to nonzero using the timezone info breaks the offset.
> 
> >> Second, why does settime(), logically, need the updated timezone ?
> >> I understand that resettodr() is called on successful update of
> >> the wall clock.  But shouldn't the need to reset not-GMT RTC clock ...
> 
> The timezone updates just give foot-shooting, but we need an up to
> date adjkerntz variable initially and on every daylight saving change
> in the wall_cmos_clock case.  This is the normal way to keep the rtc
> on wall time after every daylight saving change -- adjkerntz(8) runs,
> and it should change the adjkerntz variable first and then call
> settimeofday() with a "null" change to get the side effect of calling
> resettodr() which does a large change of the rtc.  It is impossible
> to make a really null change of the software clock using settimeofday(),
> but I don't know of any better API to do this.  I think the locking that
> you just added is enough -- adjkerntz() just has to arrange a call to
> resettodr() strictly after it updates the variable, and that happens
> almost automatically.
I think that the way it is done is just by setting the machdep.adjkerntz
sysctl, which calls resettodr().  The old wisdom for re-syncing the RTC
to the kernel-maintained time was to run that sysctl from cron, but
Andrey added the periodic callout several years ago.
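
For reference, a sketch of that sysctl handler, paraphrased from memory
(the real one lives in kern/subr_clock.c; treat names and details as
approximate): a write stores the new offset and then re-syncs the RTC.

	static int
	sysctl_machdep_adjkerntz(SYSCTL_HANDLER_ARGS)
	{
		int error;

		error = sysctl_handle_int(oidp, oidp->oid_arg1,
		    oidp->oid_arg2, req);
		if (error == 0 && req->newptr != NULL)
			resettodr();	/* push the new offset into the RTC */
		return (error);
	}

	SYSCTL_PROC(_machdep, OID_AUTO, adjkerntz,
	    CTLTYPE_INT | CTLFLAG_RW, &adjkerntz, 0,
	    sysctl_machdep_adjkerntz, "I",
	    "Local offset from UTC in seconds");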

> >> I do not think that inittodr() actually needs that locking.  This code
> >> is executed at the initialization stage, or at the resume stage.
> 
> Perhaps, but resume runs more often than every 6 months for the daylight
> saving time race, and it isn't clear if anything stops resettodr() running
> concurrently then.
> 
> In my version, something like inittodr() runs 3 seconds after every exit
> from ddb.  Stopping in ddb is a bit like a short suspend-resume -- it
> stops or breaks some timers, and inittodr() is a way to fix up the
> software real time using an unstopped timer.
Let's postpone this.  Maybe a resettodr_lock can be applied to
inittodr() as a whole, and additionally merged with some other lock.
But I think inittodr() is not very important right now.

> >>> *ADJ* and *adj* are not good names, since much more than ntp_adj*() needs
> >>> to be locked.
> >> Do you have a suggestion for a better name ?
> 
> Just a general name like ntp_lock for ntp or t_lock if the same lock is
> used for timecounters.  I also prefer not to use macros except for
> development versions, and then name the lock as xxx_mtx, not xxx_lock.
> We depend on special properties of (spin) mutexes.
I want to keep this macroized.  Yes, it is at the development stage right now.

> >> So I postponed looking at tc_windup(), and it seems that it was
> >> useful. After your note above, it is mostly clear that we can skip some
> >> tc_windup() calls which conflict with currently active windup calls.
> >> E.g., the hardclock update, which is done simultaneously with the
> >> setclock(), could be just skipped.  But we must not skip the tc_windup()
> >> call from setclock().
> 
> Several things could be deferred, but I think that would just be more
> complicated.
> 
> >> As an additional, but probably excessive measure, invocation of
> >> tc_windup() from hardclock could ask the setclock() to re-execute
> >> tc_windup() after its call is done.
> 
> I don't quite understand this.  hardclock -> tc_windup() normally doesn't
> have interference from tc_setclock().
In what sense does it not have interference ?  tc_windup() may be executed
from hardclock/hardclock_cnt (bottom half) and concurrently from setclock()
(top half).  And this is exactly what we want to avoid, isn't it ?

> tc_setclock() needs locking even more than tc_windup().
...
> > I now think that the lock around tc_windup() should have more flavor of
> > real spinlocks.  Ideally the spinlock_enter() should be taken around it,
> > but it might be too extreme.  Just a critical section in tc_setclock()
> > is good enough.
> 
> No, it must have a full lock since it does read-write-modify on critical
> variables.
I talked only about tc_windup(). Parallel invocations of tc_setclock()
should be guarded, indeed. I added a private mutex for it.

> 
> > ...
> > diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
> > index 0f015b3..a5776d2 100644
> > --- a/sys/kern/kern_tc.c
> > +++ b/sys/kern/kern_tc.c
> > ...
> > @@ -1237,6 +1238,7 @@ tc_setclock(struct timespec *ts)
> >     struct timespec tbef, taft;
> >     struct bintime bt, bt2;
> >
> > +   critical_enter();
> >     cpu_tick_calibrate(1);
> >     nanotime(&tbef);
> >     timespec2bintime(ts, &bt);
> > @@ -1247,8 +1249,10 @@ tc_setclock(struct timespec *ts)
> >     bintime2timeval(&bt, &boottime);
> 
> boottime is one of the critical variables.  Before this, there are 2
> accesses to boottimebin.  The Giant locking that you just removed gave
> only some protection here.  It prevented concurrent access by
> malicious/stress-testing code doing settime().  It didn't prevent
> nanotime() and friends seeing boottime in an in-between state.  Oops, 
> mutex locking won't fix that either.  I think the update of these
> variables needs to be moved into tc_windup() where it will hopefully
> automatically be protected by the generation count, like the timehands
> offsets already are.
I do not understand how that would help.  Do you mean to move the vars
into the current timehands ?
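
For context, the generation protocol being referred to, as the existing
readers in kern_tc.c already use it (a sketch modeled on binuptime();
the th_boottime member is hypothetical here -- today boottimebin is a
plain global updated outside the generation):

	struct timehands *th;
	u_int gen;

	do {
		th = timehands;
		gen = atomic_load_acq_int(&th->th_generation);
		*bt = th->th_offset;
		bintime_addx(bt, th->th_scale * tc_delta(th));
		/* hypothetical: bintime_add(bt, &th->th_boottime); */
		atomic_thread_fence_acq();
	} while (gen == 0 || gen != th->th_generation);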

> 
> boottime is also accessed with no locking in the kern.boottime sysctl.
> 
> There is also ffclock_boottime.  I don't like the duplication for
> ffclock, but it doesn't have much duplication for initializing
> ffclock_boottime.
> 
> >
> >     /* XXX fiddle all the little crinkly bits around the fiords... */
> > -   tc_windup();
> > +   tc_windup(true);
> >     nanotime(&taft);
> > +   cpu_tick_calibrate(1);
> 
> The 2 calls to cpu_tick_calibrate() in this function are excessive.  These
> calls are just a hack to recalibrate after suspend/resume as a side effect
> of calling here.  The first call suffices, since cpu ticks have nothing to
> do with any time variable that is set here.
Ok.

> > +static volatile int tc_windup_lock;
> > +static void
> > +tc_windup(bool top_call)
> > +{
> > +
> > +   for (;;) {
> > +           if (atomic_cmpset_int(&tc_windup_lock, 0, 1)) {
> > +                   atomic_thread_fence_acq();
> > +                   tc_windup_locked();
> > +                   atomic_store_rel_int(&tc_windup_lock, 0);
> > +                   break;
> > +           } else if (!top_call) {
> > +                   break;
> > +           }
> > +   }
> > }
> 
> I like to write optimized locking like that, but don't see why you
> want it here.
Because I do not want the fast clock interrupt handler to wait on a lock,
which necessarily becomes a spinlock.  The ntp_lock is bad enough already.
What I coded above is not a real lock.  In the contention case it bails out,
so there is no blocking.  In particular, witness is not relevant to it.

diff --git a/sys/kern/kern_ntptime.c b/sys/kern/kern_ntptime.c
index d352ee7..efc3713 100644
--- a/sys/kern/kern_ntptime.c
+++ b/sys/kern/kern_ntptime.c
@@ -162,29 +162,12 @@ static l_fp time_adj;                     /* tick adjust (ns/s) */
 
 static int64_t time_adjtime;           /* correction from adjtime(2) (usec) */
 
-static struct mtx ntpadj_lock;
-MTX_SYSINIT(ntpadj, &ntpadj_lock, "ntpadj",
-#ifdef PPS_SYNC
-    MTX_SPIN
-#else
-    MTX_DEF
-#endif
-);
+static struct mtx ntp_lock;
+MTX_SYSINIT(ntp, &ntp_lock, "ntp", MTX_SPIN);
 
-/*
- * When PPS_SYNC is defined, hardpps() function is provided which can
- * be legitimately called from interrupt filters.  Due to this, use
- * spinlock for ntptime state protection, otherwise sleepable mutex is
- * adequate.
- */
-#ifdef PPS_SYNC
-#define        NTPADJ_LOCK()           mtx_lock_spin(&ntpadj_lock)
-#define        NTPADJ_UNLOCK()         mtx_unlock_spin(&ntpadj_lock)
-#else
-#define        NTPADJ_LOCK()           mtx_lock(&ntpadj_lock)
-#define        NTPADJ_UNLOCK()         mtx_unlock(&ntpadj_lock)
-#endif
-#define        NTPADJ_ASSERT_LOCKED()  mtx_assert(&ntpadj_lock, MA_OWNED)
+#define        NTP_LOCK()              mtx_lock_spin(&ntp_lock)
+#define        NTP_UNLOCK()            mtx_unlock_spin(&ntp_lock)
+#define        NTP_ASSERT_LOCKED()     mtx_assert(&ntp_lock, MA_OWNED)
 
 #ifdef PPS_SYNC
 /*
@@ -271,7 +254,7 @@ ntp_gettime1(struct ntptimeval *ntvp)
 {
        struct timespec atv;    /* nanosecond time */
 
-       NTPADJ_ASSERT_LOCKED();
+       NTP_ASSERT_LOCKED();
 
        nanotime(&atv);
        ntvp->time.tv_sec = atv.tv_sec;
@@ -302,9 +285,9 @@ sys_ntp_gettime(struct thread *td, struct ntp_gettime_args *uap)
 {      
        struct ntptimeval ntv;
 
-       NTPADJ_LOCK();
+       NTP_LOCK();
        ntp_gettime1(&ntv);
-       NTPADJ_UNLOCK();
+       NTP_UNLOCK();
 
        td->td_retval[0] = ntv.time_state;
        return (copyout(&ntv, uap->ntvp, sizeof(ntv)));
@@ -315,9 +298,9 @@ ntp_sysctl(SYSCTL_HANDLER_ARGS)
 {
        struct ntptimeval ntv;  /* temporary structure */
 
-       NTPADJ_LOCK();
+       NTP_LOCK();
        ntp_gettime1(&ntv);
-       NTPADJ_UNLOCK();
+       NTP_UNLOCK();
 
        return (sysctl_handle_opaque(oidp, &ntv, sizeof(ntv), req));
 }
@@ -382,7 +365,7 @@ sys_ntp_adjtime(struct thread *td, struct ntp_adjtime_args *uap)
                error = priv_check(td, PRIV_NTP_ADJTIME);
        if (error != 0)
                return (error);
-       NTPADJ_LOCK();
+       NTP_LOCK();
        if (modes & MOD_MAXERROR)
                time_maxerror = ntv.maxerror;
        if (modes & MOD_ESTERROR)
@@ -484,7 +467,7 @@ sys_ntp_adjtime(struct thread *td, struct ntp_adjtime_args *uap)
        ntv.stbcnt = pps_stbcnt;
 #endif /* PPS_SYNC */
        retval = ntp_is_time_error(time_status) ? TIME_ERROR : time_state;
-       NTPADJ_UNLOCK();
+       NTP_UNLOCK();
 
        error = copyout((caddr_t)&ntv, (caddr_t)uap->tp, sizeof(ntv));
        if (error == 0)
@@ -506,6 +489,8 @@ ntp_update_second(int64_t *adjustment, time_t *newsec)
        int tickrate;
        l_fp ftemp;             /* 32/64-bit temporary */
 
+       NTP_LOCK();
+
        /*
         * On rollover of the second both the nanosecond and microsecond
         * clocks are updated and the state machine cranked as
@@ -627,6 +612,8 @@ ntp_update_second(int64_t *adjustment, time_t *newsec)
        else
                time_status &= ~STA_PPSSIGNAL;
 #endif /* PPS_SYNC */
+
+       NTP_UNLOCK();
 }
 
 /*
@@ -690,7 +677,7 @@ hardupdate(offset)
        long mtemp;
        l_fp ftemp;
 
-       NTPADJ_ASSERT_LOCKED();
+       NTP_ASSERT_LOCKED();
 
        /*
         * Select how the phase is to be controlled and from which
@@ -772,7 +759,7 @@ hardpps(tsp, nsec)
        long u_sec, u_nsec, v_nsec; /* temps */
        l_fp ftemp;
 
-       NTPADJ_LOCK();
+       NTP_LOCK();
 
        /*
         * The signal is first processed by a range gate and frequency
@@ -956,7 +943,7 @@ hardpps(tsp, nsec)
                time_freq = pps_freq;
 
 out:
-       NTPADJ_UNLOCK();
+       NTP_UNLOCK();
 }
 #endif /* PPS_SYNC */
 
@@ -999,11 +986,11 @@ kern_adjtime(struct thread *td, struct timeval *delta, struct timeval *olddelta)
                        return (error);
                ltw = (int64_t)delta->tv_sec * 1000000 + delta->tv_usec;
        }
-       NTPADJ_LOCK();
+       NTP_LOCK();
        ltr = time_adjtime;
        if (delta != NULL)
                time_adjtime = ltw;
-       NTPADJ_UNLOCK();
+       NTP_UNLOCK();
        if (olddelta != NULL) {
                atv.tv_sec = ltr / 1000000;
                atv.tv_usec = ltr % 1000000;
diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
index 0f015b3..7bb767c 100644
--- a/sys/kern/kern_tc.c
+++ b/sys/kern/kern_tc.c
@@ -135,7 +135,8 @@ SYSCTL_PROC(_kern_timecounter, OID_AUTO, alloweddeviation,
 
 static int tc_chosen;  /* Non-zero if a specific tc was chosen via sysctl. */
 
-static void tc_windup(void);
+static void tc_windup(bool top_call);
+static void tc_windup_locked(void);
 static void cpu_tick_calibrate(int);
 
 void dtrace_getnanotime(struct timespec *tsp);
@@ -1226,10 +1227,12 @@ tc_getfrequency(void)
        return (timehands->th_counter->tc_frequency);
 }
 
+static struct mtx tc_setclock_mtx;
+MTX_SYSINIT(tc_setclock_init, &tc_setclock_mtx, "tcsetc", MTX_DEF);
+
 /*
  * Step our concept of UTC.  This is done by modifying our estimate of
  * when we booted.
- * XXX: not locked.
  */
 void
 tc_setclock(struct timespec *ts)
@@ -1237,6 +1240,8 @@ tc_setclock(struct timespec *ts)
        struct timespec tbef, taft;
        struct bintime bt, bt2;
 
+       mtx_lock(&tc_setclock_mtx);
+       critical_enter();
        cpu_tick_calibrate(1);
        nanotime(&tbef);
        timespec2bintime(ts, &bt);
@@ -1247,8 +1252,10 @@ tc_setclock(struct timespec *ts)
        bintime2timeval(&bt, &boottime);
 
        /* XXX fiddle all the little crinkly bits around the fiords... */
-       tc_windup();
+       tc_windup(true);
        nanotime(&taft);
+       critical_exit();
+       mtx_unlock(&tc_setclock_mtx);
        if (timestepwarnings) {
                log(LOG_INFO,
                    "Time stepped from %jd.%09ld to %jd.%09ld (%jd.%09ld)\n",
@@ -1256,7 +1263,23 @@ tc_setclock(struct timespec *ts)
                    (intmax_t)taft.tv_sec, taft.tv_nsec,
                    (intmax_t)ts->tv_sec, ts->tv_nsec);
        }
-       cpu_tick_calibrate(1);
+}
+
+static volatile int tc_windup_lock;
+static void
+tc_windup(bool top_call)
+{
+
+       for (;;) {
+               if (atomic_cmpset_int(&tc_windup_lock, 0, 1)) {
+                       atomic_thread_fence_acq();
+                       tc_windup_locked();
+                       atomic_store_rel_int(&tc_windup_lock, 0);
+                       break;
+               } else if (!top_call) {
+                       break;
+               }
+       }
 }
 
 /*
@@ -1265,7 +1288,7 @@ tc_setclock(struct timespec *ts)
  * timecounter and/or do seconds processing in NTP.  Slightly magic.
  */
 static void
-tc_windup(void)
+tc_windup_locked(void)
 {
        struct bintime bt;
        struct timehands *th, *tho;
@@ -1846,7 +1869,7 @@ tc_ticktock(int cnt)
        if (count < tc_tick)
                return;
        count = 0;
-       tc_windup();
+       tc_windup(false);
 }
 
 static void __inline
@@ -1921,7 +1944,7 @@ inittimecounter(void *dummy)
        /* warm up new timecounter (again) and get rolling. */
        (void)timecounter->tc_get_timecount(timecounter);
        (void)timecounter->tc_get_timecount(timecounter);
-       tc_windup();
+       tc_windup(true);
 }
 
 SYSINIT(timecounter, SI_SUB_CLOCKS, SI_ORDER_SECOND, inittimecounter, NULL);
diff --git a/sys/kern/kern_time.c b/sys/kern/kern_time.c
index 148da2b..82710f7 100644
--- a/sys/kern/kern_time.c
+++ b/sys/kern/kern_time.c
@@ -115,9 +115,7 @@ settime(struct thread *td, struct timeval *tv)
        struct timeval delta, tv1, tv2;
        static struct timeval maxtime, laststep;
        struct timespec ts;
-       int s;
 
-       s = splclock();
        microtime(&tv1);
        delta = *tv;
        timevalsub(&delta, &tv1);
@@ -147,10 +145,8 @@ settime(struct thread *td, struct timeval *tv)
                                printf("Time adjustment clamped to -1 
second\n");
                        }
                } else {
-                       if (tv1.tv_sec == laststep.tv_sec) {
-                               splx(s);
+                       if (tv1.tv_sec == laststep.tv_sec)
                                return (EPERM);
-                       }
                        if (delta.tv_sec > 1) {
                                tv->tv_sec = tv1.tv_sec + 1;
                                printf("Time adjustment clamped to +1 
second\n");
@@ -161,10 +157,8 @@ settime(struct thread *td, struct timeval *tv)
 
        ts.tv_sec = tv->tv_sec;
        ts.tv_nsec = tv->tv_usec * 1000;
-       mtx_lock(&Giant);
        tc_setclock(&ts);
        resettodr();
-       mtx_unlock(&Giant);
        return (0);
 }
 
diff --git a/sys/kern/subr_rtc.c b/sys/kern/subr_rtc.c
index dbad36d..4bac324 100644
--- a/sys/kern/subr_rtc.c
+++ b/sys/kern/subr_rtc.c
@@ -172,11 +172,11 @@ resettodr(void)
        if (disable_rtc_set || clock_dev == NULL)
                return;
 
-       mtx_lock(&resettodr_lock);
        getnanotime(&ts);
        timespecadd(&ts, &clock_adj);
        ts.tv_sec -= utc_offset();
        /* XXX: We should really set all registered RTCs */
+       mtx_lock(&resettodr_lock);
        error = CLOCK_SETTIME(clock_dev, &ts);
        mtx_unlock(&resettodr_lock);
        if (error != 0)