On Thu, 30 Jun 2016, Konstantin Belousov wrote:

On Thu, Jun 30, 2016 at 05:47:39AM +1000, Bruce Evans wrote:
On Wed, 29 Jun 2016, Konstantin Belousov wrote:

On Wed, Jun 29, 2016 at 05:54:43PM +0300, Konstantin Belousov wrote:
This is a reply to two mails in one, for both the r302251 and r302252 threads.

After removing the foot-shooting tz_minuteswest variable from utc_offset(),
all of the variables in it are ones set by adjkerntz sysctls.  The settings
are normally ordered by not running more than 1 instance of adjkerntz.
When adjkerntz starts, this instance shouldn't have any contention with
other initialization doing resettodr().  There is contention every 6
months, if wall_cmos_clock is set, when adjkerntz changes the offset for
daylight saving time.  The chances of this used to be tiny since resettodr() was
almost never called, but now there is the periodic resettodr() in
kern_ntptime.c.
I do not see how your text above changes anything I said about consistency.
Each sysctl sets or fetches an integer variable, so individually each is
atomic.  Inter-dependencies between the wall_cmos_clock and adjkerntz
settings are completely up to the user; no locking can change that while
keeping separate sysctls to set and fetch each variable individually.

The only side-effects in resettodr() are possible in CLOCK_SETTIME(),
which is protected against parallel invocation by resettodr_lock.

It says that the users are sufficiently limited and careful about ordering
for the problem to be small.

...
The change in the other part of the diff is related to this, and doesn't
seem quite right.  utc_offset() uses the variables tz_minuteswest,
wall_cmos_clock and adjkerntz:
- tz_minuteswest is set with no locking in kern_settimeofday(), just
   after the settime() call.  This last had a chance of being correct
   when the kernel was UP and not preemptible.  But it is broken in
   a more fundamental way back to at least FreeBSD-1: resettodr() is
   called _before_ updating the timezone, so it uses an out-of-date
   tz_minuteswest.
First, the man page is not correct, saying that 'this information is
kept outside the kernel'.

We deprecate the timezone support in settimeofday(), but use the wrong
wording for this here.  The info set by this syscall is of course kept in
the kernel, and I think the man page is trying to say that this info is
never used outside of the kernel.
How could the latter statement be true, since that information is
returned by gettimeofday(), and we do not control what programs a user
might run on his machine?

Indeed.  The man page is only trying to say that this info is not used by
libc (and therefore should not be used by any program).

*...
Second, why does settime(), logically, need the updated timezone?
I understand that resettodr() is called on successful update of
the wall clock.  But shouldn't the need to reset the non-GMT RTC clock

The timezone updates just give foot-shooting, but we need an up-to-date
adjkerntz variable initially and on every daylight saving change
in the wall_cmos_clock case.  This is the normal way to keep the rtc
on wall time after every daylight saving change -- adjkerntz(8) runs,
and it should change the adjkerntz variable first and then call
settimeofday() with a "null" change to get the side effect of calling
resettodr() which does a large change of the rtc.  It is impossible
to make a really null change of the software clock using settimeofday(),
but I don't know of any better API to do this.  I think the locking that
you just added is enough -- adjkerntz() just has to arrange a call to
resettodr() strictly after it updates the variable, and that happens
almost automatically.
I think that the way it is done is just by doing sysctl machdep.adjkerntz,
which calls resettodr().

I probably knew that once, but had forgotten it.

The old wisdom for re-syncing the RTC to the kernel-maintained
time was to run the sysctl from cron, but Andrey added the callout
several years ago.
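That old wisdom amounts to a crontab fragment like the following (illustrative config only; the schedule is an assumption).  The trick is that writing machdep.adjkerntz -- even writing back its current value -- makes the sysctl handler call resettodr(), pushing the drift-corrected kernel time back into the RTC:

```
# Illustrative crontab fragment (assumed schedule): re-sync the RTC by
# rewriting machdep.adjkerntz with its current value; the sysctl write
# handler calls resettodr().
0 4 * * *  root  /sbin/sysctl machdep.adjkerntz=$(/sbin/sysctl -n machdep.adjkerntz)
```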

I never knew that old wisdom, and just avoided re-syncing the RTC.

I do not think that inittodr() actually needs that locking.  This code
is executed at the initialization stage, or at the resume stage.

Perhaps, but resume runs more often than every 6 months (the period of
the daylight saving time race), and it isn't clear if anything stops
resettodr() running concurrently then.

In my version, something like inittodr() runs 3 seconds after every exit
from ddb.  Stopping in ddb is a bit like a short suspend-resume -- it
stops or breaks some timers, and inittodr() is a way to fix up the
software real time using an unstopped timer.
Let's postpone this.  Maybe resettodr_lock can be applied to
inittodr() as a whole, and additionally merged with some other lock.
But I think inittodr() is not very important right now.

I see some more things that need the common lock:
- the RTC interrupt handler.  resettodr() begins by disabling RTC
  interrupts.  This is to prevent interference from the interrupt
  handler, but it only works for UP systems.  For SMP systems, the
  interrupt handler might already be running, or scheduled to run,
  on another CPU.  The interrupt handler doesn't do many RTC accesses
  (except in my version), but it does delicate interrupt re-enabling
  by reading the volatile status register.  resettodr() ends by
  re-enabling RTC interrupts and reading the volatile status register.
  I think all orderings work, but it would be simpler to not allow the
  contention or maybe to allow it intentionally by not disabling RTC
  interrupts.
- atrtc_restore().  This runs mainly for resume.  It resets the same
  status register as resettodr() and other status registers.  It
  must not be allowed to run concurrently with resettodr().  Resume
  calls it before allowing any callouts.  It is unclear if this is
  guaranteed.
*...
So I postponed looking at tc_windup(), and it seems that this was
useful.  After your note above, it is mostly clear that we can skip
tc_windup() calls that conflict with a currently active windup call.
E.g., the hardclock update, which is done simultaneously with the
setclock(), could just be skipped.  But we must not skip the tc_windup()
call from setclock().

Several things could be deferred, but I think that would just be more
complicated.

As an additional, but probably excessive, measure, an invocation of
tc_windup() from hardclock could ask setclock() to re-execute
tc_windup() after its own call is done.

I don't quite understand this.  hardclock -> tc_windup() normally doesn't
have interference from tc_setclock().
In what sense does it not have interference?  tc_windup() may be executed
from hardclock/hardclock_cnt (bottom half) and concurrently from setclock()
(top half).  And this is exactly what we want to avoid, isn't it?

Yes, but you seemed to be saying that hardclock should schedule or call
tc_setclock().  That makes no sense.  So I thought that you meant
tc_setclock() calling or scheduling tc_windup() in a different way than
it does now (it just calls it now).

...
diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
index 0f015b3..a5776d2 100644
--- a/sys/kern/kern_tc.c
+++ b/sys/kern/kern_tc.c
...
@@ -1237,6 +1238,7 @@ tc_setclock(struct timespec *ts)
        struct timespec tbef, taft;
        struct bintime bt, bt2;

+       critical_enter();
        cpu_tick_calibrate(1);
        nanotime(&tbef);
        timespec2bintime(ts, &bt);
@@ -1247,8 +1249,10 @@ tc_setclock(struct timespec *ts)
        bintime2timeval(&bt, &boottime);

boottime is one of the critical variables.  Before this, there are 2
accesses to boottimebin.  The Giant locking that you just removed gave
only some protection here.  It prevented concurrent access by
malicious/stress-testing code doing settime().  It didn't prevent
nanotime() and friends seeing boottime in an in-between state.  Oops,
mutex locking won't fix that either.  I think the update of these
variables needs to be moved into tc_windup() where it will hopefully
automatically be protected by the generation count, like the timehands
offsets already are.
I do not understand how that would help.  Do you mean to move the vars
into current timehands ?

I hoped that it would just work without putting it in timehands, but
now I don't see a better way than putting it in timehands.  The space
wastage is not too bad, and we might need it in timehands for other
reasons.  One is that the actual boot time is invariant, but FreeBSD
varies it; a global is needed for the actual boot time and then it is
natural to put the fudged boot time in timehands (it is not a boot
time, but an offset which is sometimes close to the boot time).

Note that boottimebin is also updated in tc_windup() (for leap seconds).
The new mutex doesn't protect it there.  This is another of the
reasons to put it in timehands.

*...
+       for (;;) {
+               if (atomic_cmpset_int(&tc_windup_lock, 0, 1)) {
+                       atomic_thread_fence_acq();
+                       tc_windup_locked();
+                       atomic_store_rel_int(&tc_windup_lock, 0);
+                       break;
+               } else if (!top_call) {
+                       break;
+               }
+       }
}

I like to write optimized locking like that, but don't see why you
want it here.
Because I do not want the fast clock interrupt handler to wait on a lock,
which necessarily becomes a spinlock.  The ntp_lock is bad enough already.
What I coded above is not a real lock.  In the contention case it bails out,
so there is no blocking.  In particular, witness is not relevant to it.

I think this is too complicated.  Spinlocks aren't very expensive, and
hardclock() already has some.  I think it has several.  It has at least
one silly one -- in hardclock_cpu(), thread locking to protect a single
flags update.  hardclock() and tc_windup() just aren't called enough
for the locking overhead or contention to matter, even at too-large hz.

...
diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
index 0f015b3..7bb767c 100644
--- a/sys/kern/kern_tc.c
+++ b/sys/kern/kern_tc.c
...
+static struct mtx tc_setclock_mtx;
+MTX_SYSINIT(tc_setclock_init, &tc_setclock_mtx, "tcsetc", MTX_DEF);

I thought it would be a spinlock.  The sleep lock is better if you can
make it work, but this is complicated.

+
/*
 * Step our concept of UTC.  This is done by modifying our estimate of
 * when we booted.
- * XXX: not locked.
 */

Now "not properly locked" :-).

void
tc_setclock(struct timespec *ts)
@@ -1237,6 +1240,8 @@ tc_setclock(struct timespec *ts)
        struct timespec tbef, taft;
        struct bintime bt, bt2;

+       mtx_lock(&tc_setclock_mtx);
+       critical_enter();

I thought at first that this critical section was redundant, since a
spinlock would already have it.

        cpu_tick_calibrate(1);
        nanotime(&tbef);
        timespec2bintime(ts, &bt);
@@ -1247,8 +1252,10 @@ tc_setclock(struct timespec *ts)
        bintime2timeval(&bt, &boottime);

The leap second update to boottimebin in tc_windup() races with the
update to boottimebin here.  This can be fixed partially by using the
same locking there.  Then the locking would have to be a spin mutex
(or special).  This still doesn't fix non-atomic accesses to boottimebin
in nanotime() and friends.  Putting it in timehands seems to be best for
fixing that.

boottime is easier to fix (and doesn't need to be in timehands).  It
is only used to copy it out in a historical sysctl.  The locking for
that is buggy.  We convert boottimebin to boottime here (now with locking)
but in the sysctl we copy out boottime non-atomically.


Bruce