On Wed, Jun 3, 2015 at 2:04 AM, Ingo Molnar <mi...@kernel.org> wrote:
>
> * John Stultz <john.stu...@linaro.org> wrote:
>
>> > Instead of having these super rare special events, how about
>> > implementing leap second smearing instead? That's far less radical and
>> > a lot easier to test as well, as it's a continuous mechanism. It will
>> > also confuse user-space a lot less, because there are no sudden time
>> > jumps.
>>
>> So yea. Leap smearing/slewing is an attractive solution. The first issue
>> is that there's no standard yet for the range of time that the slew
>> occurs (or even if the slew is linear or a curve). The second is I don't
>> think we can actually get away from supporting UTC w/ leap, as
>> applications may depend on precision. Also things like NTP sync w/ mixed
>> systems would be problematic, as NTPd and others need to become savvy of
>> which mode they are working with.
>
> Supporting it minimally is fine - supporting it with clearly
> unmaintainable complexity is not.
>
> So as long as we offer good smearing of the leap second (with a
> configurable parameter for how long the period should be), people in need
> of better leap second handling can take that.
>
>> The leap smearing method of only doing it in private networks and
>> controlling it by the NTP server is becoming more widespread, but it has
>> its own problems, since it doesn't handle CLOCK_TAI properly, and since
>> CLOCK_REALTIME isn't yet frequency steerable separately from the other
>> clockids, this method ends up slowing down CLOCK_TAI and CLOCK_MONOTONIC
>> as well.
>
> All real time clock derived clocks should smear in sync as well.

Eeerrr.. So CLOCK_TAI is UTC without leapseconds, to smear TAI would be
wrong. Similarly, CLOCK_MONOTONIC/BOOTTIME probably shouldn't be smeared
either (but those are defined less strictly).

>> I'd like to try to get something working in the kernel so we could
>> support CLOCK_UTC and CLOCK_UTCSLS (smeared-leap-second) clockids, then
>> allow applications that care to migrate explicitly to the one they care
>> about. Possibly allowing CLOCK_REALTIME to be compile-time directed to
>> CLOCK_UTCSLS so that most applications that don't care can just ignore
>> it. But finding time to do this has been hard (if anyone is interested in
>> working on it, I'd be excited to hear!).
>
> There should definitely be a Kconfig option to just map all relevant
> clocks to smeared seconds. Hopefully this ends up being the standard in a
> few years and we can pin down the exact parameters as well.
>
> Having separate clockids for mixed uses would be fine as well.

Maybe.

>> But if you think this patch is complicated, creating a new separately
>> steered clockid is not going to be trivial (as there will be lots of ugly
>> edge cases, like what if a leap second is cancelled mid-way through the
>> slewing adjustment, etc).
>
> Well, I think the main advantage of leap second smearing is that it's not
> a binary, but a continuous interface, and so it's way easier to test than
> 'sudden' leap second insertions.
>
> In fact we could essentially implement leap second smearing via the usual
> adjtimex mechanisms: as far as the time code is concerned it does not
> matter why a gradual adjustment occurs, only the rate of change and the
> method of convergence is an open parameter.
>
> In fact I'd suggest we implement even original leap seconds by doing a
> high-rate 'smearing' in the final X minutes leading up to the leap second,
> where 'X' could be 1 by default. This way we could eliminate leap seconds
> as a separate logical entity mostly.
>
> This should be far more gentle to applications as well than sudden jumps,
> and timers will just work fine as well.

Well, again the problem with high-rate smearing as you describe is that it
would affect CLOCK_MONOTONIC as well, which could cause periodic timers
used for sampling, etc (imagine recording audio, etc) to slow as well,
possibly causing application problems. This is why the smeared
leap-seconds are usually done across a day at a slow rate.

To allow for CLOCK_REALTIME to be frequency adjusted separately from
CLOCK_MONOTONIC/CLOCK_TAI, which would have the least unwanted
side-effects, we're probably going to have to manage it separately (like
we do w/ MONOTONIC_RAW time). But again, this creates a lot more
complexity.

>> > Secondly, why is there a directional flag? I thought leap seconds can
>> > only be inserted.
>>
>> A leap delete isn't likely to occur, but it's supported by the adjtimex
>> interface. And given the irregularity of the Earth's rotation, I'm not
>> sure I'd rule it out completely.
>
> Well, the long term trend is clear and unambiguous: the rotation of Earth
> is slowing down (the main component of which is losing angular momentum to
> the Moon), hence the days are getting longer and we have to insert a leap
> second every second year or so.
>
> The short term trends (discounting massive asteroid strikes, at which
> point leap seconds will be the least of our problems) are somewhat
> chaotic:
>
>  - glaciation (which shifts water mass asymmetrically)
>
>  - global warming (one component of which is thermal expansion, which
>    expands oceans asymmetrically and shifts water mass - the other
>    component is changing climatology: different oceanic currents, etc. -
>    which all shift mass around)
>
>  - tectonics (slow rearrangement of mass plus earthquakes).
>
>  - even slower scale rearrangement of mass (mantle plumes, etc.)
>
> but the long term trend still dominates.
> Look at this graph of measurements of the Earth's rotation:
>
>   http://en.wikipedia.org/wiki/File:Deviation_of_day_length_from_SI_day.svg
>
> See how the mean (the green line) was always above zero in the measured
> past. The monotonically increasing nature comes from that.
>
> and given how many problems we had with leap second insertion, on millions
> of installed systems, guess the likelihood of there being a leap second
> deleted? How many OSs that can do leap second insertion are unable to do
> leap second deletion?
>
> Also note that leap second deletion means a jump in time backward.
> Daylight saving time is already causing problems with that.

Err.. Other way around. Leap-second deletion is a jump in time forward
(jumping from 23:59:58 to 00:00:00, skipping 23:59:59). Which is simpler
to deal with. And luckily (at least for us) daylight savings is done in
userspace (as UTC, including leapseconds, ideally would be from the kernel
providing TAI time).

But yes, I agree that the leap deletion logic is likely to never run
outside of testing.

>> > So all in one, the leap second code is fragile and complex - let's
>> > re-think the whole topic instead of complicating it even more ...
>>
>> So the core complexity with this patch is that we're basically having to
>> do state-machine transitions in a read-only path (since the reads may
>> happen before the update path runs). Since there's a number of
>> read-paths, there's some duplication, and in some cases variance if the
>> read path exports more state (ie: adjtimex).
>
> My fundamental observation is: the cost/benefit ratio is insanely high.

I agree. In a perfect world, the kernel would export TAI not UTC, leaving
the translation to UTC to userspace (take heed developers of new IoT
OSes!). But the trouble is that historical POSIX/Linux provides UTC
(without a leapsecond representation, which is why we have to repeat a
second).
And as more folks (userspace developers, not really kernel developers) are
caring about strict UTC correctness around the leapsecond, it's hard to
rationalize avoiding the complexity (since they don't really care, they
just don't want to deal with anything unexpected in their application).

> Interrupts are fundamentally jittery, there's no guarantee of their
> accuracy - you yourself said that as a reply to PeterZ's suggestion to
> drive leap seconds via hrtimers - and the motivation was to make
> interrupts arrive more accurately around leap seconds.
>
> So why make the code more fragile, more complex, just to solve a scenario
> that cannot really be done perfectly?

So here I worry I didn't communicate clearly enough what the patch does. :(

It's not about making interrupts more accurate around the leapsecond, it's
about applying the leapsecond transition in the read-path precisely at the
leapsecond edge (rather than a short while later when the timer fires and
we update the timekeeping structures).

But more importantly, this change to the read path prevents timers that
may be expired before the update_wall_time timer runs (most likely on
other cpus) from being expired early, since the time read that is used by
the hrtimer expiration logic is adjusted properly right on that edge.

> Especially as second smearing appears to be the way superior future
> method of handling leap seconds.

So here the problem is it depends on the user. For probably most users,
who really don't care, the leap-smear is ideal behavior for CLOCK_REALTIME
(I think leap-smears causing any change to other clockids would be
surprising). However, there are some users who expect POSIX UTC leapsecond
behavior. Either because they're positioning telescopes, doing things that
do depend on strict solar time, or because they are required (in some
cases by law) to use UTC. I don't think we can just abandon/break those
users for leap-smearing.

So I don't know if we can get away from that complexity.
But maybe I'm not thinking "boldly" here.

>> I do agree that the complexity of the time subsystem is getting hard to
>> manage.
>
> That's rather an understatement.
>
>> I'm at the point where I think we need to avoid keeping duplicated
>> timespec and ktime_t data (we can leave the ktime->timespec caching to
>> the VDSOs). That will help cut down the read paths a bit, but will also
>> simplify updates since we'll have less data to keep in sync. How we
>> manage the ntp state also needs a rework, since the locking rules are
>> getting too complex (bit me in an earlier version of this patch), and
>> we're in effect duplicating some of that state in the timekeeper with
>> this patch to handle the reads safely.
>
> Agreed.
>
>> But even assuming all those changes were already made, I think we'd
>> still need something close to this patch.
>
> I disagree rather strongly.

I do really appreciate the review and thoughts here, and respect and share
your concern about complexity, but I'm not yet seeing a viable path
forward with your proposals above. So additional ideas or clarifications
would be welcome.

So, I think with this push back, we're unlikely to have a solution that
will be deployable by the leap second at the end of the month (though the
issue was reported late enough that getting something
merged/backported/deployed en masse wasn't super realistic). So we'll get
to hear how much folks actually care about this issue. Since the leap is a
discontinuity, and there is no way to set an ABS_TIME CLOCK_REALTIME timer
for the 23:59:60 leap second, having a few very early timers targeted for
the next second expire early on that repeated second is probably not a
major issue in practice.

thanks
-john
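
For readers following the thread, here is a rough toy model (Python, not
kernel code) of the early-expiry window discussed above. It is only a
sketch under stated assumptions: the Timekeeper class, its fields, and the
read_naive/read_fixed names are all hypothetical, times are integer
milliseconds, and the update path is assumed to run 10 ms after the leap
edge. It shows why a read that only trusts the last update can report a
time one second ahead of UTC until the update runs, and how checking the
leap edge in the read path closes that window.

```python
from dataclasses import dataclass

LEAP_MS = 1000  # one inserted leap second, in milliseconds


@dataclass
class Timekeeper:
    """Toy timekeeper; every name here is hypothetical, not a kernel structure."""
    leap_edge_ms: int           # internal time at which the leap second is inserted
    offset_ms: int = 0          # subtracted from internal time to get UTC
    leap_applied: bool = False  # set once the update path has handled the leap

    def tick(self, now_ms: int) -> None:
        """Update path: driven by a timer, so it may run a bit after the edge."""
        if not self.leap_applied and now_ms >= self.leap_edge_ms:
            self.offset_ms += LEAP_MS
            self.leap_applied = True

    def read_naive(self, now_ms: int) -> int:
        """Read path that only uses state written by the last tick()."""
        return now_ms - self.offset_ms

    def read_fixed(self, now_ms: int) -> int:
        """Read path that applies the leap itself if tick() has not run yet."""
        utc_ms = now_ms - self.offset_ms
        if not self.leap_applied and now_ms >= self.leap_edge_ms:
            utc_ms -= LEAP_MS
        return utc_ms


if __name__ == "__main__":
    tk = Timekeeper(leap_edge_ms=1_000_000)
    timer_utc_ms = 1_000_005  # absolute timer 5 ms into the second after the leap
    now_ms = 1_000_005        # 5 ms past the edge; tick() has not run yet

    print("naive read fires timer:", tk.read_naive(now_ms) >= timer_utc_ms)
    print("fixed read fires timer:", tk.read_fixed(now_ms) >= timer_utc_ms)

    tk.tick(1_000_010)        # update path finally runs, 10 ms late
    print("reads agree after tick:",
          tk.read_naive(1_000_010) == tk.read_fixed(1_000_010))
```

In this model the naive read would fire the timer roughly one second
early, while the edge-aware read holds it until the repeated second has
passed. Once tick() has run, both reads return the same value.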