This note attempts to clarify the next set of architectural challenges in the NTPsec code. It is particularly intended for Mark Atwood and Daniel Franke because some of our main goals are tied into supporting NTS.

= OBJECTIVES =

Here are the objectives for the next round of work:

* NTS: Support Daniel's NTS symbiont.

* SINGLESOCK: Move from one socket per interface to a single socket listening on all interfaces.

* EVENTS: Move ntpd to an event- and alarm-driven architecture, eliminating the tick-per-second processing.

* GOPREP: Clear the path to moving the codebase to Go. We haven't committed to doing this yet, but the odds on that happening someday look high enough that I think it is good to already be factoring it into our planning.

Now I'll talk about motivation.

NTS is (obviously) a big deal because it would put us well ahead of other time-service implementations in an area a lot of data centers have serious concerns about.

There are four different reasons for SINGLESOCK:

1. I want to simplify the ntp_io.c code. I consider the present implementation dangerously ugly and complicated - it's the last really gnarly place in the code after the big cleanup. Our worst remaining platform dependencies are largely concentrated in the fancy dances required to iterate across interfaces and handle routing sockets. Getting rid of all that would be good.

2. Doing SINGLESOCK would make EVENTS easier. That's because after SINGLESOCK all the inbound traffic will arrive at one endpoint (one epoll call) rather than at a variable number of endpoints - one per interface - as it does now. That will make it much easier to write and verify a main event loop that handles alarms as well.

3. Going to NTS will move us from paying 1 UDP socket per interface to paying that plus 1 TCP socket per client association - which is potentially a very large number. Daniel argues that this means we want a single epoll on all sockets to avoid scaling badly. In general I think it's unwise to make architectural changes for performance/scaling reasons without measurements proving that the bottleneck is (a) a problem, and (b) where we think it is.
But the code-cleanliness arguments for SINGLESOCK and EVENTS are pretty strong, so we might as well collect whatever scaling benefits we get from them and call that good.

4. Returning to code simplification, every simplification helps with GOPREP. This one more so than most because it will dramatically reduce the spread of platform-dependent code paths we have to map to Go if and when it comes time to do that.

Reasons for EVENTS:

1. Mark wants to reduce power consumption for deployment in mobile and embedded systems. I understand how this is compromised by tick-per-second and agree with the goal.

2. Daniel says he wants this to make NTS easier. I don't know exactly what it will do for him that SINGLESOCK doesn't. Perhaps he'll explain in a followup.

Reasons for GOPREP:

Go means (1) no more buffer-overrun attacks, ever, (2) no more memory-leak issues, ever, (3) *vast* code simplification (LOC might easily drop 50%), (4) greatly improved maintainability of what remains.

= OBSTACLES =

Next, I'm going to describe obstacles. Some of these are quite formidable.

NTS: I'm not going to try to scope this yet. I don't understand NTS well enough. We'll need a complexity estimate from Daniel. He does say most of the code will run in a symbiont.

SINGLESOCK: While messy and somewhat difficult, this is mostly a SMOP (Simple Matter of Programming). There is one potential technical risk, relatively minor I think.

The reason for iterating over interfaces is that ntpd has the capability to block incoming packets by interface of origin. In order to go to a single epoll we either need to (a) abandon this feature, or (b) find a way to query, from the packet itself, which device it came in through.

There was some previous discussion of (a). We dropped the idea because I argued that lopping off working features on non-obsolete hardware without a strong security reason was too likely to anger an important contingent of grognard time admins.
What I didn't know at the time - didn't find out until months later - is that there is, in fact, a way to do (b). It's a socket option called IP_PKTINFO, the IPv4 counterpart of the IPV6_PKTINFO option in the RFC 3542 Advanced Sockets API, and it didn't exist when the main body of the NTP code was written. It is supposed to be portable to FreeBSD at least, and there is example code here:

https://stackoverflow.com/questions/603577/how-to-tell-which-interface-the-socket-received-the-message-from

The risk is related to the fact that this is a *really* obscure and seldom-used feature. It's difficult to tell from the sockets API documentation that it even exists - I made several serious tries at finding something equivalent and failed before this.

Because this feature is so recherche it has probably not been tested a lot, and I do not have maximum confidence in its robustness or actual portability. However, "interface filtering fails because your OS upstream fooed up" is survivable in a way that "we dropped it because it was inconvenient" might not be.

There is, however, an easy way to check this. We put in the IP_PKTINFO query *before* changing the interface-iteration code, check each incoming packet, and log a problem if the result from the query doesn't match the interface we know it came in on. We burn this in on all our Tier 1 platforms and see.

EVENTS: The code currently has a once-per-second tick that we want to eliminate in favor of alarms that only fire as needed. Unfortunately, this is going to be quite difficult. And we won't collect the major benefit (lower power consumption) until every piece of it is done.

In theory, what we ought to be able to do is spin on a single event dispatcher using epoll, waking up only when there's an incoming packet, or when we're due to transmit to a peer, or once an hour to check leap second status. Presently we unconditionally wake up once per second and do nothing if none of these things is required.

There are a couple of problems with this.

1.
If we have local refclocks we're going to wake up once per second anyway because that is almost always how often they ship a time report.

We've heard that Dr. Mills once turned down a patch set to make the NTP Classic code entirely event-driven, and this is probably most of the reason he didn't think it was valuable enough to risk the code churn - only no-refclock deployments could see a benefit, and Dr. Mills loved his refclocks.

In our deployment scenarios, how often do we think a low-power device is *not* going to be watching a GPS/1PPS refclock? Smartphones and tablets are right out - anything mobile with a browser wants to know location, therefore will have a GPS.

Given that you get 1PPS for free from an embedded GPS and those are both cheap and ubiquitous, the real question is actually a bit more fundamental. How often does any low-power device care about time without caring about location? The power benefit from trying to go tickless is upper-bounded by the size of this set.

2. There's a subtle issue here with frequency of clock adjustment. Currently if we're slewing the clock it gets adjusted once per second. If we go to a fully event-driven architecture (and there are no refclocks) the frequency of adjustments will drop to the frequency of network traffic. This may not be a practical problem - I'm inclined to think it won't be - but we won't know until we measure.

3. Implementation complexity. The code says

     * The basic timerevent is one second.  This is used to adjust the
     * system clock in time and frequency, implement the kiss-o'-death
     * function and the association polling function.

but this is a pretty serious oversimplification. If you look at the code in ntp_timer.c there's all kinds of random other stuff being done. Each conditional has its own timer and would have to be unpacked into a different event type on the queue - I see 7 types. This implies a big messy change that will take significant time to do and to verify.

4.
On the other hand, after staring at the code I can now report that the 1PPS issue I thought might be going to block EVENTS is...not exactly gone, but subsumed by the "refclocks tick once per second" problem. Because NTP uses the RFC 2783 interface to the PPSAPI kernel facility, no explicit 1PPS tick needs to be dealt with. I got this wrong because GPSD does *not* assume RFC 2783 will be available and in many cases camps on one of the GPS's handshake lines to pick it up.

GOPREP: Aside from the considerable labor of code translation, there is only one problem blocking a move to Go. That is how GC stop-the-world pauses might stall refclock reports.

What you pay for Go's GC is that the runtime occasionally has to stop all threads to do it. These pauses are not frequent. See for example this 2016 graph of Go 1.9 performance on an 18GB heap:

https://twitter.com/brianhatfield/status/804355831080751104

They get serious spikes once or twice an hour, bounded by 1ms and averaging 0.5ms. For network-peer connections this is not a problem - it isn't large compared to the random network-weather effects that are exactly what NTP is designed to deal with. However, it is a substantial amount of unpredictable latency to impose on local refclock reports.

I'm not going to try to plan around this yet except to note that it could well turn into a reason to revive refclockd partitioning (with the clock code staying in C).

= DEPENDENCIES =

The logical order to do these things in is: SINGLESOCK first. Then EVENTS, if we choose to do it. Then NTS.

GOPREP isn't a task to be scheduled (at least not yet) but a set of issues to keep an eye on.

= UNRESOLVED QUESTIONS =

I'm now confident that SINGLESOCK is doable fairly promptly. The big question is whether EVENTS is worth the effort.
I'm now leaning towards "no", but my mind could be changed if either (a) Mark tells me there's a really important deployment class that is both low power and without refclocks, or (b) Daniel tells me there's some reason NTS really needs things to be rearchitected this way.

We need to keep an eye on Go's worst-case stop-the-world pause times, because they keep falling in successive compiler revisions. There's some magic threshold of microseconds-per-hour below which we wouldn't actually care about the induced clock-report jitter; I don't know what that is, but Hal or Gary might be able to tell us. We probably don't want to plan a Go move until we have confidence that the 95th percentile of stop-the-world pauses is going to be below that threshold.

-- 
Eric S. Raymond <http://www.catb.org/~esr/>
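
Appendix: a minimal sketch of the IP_PKTINFO query discussed under SINGLESOCK, for anyone who wants to try the burn-in check. It assumes Linux with glibc; other platforms may spell the option differently or lack it. The function names (open_pktinfo_socket, pktinfo_ifindex, recv_with_ifindex) are hypothetical and are not taken from ntp_io.c. It shows one wildcard-bound UDP socket that asks the kernel to attach the arrival interface index to every datagram.

```c
#define _GNU_SOURCE            /* assumption: glibc; exposes struct in_pktinfo */
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Open a UDP socket bound to the IPv4 wildcard address with IP_PKTINFO
 * enabled.  Returns the descriptor, or -1 on error.  Port 0 asks the
 * kernel for an ephemeral port; a real daemon would pass 123. */
int open_pktinfo_socket(unsigned short port)
{
    int on = 1;
    struct sockaddr_in sin;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_ANY);
    sin.sin_port = htons(port);
    if (setsockopt(fd, IPPROTO_IP, IP_PKTINFO, &on, sizeof(on)) < 0 ||
        bind(fd, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Walk the control messages filled in by recvmsg() and return the
 * arrival interface index, or 0 if no IP_PKTINFO message is present. */
unsigned int pktinfo_ifindex(struct msghdr *msg)
{
    struct cmsghdr *c;

    for (c = CMSG_FIRSTHDR(msg); c != NULL; c = CMSG_NXTHDR(msg, c)) {
        if (c->cmsg_level == IPPROTO_IP && c->cmsg_type == IP_PKTINFO) {
            struct in_pktinfo pi;
            /* Copy out rather than cast: CMSG_DATA may be unaligned. */
            memcpy(&pi, CMSG_DATA(c), sizeof(pi));
            return (unsigned int)pi.ipi_ifindex;
        }
    }
    return 0;
}

/* Receive one datagram into buf and report its arrival interface.
 * Returns the payload length, or -1 on error. */
ssize_t recv_with_ifindex(int fd, void *buf, size_t len,
                          unsigned int *ifindex)
{
    union {                    /* union forces cmsghdr alignment */
        char buf[CMSG_SPACE(sizeof(struct in_pktinfo))];
        struct cmsghdr align;
    } control;
    struct iovec iov;
    struct msghdr msg;
    ssize_t n;

    iov.iov_base = buf;
    iov.iov_len = len;
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof(control.buf);

    n = recvmsg(fd, &msg, 0);
    if (n < 0)
        return -1;
    *ifindex = pktinfo_ifindex(&msg);
    return n;
}
```

For the burn-in, the existing per-interface receive path could enable the same option and compare the reported index against if_nametoindex() for the interface that socket is bound to, logging any mismatch. That comparison is the only part that needs to touch the current code.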