Mark: Heads up! Your input requested. Hal and I were having an off-list argument. It started here:
Hal: But maybe we should have a separate thread per refclock to get the time stamp. ...

Me: Not a crazy idea, but I lean against it. My problem is that threading is a serious defect attractor (as we have in fact seen in the async-DNS code). I'd take a good deal of persuading about the expected gain in performance before I'd accept the implied increase in expected defect rates.

Recording and graphing some timing figure that we think is an upper bound on the gains would be helpful here. If the median, or even the two-sigma worst case, is high compared to our accuracy target, then threading might be indicated. Otherwise I think it would be best to leave well enough alone.

The crude metric that occurs to me first is just the interval between select calls. If you think a different timing figure would be a better predictor, I'm very open to argument.

Hal: I'm slightly scared about your extrapolation from measurements. (Both here and in other cases.) I think it's pretty easy to see that the problem will depend upon load. So the only question is how much load is your system going to have? That's one of the things we can't easily predict. It will get (much) worse if security gets deployed and uses serious cycles.

Me: Ever had one of those moments when you realize your priors have shifted, in a way that surprises you, while you weren't looking? I'm having one now.

What I've just realized, somewhat to my own startlement, is that I no longer care enough about high-load scenarios to spend a lot of effort hedging the design against them. It's because of the Pis on my windowsill - I've grown used to thinking of NTP as something you throw on a cheap single-use server so it's not contending with a conventional job load.

Go ahead and argue that I'm wrong, if you like. But make that argument acknowledging that anybody who cares about accurate time can spend just $80 on a Pi and a HAT, and what's the point of heavily complicating the NTPsec code to handle high load when *that's* true?

*blink* I did not actually realize I believed this until just a few minutes ago. It's ever so slightly disorienting.

Hal: I think you should run that past Mark.
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A Pi with a GPS HAT is a nice toy. I see two problems.

One is that the NIST servers are already overloaded. Things will get worse if security uses more cycles. "Sorry" might be the right answer, but I think it requires a good story.

The other is that it doesn't fit into a typical server room or office environment. There is too much EMI in a server room. You need either to locate the Pi and antenna someplace where the antenna will see the satellites and there isn't too much EMI, or you need a long coax connection so the Pi can live in the machine room and get connected to an antenna on the roof. A Pi isn't packaged for typical server-room mechanicals. (Just the power via USB would probably get it laughed out of most places.)

(History ends here.)

Me: I think over-fixating on the Pi's limitations is a mistake. And the EMI issue is orthogonal to whether you expect your time service to be running on a lightly- or heavily-loaded machine - either way you're going to need to pipe your GPS signal down from a roof mount.

The underlying point is that blade and rack servers are cheap. Cycles are cheap. This gives us the option of implicitly saying to operators "high-load conditions are *your* problem - fix it by rehosting your NTP" rather than doing what I think would be premature optimization for the high-load case.
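To make the measurement question concrete: the "interval between select calls" metric I proposed above is nothing fancier than timestamping each entry to the main loop's select() with CLOCK_MONOTONIC and logging the deltas. Here's a toy sketch of that pattern - not actual ntpd code; the stdin descriptor and the 100ms timeout are stand-ins for the daemon's real descriptors and poll interval:

/*
 * Toy sketch of the "interval between select calls" metric.
 * Timestamp each entry to select() with CLOCK_MONOTONIC and log the
 * delta from the previous entry so the figures can be graphed later.
 */
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/select.h>

static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void)
{
    double prev = 0.0;

    for (int i = 0; i < 100; i++) {
        double t = now_seconds();
        if (prev != 0.0)
            printf("select interval: %.6f sec\n", t - prev);
        prev = t;

        /* Stand-in for the daemon's real select() over its sockets:
         * just watch stdin with a 100ms timeout. */
        fd_set readfds;
        FD_ZERO(&readfds);
        FD_SET(STDIN_FILENO, &readfds);
        struct timeval timeout = {0, 100000};   /* 100 ms */
        if (select(STDIN_FILENO + 1, &readfds, NULL, NULL, &timeout) < 0) {
            perror("select");
            return 1;
        }
        /* ...packet/refclock processing would happen here; under load
         * this is the part that stretches the interval... */
    }
    return 0;
}

Run something like that under a representative load, feed the output to gnuplot or whatever, and compare the tail of the distribution against our accuracy target. The raw interval includes the time blocked inside select() as well as the processing after it returns, so it overstates the delay - which is fine, since what we want is an upper bound on what threading could buy us.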
If we jump right in and implement threading we *are* going to pay for it in increased future defect rates. My preference is to engineer on the assumption that local cycles are cheap and we can therefore stick with the simple dataflow we have now: synchronous I/O with one worker thread to avoid stalls on DNS lookups. I'd prefer to bias towards architectural simplicity unless and until field reports force us to optimize for the high-load case.

This turns into a strategic question that I can't answer. Given Mark's plans and LF's objectives, what *should* we assume? Are NIST's overloaded servers (and installations like them) our target?

--
Eric S. Raymond <http://www.catb.org/~esr/>

_______________________________________________
devel mailing list
devel@ntpsec.org
http://lists.ntpsec.org/mailman/listinfo/devel