Thank you Eric. Have read, am pondering, and welcome other people to weigh in.
On Tue, Jun 28, 2016 at 8:30 PM Eric S. Raymond <e...@thyrsus.com> wrote:

> In recent discussion of the removal of memlock, Hal Murray said
> "Consider ntpd running on an old system that is mostly lightly loaded
> and doesn't have a lot of memory."
>
> By doing this, he caused me to realize that I have not been explicit
> about some of the assumptions behind my technical strategy. I'm now
> going to try to remedy that. This should have one of three results:
>
> (a) We all develop a meeting of the minds.
>
> (b) Somebody gives me technical reasons to change those assumptions.
>
> (c) Mark tells me there are political/marketing reasons to change them.
>
> So here goes...
>
> One of the very first decisions we made early last year was to code to
> a modern API - full POSIX and C99. This was only partly a move for
> ensuring portability; mainly I wanted a principled reason (one we could
> give potential users and allies) for ditching all the cruft in the
> codebase from the big-iron era.
>
> Even then I had clearly in mind the idea that the most effective
> attack we could make on the security and assurance problem was to
> ditch as much weight as possible. Hence the project motto: "Perfection
> is achieved, not when there is nothing more to add, but when there is
> nothing left to take away."
>
> There is certainly a sense in which my ignorance of the codebase and
> application domain forced this approach on me. What else *could* I
> have done but prune and refactor, using software-engineering skills
> relatively independent of the problem domain, until I understood enough
> to do something else?
>
> And note that we really only reached "understood enough" last week
> when I did magic-number elimination and the new refclock directive.
> It took a year because *it took a year!* (My failure to deliver
> TESTFRAME so far has to be understood as trying for too much in the
> absence of sufficient acquired knowledge.)
>
> But I also had from the beginning reasons for believing, or at least
> betting, that the most drastic possible reduction in attack surface
> would have been the right path to better security even if the state of
> my knowledge had allowed alternatives. C. A. R. Hoare: "There are two
> ways of constructing a software design: One way is to make it so
> simple that there are obviously no deficiencies, and the other way is
> to make it so complicated that there are no obvious deficiencies."
>
> So, simplify simplify simplify and cut cut cut...
>
> I went all-in on this strategy. Thus the constant code excisions over
> the last year and the relative lack of attention to NTP Classic bug
> reports. I did so knowing that there were these associated risks: (1)
> I'd cut something I shouldn't, actual function that a lot of potential
> customers really needed, or (2) the code had intrinsic flaws that would
> make it impossible to secure even with as much reduction in attack
> surface and internal complexity as I could engineer, or (3) my skills
> and intuition simply weren't up to the job of cutting everything that
> needed to be cut without causing horrible, subtle breakage in the
> process.
>
> (OK, I didn't actually worry that much about 3 compared to 1 and 2 - I
> know how good I am. But any prudent person would have to give it a
> nonzero probability. I figured Case 1 was probably manageable with good
> version-control practice. Case 2 was the one that made me lose some
> sleep.)
>
> This bet could have failed.
> It could have been the a priori *right*
> bet on the odds and still failed because the Dread God Finagle
> pissed in our soup. The success of the project at its declared
> objectives was riding on it. And for most of the last year that was a
> constant worry in the back of my mind. *What if I was wrong?* What if I
> was like the drunk in that old joke, looking for his keys under the
> streetlamp when he'd dropped them two darkened streets over because
> "Offisher, this is where I can see".
>
> It didn't really help with that worry that I didn't know *anyone* I
> was sure I'd give better odds at succeeding at this strategy than
> me. Keith Packard, maybe. Poul-Henning Kamp, maybe, if he'd give up
> ntimed for the effort, which he wouldn't. Recently I learned that Steve
> Summit might have been a good bet. But some problems are just too
> hard, and this codebase was *gnarly*. Might be any of us would have
> failed.
>
> And then...and then, earlier this year, CVEs started issuing that we
> dodged because I had cut out their freaking attack surface before we
> knew there was a bug! This actually became a regular thing, with the
> percentage of dodged bullets increasing over time.
>
> Personally, this came as a vast and unutterable relief. But,
> entertaining narrative hooks aside, this was reality rewarding my
> primary strategy for the project.
>
> So, when I make technical decisions about how to fix problems, one of
> the main biases I bring in is favoring whatever path will allow me
> to cut the most code.
>
> On small excisions (like removing memory locking, or yet another
> ancient refclock driver) I'm willing to trade a nonzero risk that
> removing code will break some marginal use cases, in part because I am
> reasonably confident of my ability to revert said small excisions. We
> remove it, someone yells, I revert it, no problem.
>
> So don't think I'm being casual when I do this. What I'm really doing
> is exploiting how good modern version control is. The kind of tools
> we now have for spelunking code histories give us options we didn't
> have in elder days. Though of course there's a limit to this sort of
> thing. It would be impractical to restore mode 7 at this point.
>
> Now let's talk about hardware spread and why, pace Hal, I don't really
> care about old, low-memory systems and am willing to accept a fairly
> high risk of breaking on them in order to cut out complexity.
>
> The key word here is "old". I do care a lot about *new* low-memory
> systems, like the RasPis in the test farm. GPSD taught me to always
> keep an eye towards the embedded space, and I have found that the
> resulting pressure to do things in lean and simple ways is valuable
> even when designing and implementing for larger systems.
>
> So what's the difference? There are a couple of relevant ones. One
> is that new "low-memory" systems are actually pretty unconstrained
> compared to the old ones, memory-wise. The difference between (say) a
> 386 and the ARM 7 in my Pis or the Snapdragon in my smartphone is
> vast, and the worst-case working set of ntpd is pretty piddling stuff
> by modern standards. Looking at the output of size(1) and thinking
> about the size of struct peer, my guess was that it would be running
> in about 0.8MB of RAM, and top(1) on one of my Pis seems to confirm
> this.
>
> Another is that disk access is orders of magnitude faster than it
> used to be, and ubiquitous SSDs are making it faster yet. Many
> of the new embedded systems (see: smartphones) don't have spinning
> rust at all.
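For anyone who wants to sanity-check that size(1)/struct peer estimate, the arithmetic is roughly as follows. The structure and numbers in this sketch are illustrative stand-ins, not NTPsec's actual struct peer or measured figures:

    /* Back-of-the-envelope working-set estimate.  peer_sketch and the
       size(1) figure are placeholders, not NTPsec's real definitions. */
    #include <stdio.h>

    struct peer_sketch {              /* stand-in for ntpd's struct peer */
        char          srcadr[28];     /* remote address storage */
        double        filter[8][4];   /* clock-filter sample registers */
        unsigned long stats[32];      /* counters, flags, timestamps */
    };

    int main(void)
    {
        size_t static_image = 700u * 1024u; /* assumed text+data from size(1) */
        size_t npeers = 100;                /* a generous association count */
        size_t total = static_image + npeers * sizeof(struct peer_sketch);

        printf("per-peer size: %zu bytes\n", sizeof(struct peer_sketch));
        printf("estimated working set: %zu KiB\n", total / 1024);
        return 0;
    }

Even with generous padding on every figure, that lands well under a megabyte, which is the point about memory pressure being a non-issue on anything Pi-class or newer.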
> What this means in design terms is that with one single exception,
> old-school hacks to constrain memory usage, stack size, volume
> of filesystem usage, and so forth - all those made sense on
> those old systems but are almost dead weight even on something
> as low-end as a Pi. The one exception is that if you have an
> algorithmic flaw that causes your data set to grow without bound
> you're screwed either way.
>
> But aside from that, the second that resource management becomes a
> complexity and defect source, it should be dumped. This extends from
> dropping mlockall() all the way up to using a GC-enabled language like
> Python rather than C whenever possible. Not for nothing am I planning
> to at some point scrap ntpq in C to redo it in Python.
>
> Now, as to *why* I don't care about old low-power systems - it's
> because the only people who are going to run time service on them are
> a minority of hobbyists. A minority, I say, because going forward
> most of the hobbyists interested in that end of things are going to be
> on Pis or Beaglebones or ODroids so they can have modern toolchains,
> thank you.
>
> Let's get real, here. The users we're really chasing are large data
> centers and cloud services, because that's where the money (and
> potential funding) is. As long as we don't make algorithmic mistakes
> that blow up our big-O, memory and I/O are not going to be performance
> problems for their class of hardware in about any conceivable
> scenario.
>
> Here's what this means to me: if I can buy a complexity reduction (and
> thus a security gain) by worrying less about how the resulting code
> will perform on machines from before the 64-bit transition of
> 2007-2008, you damn betcha I will do it and sleep the sleep of the
> just that night.
>
> When all is said and done, we could outright *break* on hardware that
> old and I wouldn't care much. Unless somebody is paying us to care and
> I get a cut, in which case I will cheerfully haul out my shovels and
> rakes and implements of destruction and fix it, and odds are high
> we'll end up with better code than we inherited.
>
> Yeah, it's nice to squeeze performance out of old hardware, and it's
> functional to be sparing of resources. But when everything in both
> our security objectives and our experience says "cut more code" I'm
> going to put that first.
>
> This is how I will proceed until someone persuades me otherwise or
> our PM directs me otherwise.
> --
> Eric S. Raymond <http://www.catb.org/~esr/>
>
> _______________________________________________
> devel mailing list
> devel@ntpsec.org
> http://lists.ntpsec.org/mailman/listinfo/devel
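On the mlockall() point: the kind of old-school resource pinning being dropped looks roughly like the sketch below. This is an illustration of the general technique using plain POSIX calls, not the exact code that was removed from ntpd:

    /* Sketch of old-school memory pinning: lock all current and future
       pages into RAM so the daemon can never be paged out.  Needs
       sufficient privilege (e.g. CAP_IPC_LOCK on Linux). */
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void)
    {
        /* Pre-touch some stack so the locked image covers it. */
        char stackpad[50 * 1024];
        memset(stackpad, 0, sizeof(stackpad));

        /* Pin every page, present and future, into physical memory. */
        if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
            perror("mlockall");
            return 1;
        }
        puts("pages locked; the process will not be swapped out");

        /* ... a daemon's main loop would run here ... */

        munlockall();
        return 0;
    }

On modern hardware this buys little and adds failure modes (privilege requirements, RLIMIT_MEMLOCK interactions), which is the complexity-versus-benefit trade being described above.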