Hal Murray <hmur...@megapathdsl.net>:
> Your writeup focuses on code mutations rather than state space. (Or maybe I
> didn't read what you intended.)
Perhaps I could have been clearer. But those two problems run together in
my mind because of what unifies them and contrasts with the GPSD and
reposurgeon cases. It's...hm...maybe a good way to put it is that the
structure of the NTPsec state space and sync algorithms is extremely
hostile to testing.

In reposurgeon, when I want to test a command it's generally not too
difficult to hand-craft a repository with the relevant features, run the
command, look at the output repo, and verify that the transformation is as
expected. In GPSD one of the things that makes the test suite work so well
is that by eyeballing a check file you can actually see the correctness of
the relationship between a sentence burst from the GPS and the JSON it's
transformed into.

If you try to do this kind of eyeballing in NTPsec it will make your brain
hurt. It's not just that the input and output packets are binary; that's
superficial and fixable with textualization tools I can write in my sleep
(there's a sketch of one at the end of this mail). Fine, let's say you've
done that. You've got an interleaved stream of input and output
timestamps. How do you reason through the sync algorithms to know whether
the relationships are correct?

Not only are there time-dependent hidden inputs to the computation from
the kernel clock and PLL, but they're going to be qualitatively different
depending on whether you have an adjtimex or not. Yes, sure, you can
expose all those inputs in your test loads, but now what you have is a
mess of numbers related through algorithms with an intractably large
number of moving parts. Every bit of state retained between packet-handler
invocations is a moving part. So is every configuration option.

And there's no smoothness in the test space. In reposurgeon or GPSD I can
take an existing test, modify it, and often know with reasonable certainty
how the code path traversed will change and where the new check file will
differ. In NTPsec it's nonlinearities and edge triggers all the way down.

What I found out after writing almost all the mechanics of TESTFRAME is
that once you have it you slam into a wall that better tooling is zero
help with. There's no way to get enough of a causal handle on what's going
on to be sure you can test even simple features like outlier clipping.
General verification of correctness is *completely* out of reach; the best
you can do is (a) test for same-input-same-output stability and (b) try to
cover enough code paths to smoke out the core dumps.

In forty-odd years of software engineering I've never seen another testing
problem this wicked. I don't really expect to see one if I live another
forty.

> The "known to be interesting" phrase gets back to my query that
> started this thread. I'm looking for a way to test corner cases.
> Would TESTFRAME have done that?

Given a set of inputs that triggers a corner case, yes. The problem is
*how do you compose that input load?*

There are simple cases in which you can do this. An obvious one is writing
perverse configurations to try to crash ntpd on startup. The problem is
that those aren't *interesting* cases - testing them would be like looking
for your car keys under a streetlight because the light is good, even
though you dropped them in the dark two blocks over.

> If we don't like TESTFRAME, what else can we do?

In principle? Fuzz-probe for core dumps. *With* TESTFRAME, we could test
for same-input/same-output stability. And that's about it.
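To be concrete about the first of those: a fuzz-probe needn't be anything
smarter than the following loop. This is an illustrative Python sketch,
not working project code; the byte-flipping mutator is the crudest thing
that could possibly work, the seed-config filename is made up, and the
only ntpd flags it leans on are -n (don't daemonize) and -c (config file).

    import os, random, subprocess, tempfile

    # Seed corpus: any syntactically valid config will do.
    SEED = open("ntp.conf.sample", "rb").read()

    def mutate(data):
        "Flip a few random bytes - the crudest possible mutation."
        buf = bytearray(data)
        for _ in range(random.randint(1, 8)):
            buf[random.randrange(len(buf))] = random.randrange(256)
        return bytes(buf)

    for trial in range(1000):
        with tempfile.NamedTemporaryFile(suffix=".conf", delete=False) as conf:
            conf.write(mutate(SEED))
        try:
            # -n = stay in foreground, -c = use this config file.
            proc = subprocess.run(["ntpd", "-n", "-c", conf.name],
                                  capture_output=True, timeout=5)
        except subprocess.TimeoutExpired:
            os.unlink(conf.name)    # came up and stayed up; not a crash
            continue
        if proc.returncode < 0:     # killed by a signal - possible core dump
            print("trial %d died on signal %d; config kept at %s"
                  % (trial, -proc.returncode, conf.name))
        else:
            os.unlink(conf.name)

That finds core dumps on startup. It will never find the interesting
bugs, for the streetlight reason above.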
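The second one is just a replay harness: push one captured event log
through the daemon twice and demand bit-identical output. A sketch; the
replay driver it invokes is hypothetical - it's exactly the entry point
TESTFRAME was meant to provide.

    import subprocess

    def stable(replay_cmd, capture):
        """Same-input/same-output check: run one captured event log
        through twice and require bit-identical output. 'replay_cmd'
        is the hypothetical TESTFRAME replay driver."""
        out = [subprocess.run(replay_cmd + [capture],
                              capture_output=True, check=True).stdout
               for _ in range(2)]
        return out[0] == out[1]

That's the whole test. The hard part was never this harness; it's that
passing it proves nothing beyond determinism.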
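And since I claimed the textualization part is trivial, here's roughly
what I mean - a dump of the fixed 48-byte NTP header (RFC 5905 layout)
into labeled fields. Illustrative Python again, not anything in the tree.

    import struct

    NTP_EPOCH_OFFSET = 2208988800  # seconds from 1900-01-01 to 1970-01-01

    def ntp_ts(raw):
        "Convert a 64-bit NTP timestamp (32.32 fixed point) to Unix seconds."
        return (raw >> 32) - NTP_EPOCH_OFFSET + (raw & 0xffffffff) / 2**32

    def textualize(pkt):
        "Render the fixed 48-byte NTP header as human-readable text."
        (lvm, stratum, poll, precision,
         rootdelay, rootdisp, refid,
         reftime, origin, receive,
         transmit) = struct.unpack("!BBbb II 4s QQQQ", pkt[:48])
        return "\n".join([
            "leap=%d version=%d mode=%d" % (lvm >> 6, (lvm >> 3) & 7, lvm & 7),
            "stratum=%d poll=%d precision=%d" % (stratum, poll, precision),
            "rootdelay=%.6f rootdisp=%.6f refid=%r"
                % (rootdelay / 2**16, rootdisp / 2**16, refid),
            "reftime=%.9f org=%.9f rec=%.9f xmt=%.9f"
                % tuple(ntp_ts(t) for t in (reftime, origin,
                                            receive, transmit)),
        ])

Point it at a capture and you get something you can eyeball. Which, as I
said, buys you nothing by itself; the wall is behind it.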
I've had four years to think about this problem. If there were a way
over, under, or around it I do not think it would have taken me that long
to spot it.
--
<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>