> It's...hm...maybe a good way to put it is that the structure of the NTPsec
> state space and sync algorithms is extremely hostile to testing.

I still don't have a good understanding of why TESTFRAME didn't work.  I can't 
explain it to somebody.

We've got
  code mutations
  hidden variables in the FSM
  hostile

So what makes it hostile?  Is it more than just complexity?

Why isn't this sort of testing even more valuable when things get complex?

---------

> If you try to do this kind of eyeballing in NTPsec it will make your brain
> hurt.  It's not just that the input and output packets are binary, that's
> superficial and fixable with textualization tools I can write in my sleep.
> Fine, let's say you've done that. You've got an interleaved stream of input
> and output timestamps.  How do you reason through the sync algorithms to know
> whether the relationships are correct? 

How do we tell that it is working without TESTFRAME?  I eyeball ntpq -p and/or 
graphs of loopstats and friends.  That's using the stats files as a summary of 
the internal state.
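
A minimal sketch of turning that eyeballing into a check, assuming the usual loopstats field order (MJD, seconds past midnight, offset in seconds, frequency in PPM, jitter, wander, time constant). The names `parse_loopstats` and `settled`, the thresholds, and the sample lines are all made up for illustration. None of this comes from TESTFRAME or the NTPsec tree.

```python
from typing import NamedTuple


class LoopSample(NamedTuple):
    mjd: int         # Modified Julian Day
    seconds: float   # seconds past UTC midnight
    offset: float    # clock offset, seconds
    freq: float      # frequency offset, PPM
    jitter: float    # RMS jitter, seconds
    wander: float    # RMS frequency wander, PPM
    tc: int          # clock discipline time constant


def parse_loopstats(text):
    """Parse loopstats text into LoopSample records, skipping short lines."""
    samples = []
    for line in text.splitlines():
        f = line.split()
        if len(f) < 7:
            continue
        samples.append(LoopSample(int(f[0]), float(f[1]), float(f[2]),
                                  float(f[3]), float(f[4]), float(f[5]),
                                  int(f[6])))
    return samples


def settled(samples, max_offset=0.001, max_freq_step=0.5):
    """True if every offset is within max_offset seconds and the frequency
    never moves more than max_freq_step PPM between adjacent samples.
    Both thresholds are arbitrary assumptions."""
    if not samples:
        return False
    if any(abs(s.offset) > max_offset for s in samples):
        return False
    return all(abs(b.freq - a.freq) <= max_freq_step
               for a, b in zip(samples, samples[1:]))


# Invented sample data, not taken from a real run.
SAMPLE_LOOPSTATS = """\
58000 100.000 0.000012000 13.778 0.000351733 0.013380 6
58000 164.000 -0.000008000 13.780 0.000300000 0.012000 6
"""

if __name__ == "__main__":
    print(settled(parse_loopstats(SAMPLE_LOOPSTATS)))
```

A check like this only says the loop looks settled. It does not say the sync algorithms made the right decisions along the way.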

Did TESTFRAME capture the stats files?

With a bit more logging, we could probably capture enough data to verify by 
hand what is going on.  We would have to write a memo explaining how it works.  
Maybe that would include chunks of pseudocode.

How much of the problem is that Eric didn't/doesn't understand the way the 
inner parts of ntpd work?  I've read the descriptions many times but I still 
don't understand it well enough to explain it to somebody.  Maybe I could work 
up a presentation with the code in one hand and the descriptions in the other 
hand.  It would take a while.  That is, I know the general idea and recognize 
all the pieces but don't have a good feel for how the pieces fit together to 
make up the big picture.


> Not only are there time-dependent hidden inputs to the computation from the
> kernel clock and PLL, but they're going to be qualitatively different
> depending on whether you have an adjtimex or not.

There wasn't supposed to be anything hidden.  TESTFRAME was supposed to 
intercept all the relevant calls like getting the time from the kernel.
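
To make the interception idea concrete, here is a toy record/replay sketch. It is not TESTFRAME's actual design. `RecordingClock`, `ReplayClock`, and `measure_offset` are hypothetical names, and the only intercepted call is a single clock read.

```python
import time


class RecordingClock:
    """Wraps a real time source and logs every value it hands out."""

    def __init__(self, source=time.time):
        self._source = source
        self.log = []

    def now(self):
        t = self._source()
        self.log.append(t)
        return t


class ReplayClock:
    """Returns previously recorded values in the same order."""

    def __init__(self, log):
        self._values = iter(log)

    def now(self):
        try:
            return next(self._values)
        except StopIteration:
            # The code under test read the clock more often than the
            # recorded run did, so the replay has diverged.
            raise RuntimeError("replay log exhausted") from None


def measure_offset(clock, server_time):
    """Toy stand-in for the code under test: offset of a server
    timestamp from the local clock."""
    return server_time - clock.now()


if __name__ == "__main__":
    rec = RecordingClock()
    first = measure_offset(rec, 1000.0)
    second = measure_offset(ReplayClock(rec.log), 1000.0)
    print(first == second)
```

The replay is only faithful if every time-dependent input goes through the wrapper. Anything that bypasses it is a hidden input again.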

I'm pretty sure we gave up on systems that don't support adjtimex.  OpenBSD 
doesn't have it, but does have enough to slew the clock.  We dropped support 
for OpenBSD when that shim was removed.

------------

How far did you get with TESTFRAME?  Do you remember why you decided to give 
up?  Was there something in particular, or did you just get tired of banging 
your head against the wall?

How many lines of code went away when you removed it?

Would it be worthwhile for me to give it a try?  Now isn't a good time and 
there may be more important things to work on, but I think we should explore 
and understand this option.

------------

But back to the big picture.  How can we test corner cases?

Is it reasonable to look for patterns in the log file?

Is it reasonable to look for patterns in the output of ntpq -p?  Graphs?
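
One possible shape for a pattern check on ntpq -p output, as a sketch only. It assumes the default ten-column peers layout with the tally character in the first column, reach printed in octal, and offsets in milliseconds. The function names, the 1 ms threshold, and the sample text are invented for illustration.

```python
def parse_peers(text):
    """Parse ntpq -p style text into a list of dicts, one per peer line."""
    peers = []
    for line in text.splitlines():
        # Skip blank lines, the column header, and the ==== separator.
        if (not line.strip() or line.startswith("=")
                or line.lstrip().startswith("remote")):
            continue
        fields = line[1:].split()   # column 0 is the tally character
        if len(fields) != 10:
            continue
        peers.append({
            "tally": line[0],
            "remote": fields[0],
            "reach": int(fields[6], 8),     # 8-bit shift register, octal
            "offset_ms": float(fields[8]),
        })
    return peers


def healthy(peers, max_offset_ms=1.0):
    """True if there is exactly one system peer ('*'), its last eight
    polls all succeeded (reach 377), and its offset is small."""
    sys_peers = [p for p in peers if p["tally"] == "*"]
    if len(sys_peers) != 1:
        return False
    p = sys_peers[0]
    return p["reach"] == 0o377 and abs(p["offset_ms"]) <= max_offset_ms


# Invented sample output, not captured from a real server.
SAMPLE_PEERS = """\
     remote           refid      st t when poll reach   delay   offset   jitter
===============================================================================
*ntp1.example.   .GPS.            1 u   33   64  377    1.234    0.056    0.021
+ntp2.example.   10.0.0.1         2 u   12   64  377    5.678   -0.120    0.045
 ntp3.example.   .INIT.          16 u    -   64    0    0.000    0.000    0.000
"""

if __name__ == "__main__":
    print(healthy(parse_peers(SAMPLE_PEERS)))
```

This catches "no system peer" or "peer went unreachable", which is the sort of thing a test for a corner case could assert after feeding in a scripted scenario.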

When you do a Go port, what can you do to make testing easier?


-- 
These are my opinions.  I hate spam.



_______________________________________________
devel mailing list
devel@ntpsec.org
http://lists.ntpsec.org/mailman/listinfo/devel
