Hal Murray <hmur...@megapathdsl.net>:
> > e...@thyrsus.com said:
> > Wait, then I have failed to understand your bug report.  This can
> > happen in a different, less odd way than the nonce update getting
> > lost by packet drop?
>
> Yes.  The no-packet-lost case is broken if it needs a second batch.
>
> Each batch gets a new nonce.  The code doesn't do anything with it.  So
> asking for the second batch is using a stale nonce.
*Each* batch?  I'm looking at the ntp_control.c code.  I don't see how
this is possible.  I looked for CTL_OP_REQ_NONCE, and it looks like the
only time a response of that type is shipped is when the client requests
one.

The Python client-side code thinks it should request a nonce at the
beginning of the fetch and every four span requests thereafter.  I went
back and re-checked the C code to make sure I hadn't mistranslated this.
I hadn't.

What I do now see is that nonces are supposed to age out after 16
seconds (ntpd/ntp_control.c, line 3054, at the end of the
validate_nonce() function).  The ntpd side is not counting requests at
all; it only ages nonces by time.  Right there I see a problem...

> > Or we could just write a script using the Python Mode 6 library to
> > flood a running ntp with bogus Mode 6 packets.  That way we wouldn't
> > have to add cruft in C.
>
> Just generating crap won't help.  You need to forge the source IP
> Address.  (I think you could do it semi-cleanly by setting up a
> bazillion extra IP Addresses on your driver.  I forget what they are
> called.)

I think we have a protocol issue to solve first.  It looks like the
client code was *never* properly matched to the server side.  It only
happened to work if 4 requests could always be processed within 16
seconds.  Maybe this accounts for Sanjeev's bug, #206 on the tracker.

> > What is failing to work exactly?
>
> Currently, ntpq dies as soon as it asks for the second batch.
>
> I've seen it ask for a new nonce, but that didn't recover.  I didn't
> investigate, since that was the same time I saw that it wasn't picking
> up the new nonce.
>
> The old ntpq doesn't work either.  It used to work before the traffic
> jump.  I assume something got pushed over the edge.  The obvious thing
> is slots getting updated faster than they can be retrieved.  I think we
> need to add a bunch of counters, but I don't know where to put them.

I wish I'd known that sooner.  I've been beating myself up trying to
figure out what I could have gotten wrong in the Python translation.

I think I need to fix the aging code in the client first; a rough sketch
of what I mean is below my sig.  Then let's see what the transaction
looks like.
--
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>
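Here is roughly the shape of the client-side fix I have in mind.  This
is a minimal sketch only, not the real library code; request_nonce() is
a stand-in name for whatever the Mode 6 session class actually exposes.
The idea is to refresh the nonce by wall-clock age, with a safety margin
under the server's 16-second window, instead of counting span requests:

    import time

    NONCE_LIFETIME = 16    # server-side timeout per ntp_control.c
    SAFETY_MARGIN = 4      # re-request well before the nonce can expire

    class NonceKeeper:
        "Hand out a nonce for MRU span requests, refreshing it by age."

        def __init__(self, session):
            self.session = session  # assumed to expose a nonce request
            self.nonce = None
            self.fetched_at = None

        def fresh_nonce(self):
            "Return the current nonce, re-requesting it when it ages."
            now = time.monotonic()
            if (self.nonce is None or
                    now - self.fetched_at > NONCE_LIFETIME - SAFETY_MARGIN):
                self.nonce = self.session.request_nonce()  # stand-in name
                self.fetched_at = now
            return self.nonce

That way the client never relies on squeezing four requests into the
server's window; it simply never ships a nonce more than about 12
seconds old.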