Hal Murray <hmur...@megapathdsl.net>:
> The traffic on the pool took a big jump recently.  There were a couple of
> comments on the pool list, but nothing past "lots of traffic".
>
> Understanding what's going on is the top of my list.
If it matters, I agree with that allocation of your time.  NTPsec won't be
hurt if it takes a bit longer for unusual use cases of ntpq to be gotten
right.  The traffic spike is a more prompt problem.

> A couple of interesting counters were lost in the recent (month or 2 ago)
> work on the packet processing.
>
> I think the mru list doesn't count mode 6 packets.  Fixing that is probably
> important in being able to monitor DDoS activity or probes.

Hm.  Yes, I think that might be true.  One problem I had to correct was
that during his refactor Daniel moved the ntp_monitor call that logs
packets to the wrong place - down a code path most packets never take.
Moving it earlier in the processing seemed to solve that problem, but...

/me looks.

...yep, you are right.  There's now a "Move ntp_monitor() call to where it
catches Mode 6 packets." commit.

I don't *think* it will have any bad side effects, but the call is now just
before the authentication logic rather than just after, and it's
theoretically possible that ntp_monitor() could do something relevantly bad
to the restrict mask on the way through.  Eyes other than mine should
audit.  I've asked Daniel, but he's on vacation.

> How much of the old structure of ntpq did you preserve?

In the front end, a lot of it - many of the helper functions even kept
their names.  I was nervous about changing anything I didn't understand,
and that code was pretty messy, so I mostly went for almost
*transliterating* C to Python rather than a free translation.

I'd say there were just two big structural changes (this is discounting a
lot of code that used to be free-standing functions becoming class methods
- that shuffled a lot of logic around, but didn't *change* a lot of logic
or the overall dataflow).

One is the protocol handling getting packed into a class, separated from
the front-end logic, and extensively reworked.  It got changed more than
the front end, because one of my goals was to refactor so that as much code
as possible could be shared with other clients.  That, at least, succeeded
big-time; I was able to write the ntpmon proof of concept in about 45
minutes.

*However*, that having been said, the lowest level - the logic of request
sending and response-fragment reassembly - moved over to the new back end
almost unchanged.  It resembles the C code it was derived from very
closely, except for using Python exceptions to bail out of panic cases.

The other is that I exploited a happy coincidence.  Cosmetically and
logically the C-ntpq command interpreter looked a whole *lot* like an
instance of a Python library class called cmd.Cmd - actually the
resemblance was so strong that I wouldn't be surprised if the ntpq UI were
modeled after some ancestral program that the designer of cmd.Cmd was
quasi-emulating.  (If I had to guess, it was some old-school pre-gdb
symbolic debugger, or something of that sort.)  One of the major
simplifications in the rewrite was throwing out all the logic that cmd.Cmd
replaced.

> Can you say anything about how the python version of ntpq works that isn't
> obvious from looking at the code?  I'm looking for the big picture?  The
> stuff that's obvious after you know it but hard to put together if you don't
> know what you are looking for because it is spread over many screens.

Thinking...

Well, the most important structural thing about it is the layering.  The
front end, ntpq proper, is mostly one big instance of a class derived from
cmd.Cmd.
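
To make that concrete, here's a toy sketch - mine, not code from the tree,
and the names are purely illustrative - of the shape deriving the
interpreter from cmd.Cmd gives you:

    import cmd

    class ToyControlSession:
        "Stand-in for the Mode 6 protocol back end."
        def readvar(self):
            return {"version": "toy", "stratum": "3"}

    class ToyNtpq(cmd.Cmd):
        "cmd.Cmd supplies the prompt, the read loop, help, and dispatch."
        prompt = "toyntpq> "

        def __init__(self):
            cmd.Cmd.__init__(self)
            self.session = ToyControlSession()  # back end hides behind the UI

        def do_readvar(self, line):
            "readvar: display system variables"
            for key, value in sorted(self.session.readvar().items()):
                print("%s=%s" % (key, value))

        def do_quit(self, line):
            "quit: exit the interpreter"
            return True                         # True ends cmdloop()

    if __name__ == "__main__":
        ToyNtpq().cmdloop()

Every do_foo method becomes a "foo" command, the docstrings become the
help text, and the interactive loop itself comes for free - which is
exactly the hand-rolled machinery the C interpreter had to carry around.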
That command interpreter, the Ntpq class, manages an instance of a back-end
class called ControlSession that lives in ntp.packet.  The cmd.Cmd methods
are mostly pretty thin wrappers around calls to (/me counts) eight methods
of ControlSession corresponding to each of the implemented Mode 6 request
types.

Within ControlSession, those methods turn into wrappers around doquery()
calls.  doquery() encapsulates "send a request, get a response" and
includes all the response-fragment reassembly, retry, and time-out/panic
logic.  As I alluded to earlier, that code resembles the old C more than
the dispatch layer above it does.

Even the code for making the actual displays mostly doesn't live in the
front end.  It's in ntp.util, well separated from both the command
interpreter and the protocol back end so it can be re-used.  And is, in
fact, re-used by ntpmon.

> How does the MRU stuff work?  I think I saw some debugging printout
> indicating that it got back a clump of packets for each request.  If it
> misses one, will it use the data up to the gap?

The mrulist() method in ControlSession is more complex than the rest of the
back-end code put together, except doquery() itself.  It is the one part
that was genuinely difficult to write, as opposed to merely having high
friction because the C I was translating was so grotty.

Yes, the way that part of the protocol works is a loop that does two layers
of segment reassembly.  The lower layer is the vanilla UDP fragment
reassembly encapsulated in doquery() and shared with the other request
types.  That part I'm pretty confident in; if it didn't work 100%, things
like peer listings would break.

In order to avoid blocking for long periods of time, and in order to be
cleanly interruptible by control-C, the upper layer does a sequence of
requests for MRU spans, which are multi-fragment sequences of ASCIIizations
of MRU records, oldest to newest.  The spans include sequence metadata
intended to allow you to stitch them together on the fly in O(n) time.

A further interesting complication is the use of a nonce to foil DDoSes by
source-address spoofing.  The mrulist() code begins by requesting a nonce
from ntpd, which it then replays between span requests to convince ntpd
that the address it's firehosing all that MRU data at is the same one that
asked for the nonce.  To foil replay attacks, the nonce is timed out; you
have to re-request another every 4 span fetches.  This is a clever trick
and I will certainly use it the next time I need to design a connectionless
protocol.

But...I never completely understood the old logic for stitching together
the MRU spans; it was *nasty* and looked pretty fragile in the presence of
span dropouts (I don't know that those can ever happen, but I don't know
that they can't, either).  Fortunately I didn't have to.  It worked just to
brute-force the problem - accumulate all the MRU spans until either the
protocol marker ending the last one arrives or ^C interrupts the span-read
loop, and then quicksort the list before handing it up to the front end for
display.

The old way made some sense, I guess, back when processor clocks were so
expensive that we worked a lot harder to avoid O(n log n) operations.  But
I can't say I liked that part of the protocol design even a little bit.

The answer to your original question is this: I don't really know how well
the old code dealt with gaps in the sequence of spans.  I think it was
intended to cope, but I wouldn't bet *anything* I valued against there
being bugs in the coping strategy or implementation.
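
For what it's worth, the brute-force strategy boils down to a loop shaped
roughly like this.  This is a hand-waved sketch, not the real mrulist()
code; request_nonce(), request_span(), and the entry attributes are made-up
stand-ins, and only the control flow mirrors what I described:

    def toy_mrulist(request_nonce, request_span):
        "Accumulate MRU spans, then sort once at the end."
        entries = []
        nonce = request_nonce()      # proves we really own our source address
        spans_since_nonce = 0
        previous_span = None
        try:
            while True:
                # Each request after the first names late entries from the
                # previous span so the daemon knows where we left off.
                span = request_span(nonce, previous_span)
                entries.extend(span.entries)
                previous_span = span
                if span.is_final:    # daemon marked the end of the list
                    break
                spans_since_nonce += 1
                if spans_since_nonce >= 4:
                    nonce = request_nonce()   # nonce times out; refresh it
                    spans_since_nonce = 0
        except KeyboardInterrupt:
            pass                     # ^C: keep whatever we have so far
        # One O(n log n) sort replaces the old on-the-fly stitching logic;
        # the sort key is the entry's last-seen timestamp (stand-in name).
        entries.sort(key=lambda entry: entry.last)
        return entries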
My brute-force method will work better.

One consequence of the brute-forcing change is that I never figured out
where to put the update-message generation that the C version did.  I'm not
even completely sure what granularity of update it was counting, nor how
that count interacted with the old way of doing stale-record elimination on
the fly.  (Did I mention that code was nasty?)

I would have pushed harder to replicate the old behavior exactly (including
the update messages) except that (a) I thought it was ugly, and (b) I
already had ntpmon in mind.  I know I needed the mrulist() method to run
*quietly* rather than assuming, as the old code did, that it could just
belt update messages to the terminal.

I'm still not happy about the fact that there's a keyboard-interrupt
catcher *inside* the mrulist() method.  That feels like a layering
violation, but I haven't come up with a better way to partition things.
Under the given constraints there may not be one.

> Is there any documentation on the packet format?

There is.  It's described in detail on the docs/mode6.txt page.  I was
careful about that because I really needed to understand it before I tried
translating the ugly C code.

> I saw some ".." in an ASCII
> packet dump.  That was a CR/LF in the hex part.  It looked like each slot
> was multiple lines.  What marks the end of a slot and/or start of a new one?

All explained in docs/mode6.txt.  If that's at all unclear, ask me
questions and we'll improve that page.

> How does the retransmission logic work?

You mean for requests to ntpd?  There are two timeouts, five seconds and
three seconds.  The request is shipped once.  If the primary timeout is
exceeded without a response coming back, the request is repeated.  If the
second timeout happens before a response, the whole request is aborted.
This behavior is direct from the C version and I'm pretty sure the Python
implementation is right - the breakage if it weren't would not be subtle.

> I want to add a bunch of counters.  Where should they go?

I'm not sure.  It depends on what you want to count.  If it's packets with
particular mode bits set, or something like that, probably in the
ControlSession class.

I advise checking your premises before you code anything.  Now that the
monitor code counts Mode 6 packets again you may be able to get away with a
lot less work.

> What should I have asked?

Dunno.  I've tried to give you as complete a brain dump as I can, because
you seem to be heading towards extending and maintaining the Mode 6 stuff,
and I think that's a good idea on several levels.

> I got this from an old ntpq looking at a busy server.
>
>   Ctrl-C will stop MRU retrieval and display partial results.
>   116 (0 updates) Giving up after 8 restarts from the beginning.
>   With high-traffic NTP servers, this can occur if the
>   MRU list is limited to less than about 16 seconds' of
>   entries.  See the 'mru' ntp.conf directive to adjust.
>
> I think that's trying to tell me that things are getting updated faster than
> they are getting retrieved.  What will your new code do in that case?

The same as the old code, I think.  I monkey-copied the logic from C
because I was not certain I understood it enough to modify it.  Here's what
I *think* was and is happening:

Each span request except the first is supposed to include identifications
of late MRU entries from the previous span.
If the daemon can't match those from the MRU records it's holding in core,
that means some of the records that existed at the time of the last request
have been thrown out of core to make room for newer ones without exceeding
the configured limit on MRU memory usage.

When this condition occurs, the design assumes you'd rather have a
continuous traffic record from a later start point than one that has gaps
of unknown size in it.  So ntpd throws up its hands and starts resending
the whole current MRU list.

ntpd tracks the number of times it has to do this restart.  If that number
exceeds 8, it figures that it's never going to get everything to you before
stuff ages out, and returns an error code indicating a stall.

I see two ways for this to happen: a really low mru limit, or a really slow
network.  The second condition might have been much more common when this
code was written, but I'd be surprised to encounter it now.

The stall error code, coming back to C ntpq, is what throws up the error
message you saw.  In the Python version, you should see this:

    ***No response, probably high-traffic server with low MRU limit.

The difference is that where the C message display for this case was wired
right into the MRU-response handler logic, the above is packed inside a
class-valued back-end exception that the front end can handle as it likes.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>