Hi, Bert:

bert hubert wrote:
> Paul, I've written many many TCP/IP reassemblers and in fact the overhead is
> trivial. Your kernel does it all the time for example. The trick is to have
> a limited window in which you do the reassembly, and not scan over the
> entire file. Neither does a kernel.

Having QA'd IDSes in a past life, I don't disagree that the overhead, in
terms of memory and CPU, ought to be minimal.  However, the implementation
complexity of a production grade TCP stream reassembler is high enough, and
the environment unforgiving enough, that I'd prefer to hand the task off to
a bullet-proof stand-alone library implementation.  The last time I went
looking for such an implementation I came up empty, but I'd love to be
proven wrong.
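For what it's worth, the limited-window idea itself is easy enough to
sketch.  Here is a deliberately minimal illustration in Python (hypothetical
interface, single flow); everything it waves away (sequence number
wraparound, partially overlapping retransmissions, FIN/RST handling, memory
accounting across many flows) is exactly where a production reassembler gets
hairy:

    class FlowReassembler:
        """Minimal per-flow, limited-window TCP payload reordering.

        Illustrative only: ignores sequence wraparound, overlapping
        retransmissions, FIN/RST, and per-flow resource limits.
        """

        def __init__(self, isn, window=64 * 1024):
            self.next_seq = isn      # next stream byte we expect to deliver
            self.window = window     # only buffer data this far ahead
            self.segments = {}       # seq -> payload, awaiting delivery

        def feed(self, seq, payload):
            """Accept one segment's payload; return newly contiguous bytes."""
            if seq + len(payload) <= self.next_seq:
                return b""           # old retransmission, already delivered
            if seq >= self.next_seq + self.window:
                return b""           # beyond the reassembly window: drop it
            self.segments[seq] = payload
            out = bytearray()
            # Deliver for as long as the next expected offset is buffered.
            while self.next_seq in self.segments:
                data = self.segments.pop(self.next_seq)
                out += data
                self.next_seq += len(data)
            return bytes(out)

    r = FlowReassembler(isn=1000)
    r.feed(1004, b"world")    # arrives out of order: buffered, returns b""
    r.feed(1000, b"hell")     # fills the gap, returns b"hellworld"

Feeding the delivered bytes into a DNS-over-TCP message parser (two-byte
length prefix, then the message) is the easy part; it's the omitted details
above that make me want a hardened, reusable library for this.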
> Having said all that, it doesn't mean we aren't big fans of logging. But
> people I know are also big fans of logging being separate from their
> production servers, and this implies packets & reassembly. This is why we
> have ample tooling in powerdns-tools to analyze packets.
>
> Packets also have the wonderful advantage that they represent what actually
> happened and not what the nameserver THOUGHT that happened. We've for
> example been able to solve & debug many issues caused by malformed packets.
> Such malformed packets would probably not have retained their unique
> malformation by serialization to dnstap.
>
> As another example, we've in the past had cases where our own logging showed
> we were serving answers with low latency yet packet traces showed
> significant time between query and response packets. The ultimate issue
> turned out to be queueing before our listening socket. Once we *got* the
> packet we answered it quickly enough. But we did not (and could not easily)
> account for when the packet hit the server.
>
> Our tool 'dnsscope' shows such statistics wonderfully.

I agree with you that in many cases being able to know "what actually
happened" on the network vs. what the DNS software thought had happened is
quite handy, and I don't see packet capture as a technology being displaced
for those cases where you want to get at the network-level artifacts.  (I
should note that dnstap will happily serialize malformed DNS *messages*
[e.g., say some DNS record data is encoded incorrectly], but malformed
*packets* are out of scope [e.g., say some middlebox corrupts a fragmented
EDNS response and the receiver's kernel discards the packets instead of
passing them to the nameserver process].)

There are a lot of great use cases where DNS packet capture can show
network-level malfeasance (here I take an expansive view of "network-level"
that includes everything after the initiator send()'s and the responder
recv()'s), and those will be awkward or impossible to replicate with an
in-server logging facility like dnstap.  Those use cases aren't what I'd
like to focus on with dnstap.  It's a nice bonus that the in-server approach
obviates the need to condition the input by extracting DNS payload content
from the lower layer frames (reassembling IP fragments, TCP streams, etc.),
but that's not the primary reason I started working on the dnstap idea.

The original, motivating use case for dnstap is passive DNS replication, and
specifically the kind of hardened passive DNS replication that we
implemented at Farsight (well, originally at ISC).  It's worth quoting from
Florian Weimer's original passive DNS paper on the "hardening" difficulties:

    Most DNS communication is transmitted using UDP.  The only protection
    against blindly spoofed answers is a 16 bit message ID embedded in the
    DNS packet header, and the number of the client port (which is often 53
    in inter-server traffic).  What is worse, the answer itself contains
    insufficient data to determine if the sender is actually authorized to
    provide data for the zone in question.

    In order to solve this problem, resolvers have to carefully validate
    all DNS data they receive, otherwise forged data can enter their
    caches.

    ("Passive DNS Replication" § 3.3, "Verification")

There are two interrelated issues here that Florian left to future
implementers:

+ "[B]lindly spoofed [UDP] answers".  We solved this in the capture
  component of our passive DNS system ("dnsqr") by keeping a table of
  outstanding UDP queries and doing full RFC 5452 (hi Bert!) § 9.1 style
  matching of the corresponding responses (roughly the matching check
  sketched after this list).

+ "[T]he answer itself contains insufficient data to determine if the sender
  is actually authorized to provide data for the zone in question."  This is
  trickier; basically there is nothing internal to the contents of a
  standalone DNS query/response transaction that allows us to evaluate the
  trustworthiness of the authority and additional sections of the response
  message.  (For instance, if you see a query/response for the question name
  "www.example.com", may the authority section specify NS records for
  "example.com"?)  The tack we took for this problem is to passively build a
  giant cache of NS and A/AAAA records (bootstrapped from the root zone),
  and work downwards from there based on the responses logged by our capture
  component.  There are obvious scaling problems with this approach.
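To make that a bit more concrete, here is a rough sketch in Python of what
"RFC 5452 § 9.1 style" response matching and a simple bailiwick test look
like.  The names are hypothetical (this is not the actual dnsqr code); treat
it as an illustration of the checks, not a reference implementation:

    from collections import namedtuple

    # One outstanding UDP query.  The address/port fields come from the
    # IP/UDP headers; dns_id and the q* fields come from the DNS header
    # and question section.
    PendingQuery = namedtuple("PendingQuery", [
        "client_addr", "client_port", "server_addr", "server_port",
        "dns_id", "qname", "qtype", "qclass",
    ])

    def response_matches(q, src_addr, src_port, dst_addr, dst_port,
                         dns_id, qname, qtype, qclass):
        """Accept a UDP response only if every attribute an off-path
        spoofer would have to guess lines up with the outstanding query."""
        return (src_addr == q.server_addr and src_port == q.server_port
                and dst_addr == q.client_addr and dst_port == q.client_port
                and dns_id == q.dns_id
                and qname.lower() == q.qname.lower()
                and qtype == q.qtype and qclass == q.qclass)

    def in_bailiwick(owner, bailiwick):
        """True if owner is at or below the bailiwick domain; a response
        for www.example.com may carry NS records for example.com but not
        for an unrelated zone."""
        owner = owner.lower().rstrip(".")
        bailiwick = bailiwick.lower().rstrip(".")
        if bailiwick == "":
            return True  # the root bailiwick covers everything
        return owner == bailiwick or owner.endswith("." + bailiwick)

A real capture component also has to expire stale entries and cope with
duplicate and truncated responses, but the shape of the checks is the same.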
This latter problem is unwieldy enough to solve with passive packet capture,
especially when you are aggregating the responses from many recursive
servers (as we are), that it'd be highly desirable to obviate it somehow.
And there is a way: if we can modify the recursive DNS implementation (and
this is a big if), we can have the DNS server log the cache-miss response
and annotate it with the "bailiwick" domain for the transaction.  This is
enough information that we can elide the large, stateful bailiwick
reconstruction cache of the passive packet capture approach.  We have a
working patchset for Unbound implementing this idea, and I know that it's
possible with BIND.

There are other use cases where it'd be nice to be able to avoid resorting
to packet capture.  For instance, virtually all of the "DNS looking glass"
implementations I've seen do some sort of munging of the DNS message content
into text/JSON/HTML/etc.  Ideally it'd be possible to have the option of
passing along the original verbatim DNS response message content.  (I think
the RIPE Atlas DNS probe currently comes closest to this ideal.  IIRC, there
is a way to extract the original DNS message byte sequence, but I believe
it's a base64-encoded payload inside a JSON document, or something like
that.)
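Passing the verbatim message along is cheap to do.  Here is a minimal sketch
in Python (hypothetical field names, not the actual RIPE Atlas schema) of
embedding the raw wire-format response in a JSON record and recovering the
exact byte sequence later:

    import base64
    import json

    def looking_glass_record(wire_response, server, transport, rtt_ms):
        """Wrap a verbatim DNS response (wire-format bytes) in JSON.

        The field names are made up; the point is that the original byte
        sequence survives the trip through a text format unmodified.
        """
        return json.dumps({
            "server": server,
            "transport": transport,
            "rtt_ms": rtt_ms,
            # base64 keeps the raw message intact inside a text document
            "response_msg_b64": base64.b64encode(wire_response).decode("ascii"),
        })

    def extract_wire_message(record):
        """Recover the original message bytes for re-parsing or archiving."""
        return base64.b64decode(json.loads(record)["response_msg_b64"])

A consumer can then re-parse the response with whatever DNS library it
likes, instead of screen-scraping whichever presentation format the looking
glass happened to emit.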
Another closely related use case is actually being able to save a trace of
the DNS message(s) sent/received by debugging tools like dig, kdig, drill,
delv, etc.  IMO, it's inconvenient enough to set up a packet capture tool
running alongside the query tool (needs root, needs to include the DNS
packet traffic initiated by the query tool but exclude any other incidental
DNS traffic that may be captured, may need to scrub IP header addresses from
your local network if you want to share the capture, etc. etc.) in order to
save a proper "archival quality" copy of the message data that people rarely
do this; what you get instead is usually a copy-paste of the "dig-style"
output generated by these tools.  And you end up with more-or-less pointless
differences between the output formats of these tools, like, to pick an
example at random, the trailing metadata that these tools generate, which
might look like

    ;; Query time: 0 msec
    ;; SERVER: 127.0.0.1#53(127.0.0.1)
    ;; WHEN: Mon Jul 07 14:35:57 EDT 2014
    ;; MSG SIZE rcvd: 239

or

    ;; Received 239 B
    ;; Time 2014-07-07 14:35:54 EDT
    ;; From 127.0.0.1@53(UDP) in 0.2 ms

depending on the whims of the vendor who produced the tool you're using.
CZ.NIC's kdig has working support for exporting the query/response messages
in dnstap format and for generating display output from the messages saved
to a dnstap file, and I hope to be able to extend the debugging tools from
other vendors to handle dnstap files similarly.

Plain old query logging at scale will probably best be done by packet
capture for the foreseeable future, unless you'd like to be able to export
information that doesn't appear on the wire (e.g., whether a query was
served from cache or not), in which case something like dnstap might be a
good fit.  Certainly I'd like to have the DNS resolver on my home network be
able to generate good logs "for free" out of the box, much like your typical
HTTP server (apache/nginx/etc.) comes properly configured to log accesses.
However, what I don't think the future involves is hanging some more %s's
off of a big printf() style format string like:

    ns_client_log(client, NS_LOGCATEGORY_QUERIES, NS_LOGMODULE_QUERY,
                  level, "query: %s %s %s %s%s%s%s%s%s (%s)", namebuf,
                  classname, typename, WANTRECURSION(client) ? "+" : "-",
                  (client->signer != NULL) ? "S" : "",
                  (client->opt != NULL) ? "E" : "",
                  ((client->attributes & NS_CLIENTATTR_TCP) != 0) ? "T" : "",
                  ((extflags & DNS_MESSAGEEXTFLAG_DO) != 0) ? "D" : "",
                  ((flags & DNS_MESSAGEFLAG_CD) != 0) ? "C" : "", onbuf);

(Not to pick on BIND/ISC specifically here, but I had the function handy.)

> It so happens that we now have the infrastructure to plug in arbitrary
> modules at packet entry & exit, we could perhaps do a dnstap implementation
> there. Will keep you posted.

This is great news; in general I think a lot of people would like to see
more "hook"-ability like this from DNS software.  (Unbound's module stacks
are quite interesting, and I originally wanted to implement dnstap in
Unbound as an Unbound module, but I wasn't able to get it to work out,
unfortunately.)

-- 
Robert Edmonds