Roland Dobbins wrote:
> I think dnstap is a very good idea; still, it would be helpful to understand
> why it wasn't implemented in IPFIX, rather than in a custom telemetry format
> . . .

We did not frame the evaluation in terms of selecting (or building) a particular "telemetry format". Instead we focused on some finer-grained functional areas where we knew we would have to build or select particular components:

1) The "dnstap" idea entails modifying existing DNS servers, adding inline payload logging capabilities to the fast path of the DNS server. Performance is a key consideration, and under high load we would prefer to drop excess logging payloads rather than block the server from making progress at its real job of returning answers to clients. So we need some sort of asynchronously processed circular queue that can offload as much of this work as possible from the DNS server's critical path.

2) A way of encoding the log payload from the DNS server's internal, in-memory representation to a serialized byte sequence that can be transported over something like a socket or written to a file. (The "encoding".)

3) A way of actually transporting the serialized log payload to a receiver over something like a socket or file. (The "transport".)

I don't believe IPFIX has much to offer for #1, since this is an overly specific (yet quite important) implementation detail. We ended up writing our own lockless, memory-barrier-based circular buffer implementation, based on a technique used in the Linux kernel:

https://www.kernel.org/doc/Documentation/circular-buffers.txt

and then placing this in a library for re-use in different applications.

If you combine #1 and #3 above and allow them to be implemented in a single package, one obvious contender is ZeroMQ; ultimately I think ZeroMQ is not that great a choice for embedding *directly* in DNS servers for a few different reasons: e.g., there are several different versions (the Debian archive offers ZeroMQ major versions 2.x, 3.x, and 4.x) and the compatibility guarantees are somewhat convoluted. So we did not select ZeroMQ for use in the DNS server-side component.
But I didn't want to preclude the possibility of re-sending dnstap payloads over binary-clean transports that are transparent to payload content, like ZeroMQ, hence the "transport/encoding" split between #2 and #3.

It looks like maintaining the #2/#3 "transport/encoding" split with IPFIX is impossible; it appears IPFIX is tightly coupled to the underlying IP transport protocol: there is an IPFIX-over-UDP, an IPFIX-over-TCP, an IPFIX-over-SCTP... What if you want to send payloads over an AF_UNIX socket, or via an HTTP(S) GET/POST, a WebSockets connection, some new technology that hasn't been invented yet, etc.? Enforcing a firm separation between a generic lower-level "transport" and a specific upper-level "encoding" is something that worked out pretty well for us in a different context:

http://www.caida.org/workshops/isc-caida/1210/slides/isc1210_redmonds.html

I say "appears" above because my next complaint is that there are too many specification documents for IPFIX. There are several dozen listed here:

https://datatracker.ietf.org/wg/ipfix/documents/

This is in contrast to generic serialization systems for structured data like Protocol Buffers, Thrift, Apache Avro, MessagePack, Cap'n Proto, BSON, etc. Most of these can be described in a single fairly succinct document each; IPFIX appears to encompass a lot more than just serialization of structured data and consequently has a much larger specification footprint.

If IPFIX is well-suited for applications other than representing IP flows, it is awfully hard to tell from the outside without plowing through a ton of specifications. This is itself a downside; we have to convince not just ourselves, but also DNS software vendors that they should import this code and DNS software users that they might want to use it.

For a dnstap file format I was awfully tempted to use the traditional pcap-savefile(5) format with a new linktype, but pcap has a hard 64K frame size limit, which would make it impossible to represent dnstap payloads with maximally sized DNS messages in a single frame, something I wanted to make a hard requirement for dnstap. I tried to find the analogous limit for IPFIX, which appears to also use a 16-bit field to represent message length. (Possibly IPFIX can split payloads across multiple messages, but if it can, this is not readily apparent, and we would prefer not to have to invoke such a capability anyway.)

Also, I found the following blog post rather interesting:

http://www.ntop.org/nprobe/why-nprobejsonzmq-instead-of-native-sflownetflow-support-in-ntopng/

The fact that not even flow probe vendors are happy with IPFIX is somewhat telling. I do not know enough about flow probes to evaluate most of the author's very specific technical complaints about IPFIX, but something like JSON or protobufs paired with ZeroMQ is a fairly reasonable solution for a wide variety of use cases.

So, sorry we didn't pick IPFIX. It just doesn't look like a good fit for what we want to make possible, and there are a lot of general-purpose technologies out there that I would consider first before considering IPFIX for a particular application.

--
Robert Edmonds