Hi, fellow BIND users.

The other day I was trying to diagnose a problem on a recursive resolving name server. I had just enabled DNSSEC validation, and certain digs (such as "www.isc.org" and "www.dnssec-failed.org") were failing. Even queries for unsigned domains, such as my own personal domain (which also happens to be in .org), were failing.
I was testing it with this command line:

    dig +dnssec +bufsize=XXXX www.isc.org. a

where XXXX is our locally configured edns-udp-size value. This lookup hung for a long time and eventually failed with a timeout error. I tried to see where it was failing with:

    dig +dnssec +bufsize=XXXX @127.0.0.1 www.isc.org. A +trace

I got this output before it hung:

=========================
; <<>> DiG 9.8.1-P1 <<>> +dnssec +bufsize=XXXX @127.0.0.1 www.isc.org. A +trace
; (1 server found)
;; global options: +cmd
.    506674  IN  NS     a.root-servers.net.
.    506674  IN  NS     e.root-servers.net.
.    506674  IN  NS     l.root-servers.net.
.    506674  IN  NS     m.root-servers.net.
.    506674  IN  NS     d.root-servers.net.
.    506674  IN  NS     g.root-servers.net.
.    506674  IN  NS     i.root-servers.net.
.    506674  IN  NS     f.root-servers.net.
.    506674  IN  NS     h.root-servers.net.
.    506674  IN  NS     c.root-servers.net.
.    506674  IN  NS     k.root-servers.net.
.    506674  IN  NS     b.root-servers.net.
.    506674  IN  NS     j.root-servers.net.
.    506674  IN  RRSIG  NS 8 0 518400 20120313000000 20120305230000 51201 . kBn5abbR2172kIhOfAdf38Mi4IpqkclowMxD2BKh2hg3udwGeJfK3YOA I1Pz9lcb/NzFzh+ndVXZERaofryyoeE15ZD0HQxMqLai7HV6nVKQyiPZ vGXA3CsIua9g8dnnN4RNbYrPnM7i6f/hBgKph8/AcFHXAQfRFZIxiJL1 O50=
;; Received 397 bytes from 127.0.0.1#53(127.0.0.1) in 817 ms
=========================

I ended up spending an hour or two trying to figure out what was causing it to hang. In the end, it was this query that was hanging:

    dig +dnssec +bufsize=XXXX @a0.org.afilias-nst.info. org. DNSKEY

It turns out that the answer to that query is larger than our "bufsize", so the response came back truncated. BIND was retrying over TCP, but our ACLs weren't set up to allow that.

My request is this: please add something to "dig" that replicates the behavior of BIND as closely as possible with regard to the many queries it issues as part of a DNSSEC-validating resolution.
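For anyone hitting the same thing, the truncation check itself is easy to sketch in Python. This is only an illustration of the failure mode described above, not what BIND or dig does internally. The function names (build_query, is_truncated, probe) are made up for this example, it uses only the standard library, and probe() assumes you pass the server's IPv4 address yourself rather than a host name.

```python
import socket
import struct

TC_FLAG = 0x0200  # "truncated" bit in the DNS header flags word
DNSKEY = 48       # RR type code for DNSKEY
OPT = 41          # RR type code for the EDNS0 OPT pseudo-record
DO_BIT = 0x8000   # "DNSSEC OK" flag, carried in the OPT record's TTL field


def encode_name(name):
    """Encode a domain name in DNS wire format (length-prefixed labels)."""
    out = b""
    for label in name.rstrip(".").split("."):
        if label:
            out += bytes([len(label)]) + label.encode("ascii")
    return out + b"\x00"


def build_query(name, qtype, bufsize, query_id=0x1234):
    """Build a non-recursive query that advertises `bufsize` via EDNS0, DO=1."""
    # Header: id, flags (all clear), 1 question, 0 answers, 0 authority, 1 additional
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 1)
    question = encode_name(name) + struct.pack("!HH", qtype, 1)  # class IN
    # OPT RR: root owner name, type OPT, class = UDP payload size,
    # TTL = extended flags (DO bit set), RDLENGTH = 0
    opt = b"\x00" + struct.pack("!HHIH", OPT, bufsize, DO_BIT, 0)
    return header + question + opt


def is_truncated(response):
    """Return True if the TC bit is set in a raw DNS message."""
    if len(response) < 12:
        raise ValueError("message is shorter than a DNS header")
    (flags,) = struct.unpack("!H", response[2:4])
    return bool(flags & TC_FLAG)


def probe(server_ip, name, qtype, bufsize, timeout=5.0):
    """Send one UDP query; if the reply is truncated, try the same query on TCP."""
    query = build_query(name, qtype, bufsize)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as udp:
        udp.settimeout(timeout)
        udp.sendto(query, (server_ip, 53))
        reply, _ = udp.recvfrom(65535)
    if not is_truncated(reply):
        return "UDP reply was complete"
    try:
        with socket.create_connection((server_ip, 53), timeout=timeout) as tcp:
            # DNS over TCP prefixes each message with a two-byte length
            tcp.sendall(struct.pack("!H", len(query)) + query)
            if tcp.recv(2):
                return "UDP reply truncated, TCP retry answered"
            return "UDP reply truncated, TCP connection closed without data"
    except OSError:
        return "UDP reply truncated, TCP retry failed (firewall or ACL?)"
```

Calling probe() with the address of one of the org servers, "org.", DNSKEY and your own edns-udp-size should separate "the answer fits in UDP" from "the answer is truncated and the TCP retry is blocked", which is the distinction that took me so long to find by hand.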
I ran tcpdump on an unloaded server and captured all DNS query traffic immediately after running "rndc flush". The queries it sent, in order, were:

Remote server            Domain                     Type
=======================  =========================  =======
M.ROOT-SERVERS.NET       www.dnssec-failed.org.     A
M.ROOT-SERVERS.NET       .                          NS
d0.org.afilias-nst.org   www.dnssec-failed.org.     A
dns105.comcast.net       www.dnssec-failed.org.     A
f.root-servers.net       .                          DNSKEY
dns104.comcast.net       dnssec-failed.org.         DNSKEY
a2.org.afilias-nst.info  dnssec-failed.org.         DS
b0.org.afilias-nst.org   org.                       DNSKEY
k.root-servers.net       org.                       DS
dns101.comcast.net       dnssec-failed.org.         DNSKEY
dns105.comcast.net       dnssec-failed.org.         DNSKEY
dns103.comcast.net       dnssec-failed.org.         DNSKEY
dns102.comcast.net       dnssec-failed.org.         DNSKEY
c0.org.afilias-nst.org   dns104.comcast.org.        AAAA
dns101.comcast.net       dns104.comcast.org.        AAAA

(The edns-udp-size for this server is 4096.)

I realize that some of this traffic might be unusual (such as the ". NS" query to M.ROOT-SERVERS.NET), but the rest of it is normal DNSSEC resolution.

It would be *extremely helpful* if dig printed out the queries it was making as it made them, so that I could have seen it retrying a truncated response and hanging on TCP. The capture above doesn't even show the fallback to TCP that might happen if the edns-udp-size were lower, as it is at my other location.

I understand I'm asking dig to do what BIND normally does, but since they're both packaged together, it seems like a reasonable request.

Does anyone else see a need for a tool like this?

_______________________________________________
Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe from this list

bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users