Hi Patrick,

This is interesting. I just realized that the problem is not exclusive to my anycast servers. I noticed that my authoritative-only servers were not returning the ADDITIONAL section either; after I restarted BIND, they started returning it.
So this does look more clearly like some kind of bug in BIND. I'll try to open a case with ISC.

Thanks for your reply.

cv

On Thu, May 19, 2011 at 11:49 AM, Patrick, Robert (CONTR) <robert.patr...@hq.doe.gov> wrote:
> Carlos,
>
> I've observed the same behavior with BIND 9.8.0 running on generic IPv4
> assigned to an Ethernet interface, not using loopback with AnyCast. Odds are
> good this is a software bug in BIND. Same behavior observed on two nearly
> identical platforms, while on two others I've not run into the same issues.
>
> Best I could determine, the problem became apparent after some duration of
> runtime and/or queries or query volume. On servers that only handle inside
> "trusted" users I've not seen the problem at all and they're still running
> 9.8.0 today. On external Internet-facing servers where the problem was
> triggered almost daily we rolled back to 9.7.x until a fix is released (or
> 9.8.1, and we'll try again).
>
> FYI, server O/S in my case is CentOS 5.6 32-bit, should be equivalent to Red
> Hat.
>
> Hopefully an ISC POC will contact you directly. Send configs and they'll
> probably assist in debugging.
>
> -----Original Message-----
> From: dns-operations-boun...@lists.dns-oarc.net
> [mailto:dns-operations-boun...@lists.dns-oarc.net] On Behalf Of Carlos Vicente
> Sent: Thursday, May 19, 2011 1:58 PM
> To: bind-users@lists.isc.org; dns-operati...@lists.dns-oarc.net
> Subject: [dns-operations] Bind 9.8.0 intermittent problem with non-recursive
> responses
>
> Dear lists [apologies if you receive two copies of this message],
>
> I am in the process of implementing anycast recursive DNS service for
> our campus using a combination of servers running Bind 9.8.0 and Cisco's
> IP SLA feature. There are three identical Redhat servers connected to
> three different routers with point-to-point /30 links. The servers are
> configured with an anycast address attached to an alias of the loopback
> interface:
>
> [note: these are not the actual IP addresses]
>
> lo:1      Link encap:Local Loopback
>           inet addr:192.168.32.32  Mask:255.255.255.255
>           UP LOOPBACK RUNNING  MTU:16436  Metric:1
>
> These caching servers are also configured as stealth slaves for our
> zones (using Bind's 'also-notify' option in our master). This allows us
> to serve the latest contents of our zones without having to wait for
> TTLs to expire.
>
> In our tests, we've come across a very interesting but annoying problem.
> After several hours of operation, the servers start to respond to CNAME
> queries in an inconsistent manner. For example:
>
> # dig @192.168.32.32 www.uoregon.edu
>
> ; <<>> DiG 9.8.0-RedHat-9.8.0-4.uopel5 <<>> @192.168.32.32 www.uoregon.edu
> ; (1 server found)
> ;; global options: +cmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 14280
> ;; flags: qr aa rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 6, ADDITIONAL: 4
>
> ;; QUESTION SECTION:
> ;www.uoregon.edu.               IN      A
>
> ;; ANSWER SECTION:
> www.uoregon.edu.        600     IN      CNAME   uowc-www.uoregon.edu.
> uowc-www.uoregon.edu.   86400   IN      A       192.168.142.125
>
> ;; AUTHORITY SECTION:
> uoregon.edu.            86400   IN      NS      phloem.uoregon.edu.
> uoregon.edu.            86400   IN      NS      bigdog.lsu.edu.
> uoregon.edu.            86400   IN      NS      sns-pb.isc.org.
> uoregon.edu.            86400   IN      NS      arizona.edu.
> uoregon.edu.            86400   IN      NS      ruminant.uoregon.edu.
> uoregon.edu.            86400   IN      NS      dns.cs.uoregon.edu.
>
> ;; ADDITIONAL SECTION:
> phloem.uoregon.edu.     86400   IN      A       192.168.32.35
> phloem.uoregon.edu.     86400   IN      AAAA    2001:468:d01:20::80df:2023
> ruminant.uoregon.edu.   86400   IN      A       192.168.60.22
> ruminant.uoregon.edu.   86400   IN      AAAA    2001:468:d01:3c::80df:3c16
>
> ;; Query time: 0 msec
> ;; SERVER: 192.168.32.32#53(192.168.32.32)
> ;; WHEN: Wed May 18 12:51:06 2011
> ;; MSG SIZE rcvd: 300
>
>
> # dig @192.168.32.32 www.uoregon.edu
>
> ; <<>> DiG 9.8.0-RedHat-9.8.0-4.uopel5 <<>> @192.168.32.32 www.uoregon.edu
> ; (1 server found)
> ;; global options: +cmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 34776
> ;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
>
> ;; QUESTION SECTION:
> ;www.uoregon.edu.               IN      A
>
> ;; ANSWER SECTION:
> www.uoregon.edu.        600     IN      CNAME   uowc-www.uoregon.edu.
>
>
> As you can see, the second response does not include the AUTHORITY or
> the ADDITIONAL sections. This causes our users' machines to fail
> to resolve the A records because the resolver library does not query a
> second time. This second type of response appears to be the server
> acting as an authoritative-only server, not as a caching recursive server.
>
> Here are the most interesting details:
>
> - We have only observed this happening when querying the anycast
> address, not the address associated with the ethernet interface.
> - The behavior is independent of the network. We can replicate it by
> querying the anycast address from the server itself.
> - Our production (non-anycast) servers run the exact same version of
> Bind with the exact same configuration, and we have never observed this
> problem.
> - Bind's debugging output is exactly the same in both cases, so
> it offers no clues about the difference in responses.
> - Restarting Bind, the problem goes away for several hours. It requires
> the server to receive query traffic during those hours, otherwise the
> problem does not happen.
>
> Here's the options section of the config:
>
> options {
>     version "9999.9.9";
>     recursive-clients 5000;
>     directory "/etc/named";
>     allow-transfer { none; };
>     blackhole { attackers; };
>     listen-on-v6 { any; };
>     allow-recursion { customers; };
>     allow-query { any; };
>     dnssec-enable yes;
>     dnssec-validation yes;
>
> };
>
>
> Bind is listening on the anycast address (in addition to its NIC IP
> address):
>
> # netstat -lnp |grep 192.168.32.32
> tcp        0      0 192.168.32.32:53        0.0.0.0:*
> LISTEN      30771/named
> udp        0      0 192.168.32.32:53        0.0.0.0:*
> 30771/named
>
> These are the details of our Bind daemon (custom-built RPM, based on
> Fedora's source RPM):
>
> # named -V
> BIND 9.8.0-RedHat-9.8.0-4.uopel5 built with
> '--build=x86_64-redhat-linux-gnu' '--host=x86_64-redhat-linux-gnu'
> '--target=x86_64-redhat-linux-gnu' '--program-prefix=' '--prefix=/usr'
> '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin'
> '--sysconfdir=/etc' '--datadir=/usr/share' '--includedir=/usr/include'
> '--libdir=/usr/lib64' '--libexecdir=/usr/libexec'
> '--sharedstatedir=/usr/com' '--mandir=/usr/share/man'
> '--infodir=/usr/share/info' '--with-libtool' '--localstatedir=/var'
> '--enable-threads' '--enable-ipv6' '--with-pic' '--disable-static'
> '--disable-openssl-version-check' '--enable-exportlib'
> '--with-export-libdir=/usr/lib64'
> '--with-export-includedir=/usr/include'
> '--includedir=/usr/include/bind9' 'build_alias=x86_64-redhat-linux-gnu'
> 'host_alias=x86_64-redhat-linux-gnu'
> 'target_alias=x86_64-redhat-linux-gnu' 'CFLAGS= -O2 -g -pipe -Wall
> -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector
> --param=ssp-buffer-size=4 -m64 -mtune=generic' 'CPPFLAGS=
> -DDIG_SIGCHASE' 'CXXFLAGS=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2
> -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64
> -mtune=generic' 'FFLAGS=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2
> -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64
> -mtune=generic'
> using OpenSSL version: OpenSSL 0.9.8e-rhel5 01 Jul 2008
> using libxml2 version: 2.6.26
>
> # uname -a
> Linux adns1 2.6.18-238.9.1.el5 #1 SMP Fri Mar 18 12:42:39 EDT 2011
> x86_64 x86_64 x86_64 GNU/Linux
>
> # cat /etc/redhat-release
> Red Hat Enterprise Linux Server release 5.6 (Tikanga)
>
>
> I would really appreciate any help with this.
>
> Thanks in advance,
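
For anyone who wants to catch the incomplete responses without reading dig output by eye, here is a minimal sketch in Python. It assumes dig's standard text output as shown in the quoted message. The function names (section_counts, answer_types, is_bare_cname, run_dig) are made up for illustration and are not part of BIND or dig. The parsing functions only inspect text, so they can be fed saved output as well as live results.

```python
import re
import subprocess

# Matches the counts on dig's ";; flags: ..." header line.
COUNTS_RE = re.compile(r"ANSWER: (\d+), AUTHORITY: (\d+), ADDITIONAL: (\d+)")


def section_counts(dig_output):
    """Return (answer, authority, additional) counts, or None if not found."""
    match = COUNTS_RE.search(dig_output)
    if match is None:
        return None
    return tuple(int(n) for n in match.groups())


def answer_types(dig_output):
    """Return the record types listed in the ANSWER SECTION, in order."""
    types = []
    in_answer = False
    for line in dig_output.splitlines():
        stripped = line.strip()
        if stripped.startswith(";; ANSWER SECTION"):
            in_answer = True
            continue
        if in_answer:
            # A blank line or another comment line ends the section.
            if not stripped or stripped.startswith(";"):
                break
            fields = stripped.split()
            # Expected fields: name, TTL, class, type, rdata...
            if len(fields) >= 5:
                types.append(fields[3])
    return types


def is_bare_cname(dig_output):
    """True if the answer holds a CNAME but no A or AAAA record for its target."""
    types = answer_types(dig_output)
    return "CNAME" in types and not any(t in ("A", "AAAA") for t in types)


def run_dig(server, name):
    """Hypothetical helper: run dig against one server and return its text output."""
    result = subprocess.run(
        ["dig", "@" + server, name],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling is_bare_cname(run_dig("192.168.32.32", "www.uoregon.edu")) on a schedule, say once a minute, should show roughly when a server starts returning the CNAME without the A record, which may help correlate the onset with uptime or query volume.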