Thanks for digging in so fast. Our mitigation will be sticking to 9.9.6-P1, since we like ESV anyway.
Wanted to point out that (perhaps sadly) this isn't so crazypants...or at least not uncommon. The *edge* and *aka* references point to Akamai DNS+CDN. From my last look, this has gotten cleaner in the latest versions of their offerings, but many of the largest sites on the Internet are still configured this way today.

-----Original Message-----
From: Evan Hunt <e...@isc.org>
Date: Tuesday, December 9, 2014 at 2:41 PM
To: Stuart Henderson <s...@spacehopper.org>
Cc: Tony Finch <d...@dotat.at>, "bind-users@lists.isc.org" <bind-users@lists.isc.org>
Subject: Re: Problem with BIND 9.10.1-P1 recursion limits

>On Tue, Dec 09, 2014 at 05:51:58PM +0000, Evan Hunt wrote:
>> That's unexpected. I'll see if I can reproduce it.
>
>Okay, I can.
>
>Part of the problem is the somewhat crazypants DNS configuration
>of www.ibm.com:
>
>  $ dig +noall +answer www.ibm.com
>  www.ibm.com. 3600 IN CNAME www.ibm.com.cs186.net.
>  www.ibm.com.cs186.net. 60 IN CNAME china-cdn.san.ibm.com.edgekey.net.
>  china-cdn.san.ibm.com.edgekey.net. 21600 IN CNAME china-cdn.san.ibm.com.edgekey.net.globalredir.akadns.net.
>  china-cdn.san.ibm.com.edgekey.net.globalredir.akadns.net. 900 IN CNAME e7826.x.akamaiedge.net.
>  e7826.x.akamaiedge.net. 20 IN A 23.59.201.136
>
>... like, *wow*. A chain of five aliases with TTLs ranging from 20
>seconds to 6 hours, passing through five different zones (ibm.com,
>cs186.net, edgekey.net, akadns.net, akamaiedge.net), hosted by
>servers in three *more* zones (ihost.com, akam.net, and akadns.org,
>in addition to akadns.net and akamaiedge.net). I had to almost
>double the maximum recursion queries to 99 to get this to work on
>an empty cache. Yikes.
>
>Almost any non-empty cache will dodge the bullet. Preceding the
>lookup of www.ibm.com with "dig @::1 ns com" causes the query to
>succeed. Also, as previously noted, on 9.9 it will succeed without
>a five-minute delay if you just issue the query a second time.
>
>So, possible workarounds if this issue is causing problems for you:
>
>  - Ensure that the first query sent to a newly-primed recursive
>    resolver isn't quite as spectacular as this one;
>  - Add "max-recursion-queries 100;" to your options statement;
>  - Run 9.9.6-P1 instead of 9.10.1-P1
>
>The five-minute delay is still a bit of a puzzle. It happens because
>of this code in adb.c:
>
>        /* XXXMLG Don't pound on bad servers. */
>        if (address_type == DNS_ADBFIND_INET) {
>                name->expire_v4 = ISC_MIN(name->expire_v4, now + 300);
>                name->fetch_err = FIND_ERR_FAILURE;
>                inc_stats(adb, dns_resstatscounter_gluefetchv4fail);
>        } else {
>                name->expire_v6 = ISC_MIN(name->expire_v6, now + 300);
>                name->fetch6_err = FIND_ERR_FAILURE;
>                inc_stats(adb, dns_resstatscounter_gluefetchv6fail);
>        }
>
>The "now + 300" bit is where the five minutes comes from. That's code
>that's been around for years, and it is in 9.9, but apparently it's
>reached more easily in 9.10. I'm looking into the reasons for this.
>
>The problem should be addressed in 9.10.2, which is likely to be
>released next month.
>
>--
>Evan Hunt -- e...@isc.org
>Internet Systems Consortium, Inc.
>_______________________________________________
>Please visit https://lists.isc.org/mailman/listinfo/bind-users to
>unsubscribe from this list
>
>bind-users mailing list
>bind-users@lists.isc.org
>https://lists.isc.org/mailman/listinfo/bind-users
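
P.S. For anyone trying to picture the "now + 300" arithmetic in the adb.c excerpt quoted above, here is a minimal Python sketch of just that one expression. It is an illustration only: the names clamp_expiry and FAILURE_HOLD_SECONDS are my own and do not exist in BIND, and the timestamp is an arbitrary example value. It models nothing else about the ADB.

```python
# Illustrative model of one expression from the quoted adb.c excerpt.
# All names here are hypothetical; none of this is BIND's API.

FAILURE_HOLD_SECONDS = 300  # mirrors the literal 300 in "now + 300"


def clamp_expiry(current_expiry: int, now: int) -> int:
    """Mimic ISC_MIN(expire, now + 300): cap the expiry, never extend it."""
    return min(current_expiry, now + FAILURE_HOLD_SECONDS)


if __name__ == "__main__":
    now = 1_418_136_060  # arbitrary example timestamp, in seconds

    # An entry that was due to expire a day from now is pulled in to 300 s.
    print(clamp_expiry(now + 86_400, now) - now)

    # An entry already expiring sooner than that is left alone.
    print(clamp_expiry(now + 20, now) - now)
```

When the existing expiry is far in the future, the cap lands exactly 300 seconds out, which matches the five minutes Evan describes. The sketch says nothing about why 9.10 reaches that branch more easily than 9.9.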