Brian J. Murrell <br...@interlinx.bc.ca> wrote:
> On Thu, 2018-01-18 at 15:41 +0000, Tony Finch wrote:
> >
> > The default is 10 minutes - try reducing it and see if the outage
> > becomes shorter.
>
> If it does, what is that telling me?

My hypothesis here is that `named` has marked all the nameservers for the
domain that is failing as lame, so it no longer has anywhere to send
queries for the domain, so it returns a SERVFAIL. The address database
dump confirms this guess, so there isn't any need to fiddle with the
lame-ttl unless you want to double check.

> > When you have a failure, try `rndc flushtree` to more selectively drop
> > problematic state - you might have to find out the nameservers of the
> > broken domain and flush them. (The google.com nameservers are under
> > google.com; GitHub's are under dynect.net and a bunch of awsdns
> > domains.)
>
> rndc flushtree takes a domain name though doesn't it? In what case
> would I need to find nameservers?

The idea is to flush the state needed to resolve queries for the domain,
so as well as flushing the domain itself, you also need to flush its
nameservers - easy for Google, harder for GitHub.

> So, when I do rndc reload am I flushing the cache? :-(

No, a reload will (in almost all cases) retain the cache - though it might
clear other state (I have not checked exactly what). I'm a bit surprised
it fixes your problem; maybe the address database gets flushed on a
reload.

> ; Address database dump
> ...
> ; ns3.google.com [v4 TTL 7] [v6 TTL 7] [v4 failure] [v6 failure]
> ; ns2.google.com [v4 TTL 7] [v6 TTL 7] [v4 failure] [v6 failure]
> ; ns1.google.com [v4 TTL 7] [v6 TTL 7] [v4 failure] [v6 failure]
> ; ns4.google.com [v4 TTL 7] [v6 TTL 7] [v4 failure] [v6 failure]

OK, here's a very smoky gun. I think this suggests that you have some kind
of connectivity problem between your DNS server and Google's (etc) - you
should check that large fragmented EDNS responses get through OK, and that
TCP works OK, and that you don't have pMTUd problems.

> > and servfail cache.
>
> Non-existent section in my database dump.

Ah, the servfail cache is another 9.11 feature.

Tony.
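
To illustrate the flushing idea above, here is a minimal Python sketch. It assumes the address database dump lines look exactly like the excerpt quoted in this message; a real dump may carry other fields. The function names are hypothetical, and using `rndc flushname` for the individual nameserver names (alongside `rndc flushtree` for the domain) is an assumption made for illustration. The script only prints the commands; it does not run them.

```python
import re

# Matches ADB dump lines shaped like the excerpt quoted in this thread, e.g.
#   ; ns3.google.com [v4 TTL 7] [v6 TTL 7] [v4 failure] [v6 failure]
# (assumption: real dumps may contain other line shapes, which are skipped).
ADB_LINE = re.compile(r"^\s*;\s+(?P<name>\S+)\s+(?P<flags>\[.*\])\s*$")


def failed_nameservers(dump_text):
    """Return names whose entry shows both a v4 and a v6 failure."""
    failed = []
    for line in dump_text.splitlines():
        match = ADB_LINE.match(line)
        if not match:
            continue
        # Split "[v4 TTL 7] [v6 failure]" into ["v4 TTL 7", "v6 failure"].
        flags = re.findall(r"\[([^\]]+)\]", match.group("flags"))
        if "v4 failure" in flags and "v6 failure" in flags:
            failed.append(match.group("name"))
    return failed


def flush_commands(domain, nameservers):
    """Build rndc commands that flush the domain and each nameserver name."""
    commands = [["rndc", "flushtree", domain]]
    for ns in nameservers:
        commands.append(["rndc", "flushname", ns])
    return commands


SAMPLE_DUMP = """\
; Address database dump
; ns3.google.com [v4 TTL 7] [v6 TTL 7] [v4 failure] [v6 failure]
; ns2.google.com [v4 TTL 7] [v6 TTL 7] [v4 failure] [v6 failure]
; ns1.google.com [v4 TTL 7] [v6 TTL 7] [v4 failure] [v6 failure]
; ns4.google.com [v4 TTL 7] [v6 TTL 7] [v4 failure] [v6 failure]
"""

if __name__ == "__main__":
    names = failed_nameservers(SAMPLE_DUMP)
    for command in flush_commands("google.com", names):
        print(" ".join(command))
```

For google.com the per-name flushes are redundant, because the nameservers sit under the domain being flushed. They matter for a domain like GitHub's, whose nameservers live under other domains.
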
-- 
f.anthony.n.finch <d...@dotat.at> http://dotat.at/ - I xn--zr8h punycode
Hebrides, Bailey, Fair Isle, Faeroes: Cyclonic, mainly west, 5 to 7. Rough or
very rough, occasionally moderate in Fair Isle and Faeroes. Squally wintry
showers. Good occasionally poor.