Re: Name resolution failure on a caching server -- many '; pending-answer' records in the cache

TPCbind Fri, 29 Jan 2016 20:42:24 -0800

Thanks for the followup.

> 
> NXDOMAIN is not a "failure" response. Are you *sure* you're getting NXDOMAIN?


Yes, pretty sure. In hindsight I should have run the tests inside a 'script' 
session so that the output was recorded.

> If you're using nslookup to test, be aware that it will do suffix searching 
> by default, so if the original query, e.g. www.bbc.co.uk  fails, it'll 
> quietly (unless debug-mode is in effect) start appending suffixes. Looking up 
> those suffixed names, e.g. www.bbc.co.uk.example.com, most likely gets an 
> NXDOMAIN, so nslookup reports NXDOMAIN as the overall result of the query. 
> So, it's basically a misreporting of the error by nslookup. 

Yes. I was mostly using nslookup.  I'll try dig too next time this occurs.
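
For the next occurrence, one way to tell a genuine NXDOMAIN apart from SERVFAIL or a timeout is to read the status field in dig's header line; dig does not apply the search list unless asked to. A minimal Python sketch, assuming dig is installed and the caching server answers on 127.0.0.1 (the address and both function names are illustrative assumptions):

```python
import re
import subprocess

# Matches the header line dig prints, e.g.
# ";; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 4711"
_STATUS_RE = re.compile(r"->>HEADER<<-.*?status:\s*([A-Z]+)")


def response_status(dig_output):
    """Return the rcode name from dig's header line, or None if absent."""
    match = _STATUS_RE.search(dig_output)
    return match.group(1) if match else None


def query_status(name, server="127.0.0.1", rrtype="A"):
    """Run dig against `server` (assumed address) and return the rcode name.

    Returns None when dig printed no header, e.g. after a timeout.
    """
    result = subprocess.run(
        ["dig", "@" + server, name, rrtype, "+tries=1", "+time=3"],
        capture_output=True, text=True, check=False,
    )
    return response_status(result.stdout)
```

Calling query_status("www.bbc.co.uk.") while the problem is occurring would show whether the server really answers NXDOMAIN or something else.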

> 
> Note that only 1 of the records in your cache dump is actually relevant -- 
> the CNAME from www.bbc.co.uk to www.bbc.net.uk -- and the others are for a 
> different part of the namespace (thdow.bbc.co.uk).

I'll contact you privately with a link to the whole cache dump.  Every entry 
tagged 'pending-*' that I tried querying failed to resolve, many hours after 
the network congestion had ended.
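
To list the affected names systematically rather than by eye, the dump can be scanned for those markers. A rough Python sketch, assuming the layout seen in the quoted excerpt: a '; pending-*' comment line precedes the record it describes, and a record line that starts with whitespace inherits the previous owner name. The function name is hypothetical.

```python
def pending_names(dump_text):
    """Return owner names of records that follow a '; pending-*' marker."""
    names = []
    owner = None      # last owner name seen; continuation lines inherit it
    marked = False    # True once a '; pending-*' comment has been seen
    for line in dump_text.splitlines():
        if line.startswith("; pending-"):
            marked = True
            continue
        if not line.strip() or line.lstrip().startswith(";"):
            continue  # blank line or another comment such as '; glue'
        if not line[0].isspace():
            owner = line.split()[0]
        if marked and owner is not None and owner not in names:
            names.append(owner)
        marked = False
    return names
```

Each name returned can then be queried individually to confirm whether it still fails to resolve.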

> 
> If you do an explicit query of the CNAME, when the problem is occurring, does 
> it resolve? I would expect, even though the cache entry is marked 
> "pending-answer", it will still resolve. But, without the target of the CNAME 
> also resolving, the lookup as a whole cannot succeed.

I'll try that next time.
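
The check could look roughly like this: query the alias for its CNAME explicitly, then query the target, and report the first name in the chain that does not resolve. This is only a sketch. The function names and the 127.0.0.1 server address are assumptions, and the lookup is passed in as a parameter so the logic can be tried without a live server.

```python
import subprocess


def dig_lookup(name, rrtype, server="127.0.0.1"):
    """Return the answers 'dig +short' prints; empty list when none."""
    out = subprocess.run(
        ["dig", "+short", "@" + server, name, rrtype],
        capture_output=True, text=True, check=False,
    ).stdout
    # Lines starting with ';' are dig diagnostics (e.g. timeouts), not answers.
    return [ln for ln in out.splitlines() if ln and not ln.startswith(";")]


def first_broken_link(name, lookup=dig_lookup, max_depth=8):
    """Follow CNAMEs from `name`; return the first name that fails to resolve.

    Returns None if the chain ends in at least one address record.
    """
    for _ in range(max_depth):
        targets = lookup(name, "CNAME")
        if not targets:
            # No alias here, so this name itself must have an address.
            return None if lookup(name, "A") else name
        name = targets[0]
    return name  # chain too long or looping; report where we stopped
```

If first_broken_link("www.bbc.co.uk.") returns the alias itself, the cached CNAME is not being served; if it returns the target, the CNAME resolves but the name it points to does not.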

Regards
Tom.

> 
>         - Kevin
> 
> -----Original Message-----
> From: bind-users-boun...@lists.isc.org 
> [mailto:bind-users-boun...@lists.isc.org] On Behalf Of 
> tpcb...@mklab.ph.rhul.ac.uk
> Sent: Tuesday, January 26, 2016 8:02 PM
> To: bind-users@lists.isc.org
> Subject: Name resolution failure on a caching server -- many '; 
> pending-answer' records in the cache
> 
> Dear All,
>      I run a caching server on a section of the departmental LAN.
> Occasionally network congestion results in timeouts & name resolution 
> failures.  Lookups performed on name servers outside my LAN section fail with 
> NXDOMAIN.  Querying my name server for items not in its cache gets the same 
> result.
> 
> My problem is that long after the congestion has subsided, queries to my name 
> server still result in NXDOMAIN failure.  AFAICT this situation remains 
> indefinitely, until the cache is flushed with 'rndc flush' or BIND is 
> restarted.  When it is in this state, dumping the cache with 'rndc dumpdb' 
> shows numerous entries like this:
> 
> --------------------------------------------------------------------------------------------
> ; pending-additional
> thdow.bbc.co.uk.        76632   NS      ns3.bbc.net.uk.
>                         76632   NS      ns4.bbc.co.uk.
>                         76632   NS      ns4.bbc.net.uk.
>                         76632   NS      ns3.bbc.co.uk.
> ; pending-answer
> ns0.thdow.bbc.co.uk.    2082    \-AAAA  ;-$NXRRSET
> ; thdow.bbc.co.uk. SOA ns.bbc.co.uk. hostmaster.bbc.co.uk. 2015122100 1800 600 864000 86400
> ; pending-answer
>                         76632   A       212.58.240.162
> ; pending-answer
> www.bbc.co.uk.          30      CNAME   www.bbc.net.uk.
> ; glue
> --------------------------------------------------------------------------------------------
> 
> and attempts to look up e.g. www.bbc.co.uk result in NXDOMAIN.
> 
> Browsing the documentation I noticed the parameter 'max-ncache-ttl',
> which is unset in my named.conf and apparently defaults to 3 hours.
> However, the problem persists long after 3 hours have elapsed following 
> incidents of network congestion.
> 
> I could set up a cron job to check name resolution on external domains and 
> flush the cache when it fails.  I assume there must be a better solution!  
> Should I set max-ncache-ttl to something fairly short in my named.conf and 
> hope that the default value is for some reason actually much greater than 
> 3 hours?
> 
> BTW is there a way to dump out all the parameters from a running named,
> just to see all their values?
> 
> 
> Any ideas on how to solve or further diagnose the problem?
> 
> Many thanks
> Tom Crane
> 
> System details:
> OS:    Scientific Linux CERN SLC release 6.7 (Carbon) [NB: SLC is a 
> derivative of RHEL]
> BIND:  bind-9.8.2-0.37.rc1.el6_7.5.x86_64
> 
> Ps. I originally posted in Usenet NG comp.protocols.dns.bind but got no 
> followups and then noticed all messages in that NG had this ML's fields 
> 'NNTP-Posting-Host: lists.isc.org' and 'X-Original-To: 
> bind-users@lists.isc.org' etc. in their headers.  Is c.p.d.b actually a 
> moderated group now or exclusively tied to this ML via a mail2news gateway?
> 
> -- 
> Tom Crane, Dept. Physics, Royal Holloway, University of London, Egham Hill,
> Egham, Surrey, TW20 0EX, England.
> Email:  T dot Crane at rhul dot ac dot uk
> 
> _______________________________________________
> Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe 
> from this list
> 
> bind-users mailing list
> bind-users@lists.isc.org
> https://lists.isc.org/mailman/listinfo/bind-users


-- 
Tom Crane, Dept. Physics, Royal Holloway, University of London, Egham Hill,
Egham, Surrey, TW20 0EX, England. 
Email:  t.cr...@rhul.ac.uk
Fax:    +44 (0) 1784 472794
