Jay Levitt wrote:

A quick test shows that indeed, an awful lot of domains are repeatedly failing in lookup_ns, but that different domains fail at different times: the domains that repeatedly fail right now were fine last night in the SA logs.

So it looks like this is something intermittent to do with the resolver on my system, or perhaps with the caching nameserver, and nothing to do with SA. I'll keep digging and report back what I find. If anyone has any tips, of course, feel free to let me know.

I spoke too soon. Turns out I'd accidentally left "recurse=>0" in the test harness. No wonder it was failing so often.


I discovered Net::DNS::Resolver::errorstring, and put some more logging into SA, and the problem is really simple: my caching-only nameserver times out when looking up NS records for a site that's not in the cache. Not entirely surprising, with a 3-second timeout in SA. And my site is infinitely small (just me), so it's going to be fairly common that one of the well-known sites is not in cache.
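For anyone who wants to see the resolver's own error message, errorstring is easy to get at. Something like the following (the 3-second timeout here is just to mimic SA's; the domain is only an example):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Net::DNS;

# Resolver with a short timeout, mimicking SA's 3-second limit.
my $res = Net::DNS::Resolver->new(
    udp_timeout => 3,
    tcp_timeout => 3,
);

my $packet = $res->query('example.com', 'NS');
if ($packet) {
    # Print the nameserver names from the answer section.
    print $_->nsdname, "\n" for grep { $_->type eq 'NS' } $packet->answer;
}
else {
    # On a timeout, errorstring typically says "query timed out".
    warn "NS lookup failed: ", $res->errorstring, "\n";
}
```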

SA realizes this and tries to loop, in Dns.pm's is_dns_available, but the loop is coded wrong: either a success or a failure breaks out of it. A timeout in lookup_ns leaves $result defined but containing no records, which triggers the "failed horribly" clause and sets $IS_DNS_AVAILABLE to zero until mimedefang eventually cycles the child process.
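To make the failure mode concrete, here's a hedged paraphrase of the loop as described above -- not the actual Dns.pm source. lookup_ns() is a stub that shifts canned answers off @responses, where an empty array ref stands in for a lookup that timed out (defined result, no records):

```perl
#!/usr/bin/perl
use strict;
use warnings;

our @responses;                      # canned answers for the stub
sub lookup_ns { return shift @responses }

our $IS_DNS_AVAILABLE;

sub is_dns_available_buggy {
    my @test_hosts = @_;
    $IS_DNS_AVAILABLE = 0;
    for my $host (@test_hosts) {
        my $result = lookup_ns($host);
        if ($result && @$result) {
            $IS_DNS_AVAILABLE = 1;   # got NS records: DNS works
            last;
        }
        elsif (defined $result) {
            # BUG: a timeout also yields a defined-but-empty result,
            # so one slow lookup ends the loop with DNS marked dead.
            $IS_DNS_AVAILABLE = 0;
            last;
        }
    }
    return $IS_DNS_AVAILABLE;
}

# First probe times out, second would have succeeded -- but the
# buggy loop never gets there:
@responses = ([], ['ns1.example.com']);
print is_dns_available_buggy('slow.example', 'fast.example'), "\n";  # prints 0
```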

I *think* the bug fix is simply to remove that whole else clause from is_dns_available, but as a Perl novice I'd certainly appreciate someone double-checking that.
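With the else clause removed, the sketch above would become something like this (again a paraphrase with a stubbed lookup_ns, not the real Dns.pm code):

```perl
#!/usr/bin/perl
use strict;
use warnings;

our @responses;                      # canned answers for the stub
sub lookup_ns { return shift @responses }

our $IS_DNS_AVAILABLE;

sub is_dns_available_fixed {
    my @test_hosts = @_;
    $IS_DNS_AVAILABLE = 0;
    for my $host (@test_hosts) {
        my $result = lookup_ns($host);
        if ($result && @$result) {
            $IS_DNS_AVAILABLE = 1;   # any success proves DNS is up
            last;
        }
        # no else: a timeout or failure just falls through to the
        # next test host instead of declaring DNS dead
    }
    return $IS_DNS_AVAILABLE;
}

# Same scenario: first probe times out, second succeeds -- and now
# the loop actually reaches it:
@responses = ([], ['ns1.example.com']);
print is_dns_available_fixed('slow.example', 'fast.example'), "\n";  # prints 1
```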

And, you know, now that I look at it, it seems like is_dns_available uses lookup_ns to test general DNS availability, but lookup_ns does its own per-domain caching, which would seem to defeat the point of the test any time a site is hit twice!
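A toy illustration of that caching problem (purely hypothetical code, just showing the shape of the issue): once a probe domain is cached, a later "availability" check returns the cached answer without ever touching the network, so it can report success even after DNS has gone away.

```perl
#!/usr/bin/perl
use strict;
use warnings;

our %cache;
our $dns_is_up = 1;    # toggle to simulate an outage

# Memoized lookup: only consults "the network" on a cache miss.
sub lookup_ns_cached {
    my ($dom) = @_;
    return $cache{$dom} //= ($dns_is_up ? ['ns1.example.com'] : []);
}

# First probe while DNS is up populates the cache...
lookup_ns_cached('example.com');
$dns_is_up = 0;        # now DNS goes away
# ...and a second probe still "succeeds", straight from the cache:
print scalar @{ lookup_ns_cached('example.com') }, "\n";  # prints 1
```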

Jay


