Hi, I have a Fedora 23 system with bind-9.10.3 that's been running fine for a long time. For some reason, queries started timing out this morning. This is a mail server, so queries to Spamhaus, Barracuda, etc., started timing out with:
Mar 23 14:46:57 mail03 postfix/postscreen[12635]: warning: dnsblog reply timeout 10s for mykey.zen.dq.spamhaus.net

where 'mykey' is the key assigned to me for the service. (This isn't a "query volume reached" kind of error.)

It's almost like there's a firewall blocking outbound access, but that's not the case. Sometimes queries work, sometimes they time out:

# host google.com
;; connection timed out; no servers could be reached

Try the same command again and it might work. Here's an example with messagelabs:

# host messagelabs.com
;; connection timed out; no servers could be reached
# host messagelabs.com
messagelabs.com has address 216.12.145.20
messagelabs.com has address 155.64.49.54
;; connection timed out; no servers could be reached
# host messagelabs.com 8.8.4.4
Using domain server:
Name: 8.8.4.4
Address: 8.8.4.4#53
Aliases:

messagelabs.com has address 216.12.145.20
messagelabs.com has address 155.64.49.54
messagelabs.com mail is handled by 10 cluster6.eu.messagelabs.com.
messagelabs.com mail is handled by 20 cluster6a.eu.messagelabs.com.

It does appear to work reliably when using Google's nameservers. Just running "dig" returns all the forward entries for the top-level servers, but not the reverse. My hints file does have both, however.

Then I noticed these in the bind logs:

23-Mar-2016 15:12:10.603 general: info: zone sbl.example.com/IN: refresh: retry limit for master 64.11.16.5#53 exceeded (source 0.0.0.0#0)
23-Mar-2016 15:12:10.603 general: info: zone sbl.example.com/IN: Transfer started.
23-Mar-2016 15:12:10.615 xfer-in: info: transfer of 'sbl.example.com/IN' from 64.11.16.5#53: connected using 68.193.193.45#39699
23-Mar-2016 15:12:10.627 xfer-in: info: transfer of 'sbl.example.com/IN' from 64.11.16.5#53: Transfer status: up to date
23-Mar-2016 15:12:10.627 xfer-in: info: transfer of 'sbl.example.com/IN' from 64.11.16.5#53: Transfer completed: 0 messages, 1 records, 0 bytes, 0.012 secs (0 bytes/sec)

where 'example.com' is my domain.
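To put a number on the intermittent failures rather than rerunning "host" by hand, something like the following rough probe might help. It is only a sketch: build_query and probe_udp are hypothetical helper names (not part of BIND or any library), it uses nothing beyond the Python standard library, and the attempt count, timeout, and server addresses are assumptions to adjust.

```python
import random
import socket
import struct
import time


def build_query(name, qtype=1, txid=None):
    """Build a minimal DNS query packet for `name` (qtype 1 = A, class IN)."""
    if txid is None:
        txid = random.randint(0, 0xFFFF)
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, other counts 0
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def probe_udp(server, name, attempts=20, timeout=3.0, pause=0.5, port=53):
    """Send `attempts` UDP queries to `server`; return (answered, failed)."""
    answered = failed = 0
    for _ in range(attempts):
        txid = random.randint(0, 0xFFFF)
        packet = build_query(name, txid=txid)
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            try:
                sock.sendto(packet, (server, port))
                reply, _addr = sock.recvfrom(4096)
                # Count only replies whose ID matches the query just sent
                if len(reply) >= 2 and struct.unpack("!H", reply[:2])[0] == txid:
                    answered += 1
                else:
                    failed += 1
            except OSError:
                # socket.timeout is a subclass of OSError, so timeouts and
                # other socket errors are both counted as failures here
                failed += 1
        time.sleep(pause)
    return answered, failed


# Example use (needs network access, so it is left commented out):
# for server in ("127.0.0.1", "8.8.4.4"):
#     print(server, probe_udp(server, "messagelabs.com"))
```

Answers served from the local cache will hide upstream problems, so a name the resolver has not looked up recently is probably a better test against 127.0.0.1.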
A little googling suggests this happens when the refresh queries over UDP fail and named then falls back to TCP.

This system is running on a Cablevision/Optonline business-class cable connection. They've said the circuit is operating normally. Could this still be some kind of network issue? There are no local errors on the interface, and I've rebooted their modem and even replaced the network cable.

Perhaps you know of a tcpdump option I can use to look for network retries or some type of packet retransmission/errors?

I'm really stuck, and the mail server isn't functioning while I figure this out, so any help is greatly appreciated. I've included my named.conf, but it was working fine yesterday:

acl "trusted" {
    { 127/8; };
    { 192.168.1.0/24; };
    { 23.224.183.206; };
    { 68.193.193.45; };
};

options {
    listen-on port 53 { 127.0.0.1; 68.193.193.45; };
    // listen-on-v6 port 53 { ::1; };
    listen-on-v6 port 53 { none; };
    directory "/var/named";
    dump-file "/var/named/data/cache_dump.db";
    statistics-file "/var/named/data/named.stats"; // _PATH_STATS
    memstatistics-file "/var/named/data/named.memstats"; // _PATH_MEMSTATS
    allow-query { trusted; };
    notify master-only;
    recursive-clients 5000;

    /*
     - If you are building an AUTHORITATIVE DNS server, do NOT enable recursion.
     - If you are building a RECURSIVE (caching) DNS server, you need to enable
       recursion.
     - If your recursive DNS server has a public IP address, you MUST enable access
       control to limit queries to your legitimate users. Failing to do so will
       cause your server to become part of large scale DNS amplification
       attacks. Implementing BCP38 within your network would greatly
       reduce such attack surface
    */
    // recursion yes;
    allow-recursion { trusted; };

    dnssec-enable yes;
    dnssec-validation yes;
    dnssec-lookaside auto;

    /* Path to ISC DLV key */
    bindkeys-file "/etc/named.iscdlv.key";

    managed-keys-directory "/var/named/dynamic";

    pid-file "/run/named/named.pid";
    session-keyfile "/run/named/session.key";
};

logging {
    channel default_debug {
        file "data/named.run";
        severity dynamic;
    };

    // Record all queries to the box for now
    channel query_info {
        severity info;
        file "/var/log/named.query.log" versions 3 size 10m;
        print-time yes;
        print-category yes;
    };

    // added for fail2ban support
    channel security_file {
        severity dynamic;
        file "/var/log/named.security.log" versions 3 size 30m;
        print-time yes;
        print-category yes;
    };

    channel b_debug {
        file "/var/log/named.debug.log" versions 2 size 10m;
        print-time yes;
        print-category yes;
        print-severity yes;
        severity dynamic;
    };

    // Send the security related messages to a separate file.
    channel audit_log {
        file "/var/log/named.audit.log" versions 4 size 10m;
        severity info;
        print-time yes;
        print-category yes;
    };

    category queries { query_info; };
    category default { b_debug; };
    category config { b_debug; };
    category security { security_file; };
    category lame-servers { null; };
};

zone "." IN {
    type hint;
    file "/var/named/named.ca";
};

zone "sbl.example.com" {
    type slave;
    file "slaves/db.sbl.example.com";
    masters { 64.11.16.5; };
    allow-query { trusted; };
    allow-transfer { trusted; };
};

include "/etc/named.rfc1912.zones";
include "/etc/named.root.key";
include "/etc/rndc.key";

Thanks,
Alex

_______________________________________________
Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe from this list

bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users