Hi everyone,

I've been running bind (since bind 4) in my home/lab for around 3 decades. I went that route because I wanted: A) to be abstracted from the ISP's DNS servers and B) to have a local DNS cache.

Fast forward to today: I have several servers, each with a copy of my zones and the same config files. All of them are NS for my zones, and the DNS clients point to floating IPs (controlled by some clustering software) which move across my servers.

My DNS servers are 1) recursive, 2) authoritative for my zones and 3) provide extra stuff (like being able to point to my company's DNS servers when the VPN is up and reachable so my bind daemons can accumulate a cache of DNS entries).

I run 'bind9.16' on RHEL/Linux using the default bundled root zone (/var/named/named.ca) and the default root key (/etc/named.root.key).

This had worked well for years until a few days ago (April 28th?), when the number of SERVFAIL responses went ballistic and prevented the resolution of so many DNS names on the internet that DNS became unusable.

# grep -c SERVFAIL auth_servers.log*
auth_servers.log:34245
auth_servers.log.0:236802
auth_servers.log.1:225409
auth_servers.log.2:226233
auth_servers.log.3:224762
auth_servers.log.4:242299
auth_servers.log.5:214953 < April 28th and beyond
auth_servers.log.6:1207
auth_servers.log.7:281
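
A per-day tally would pin down the onset date more precisely than the per-file counts above. This is only a minimal sketch: it assumes each log entry sits on a single line that starts with the date (the format shown in this message), and the function name servfail_per_day is made up for illustration.

```shell
# Hypothetical helper: count SERVFAIL lame-servers entries per calendar day.
# Assumes one entry per line, with the date as the first field.
servfail_per_day() {
    # Reads log lines on stdin; prints "<date> <count>" in order of first appearance.
    awk '/SERVFAIL unexpected RCODE/ {
             if (!($1 in n)) order[++k] = $1
             n[$1]++
         }
         END { for (i = 1; i <= k; i++) print order[i], n[order[i]] }'
}
```

Usage would be something like: cat /var/log/named/auth_servers.log* | servfail_per_day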

From April 28th on, I started seeing tons of entries like this in the auth log:
28-Apr-2025 00:13:03.714 lame-servers: info: SERVFAIL unexpected RCODE resolving 'github.com/A/IN': 198.51.44.8#53
28-Apr-2025 00:13:03.720 lame-servers: info: SERVFAIL unexpected RCODE resolving 'github.com/A/IN': 205.251.197.3#53
28-Apr-2025 00:13:03.725 lame-servers: info: SERVFAIL unexpected RCODE resolving 'github.com/A/IN': 198.51.45.8#53
28-Apr-2025 00:13:03.730 lame-servers: info: SERVFAIL unexpected RCODE resolving 'github.com/A/IN': 198.51.44.72#53
28-Apr-2025 00:13:03.735 lame-servers: info: SERVFAIL unexpected RCODE resolving 'github.com/A/IN': 205.251.198.171#53
28-Apr-2025 00:13:03.740 lame-servers: info: SERVFAIL unexpected RCODE resolving 'github.com/A/IN': 205.251.193.165#53
28-Apr-2025 00:13:03.745 lame-servers: info: SERVFAIL unexpected RCODE resolving 'github.com/A/IN': 205.251.194.8#53
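
To see which names account for most of these entries, the lines can be summarized per queried name. Again this is a sketch under assumptions: one log entry per line in the format shown here, and servfail_by_name is a name invented for this example.

```shell
# Hypothetical helper: summarize SERVFAIL lame-servers entries per queried name.
# Assumes one entry per line, in the format shown in this message.
servfail_by_name() {
    # Reads log lines on stdin; prints "<count> <name/type/class>", highest count first.
    grep 'SERVFAIL unexpected RCODE resolving' \
        | sed -n "s/.*resolving '\([^']*\)'.*/\1/p" \
        | sort | uniq -c | sort -rn \
        | awk '{print $1, $2}'
}
```

For example: cat /var/log/named/auth_servers.log* | servfail_by_name | head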

I struggled for a couple of days to bring my DNS servers back into service, and the -ONLY- thing which worked was to declare some 'forwarders' (Google + Cloudflare). Nothing else brought reliable DNS service back.

Here is a list of what I tried that did -NOT- yield satisfactory results:

- RHEL only had the 20326 KSK in /etc/named.root.key, so I updated it with the file from upstream:
(https://raw.githubusercontent.com/isc-projects/bind9/refs/heads/main/bind.keys)

- I tried to turn off DNSSEC completely, but that barely made a difference:

        dnssec-enable no;
        dnssec-validation no;

- I tried to switch off IPv6 resolution:
        query-source-v6 port 0; // disables IPv6 queries
        prefer-ipv4 yes;

In the end, the -only- solution which brought back working DNS resolution was this:
        forwarders {
                1.0.0.1;
                1.1.1.1;
                8.8.8.8;
                8.8.4.4;
        };

I am not a DNS administrator and I have little clue as to whether I am doing something slightly wrong or very wrong. Does anyone have any idea why all this started happening at the end of April? I reproduced the issue on multiple RHEL systems across the Internet (RHEL 9.4 and 9.5, in EMEA and Canada).

The symptom looked like this (with ftp.lip6.fr and lip6.fr as examples):
# dig -t dnskey .

; <<>> DiG 9.16.23-RH <<>> -t dnskey .
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 52934
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: d0e97efe7a229ba20100000068138cad7cee5edc6fe3e926 (good)
;; QUESTION SECTION:
;.                              IN      DNSKEY

;; ANSWER SECTION:
.                       172654  IN      DNSKEY  257 3 8 
AwEAAaz/tAm8yTn4Mfeh5eyI96WSVexTBAvkMgJzkKTOiW1vkIbzxeF3 
+/4RgWOq7HrxRixHlFlExOLAJr5emLvN7SWXgnLh4+B5xQlNVz8Og8kv 
ArMtNROxVQuCaSnIDdD5LKyWbRd2n9WGe2R8PzgCmr3EgVLrjyBxWezF 
0jLHwVN8efS3rCj/EWgvIWgb9tarpVUDK/b58Da+sqqls3eNbuv7pr+e 
oZG+SrDK6nWeL3c6H5Apxz7LjVc1uTIdsIXxuOLYA4/ilBmSVIzuDWfd 
RUfhHdY6+cn8HFRm+2hM8AnXGXws9555KrUB5qihylGa8subX2Nn6UwN R1AkUTV74bU=
.                       172654  IN      DNSKEY  256 3 8 
AwEAAbEbGCpGTDrcZTWqWWE72nphyshpRcILdzCVlBGU9Ln1Fui9kkse 
UOP+g5GLUeVFKdTloeRTA9+EYiQdXgWXmXmuW/nGxZjAikluF/O9NzLV 
rr5iZnth2xu+F48nrJlAgWWiMNau54NI5sZ3iVQfhFsq2pZmf43RauRP 
niYMShOLO7EBWWXr5glDSgZGS9fSm6xHwwF+g8D4m8oanjvdCBNxXzSE 
KS31ibxjLifTfvwCg3y4XXcNW9U6Nu3JmoKUdxqpPPIkBvVQbIz4UO2F 
waR13uXC03ALP1Yx2QNSS4SZlcIMtAftQR9wtCiuPWQnFv4jkzWqlhp1 Lmf7bcoL9yk=
.                       172654  IN      DNSKEY  257 3 8 
AwEAAa96jeuknZlaeSrvyAJj6ZHv28hhOKkx3rLGXVaC6rXTsDc449/c 
idltpkyGwCJNnOAlFNKF2jBosZBU5eeHspaQWOmOElZsjICMQMC3aeHb 
GiShvZsx4wMYSjH8e7Vrhbu6irwCzVBApESjbUdpWWmEnhathWu1jo+s 
iFUiRAAxm9qyJNg/wOZqqzL/dL/q8PkcRU5oUKEpUge71M3ej2/7CPqp 
dVwuMoTvoB+ZOT4YeGyxMvHmbrxlFzGOHOijtzN+u1TQNatX2XBuzZNQ 
1K+s2CXkPIZo7s6JgZyvaBevYtxPvYLw4z9mR7K2vaF18UYH9Z9GNUUe ayffKC73PYc=

;; Query time: 1 msec
;; SERVER: 10.0.128.192#53(10.0.128.192)
;; WHEN: Thu May 01 11:01:01 EDT 2025
;; MSG SIZE  rcvd: 883

For .fr, I got:
# dig -t dnskey fr.

; <<>> DiG 9.16.23-RH <<>> -t dnskey fr.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 7803
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 908727a1a56ed3f10100000068138d16e3174ac40be54965 (good)
;; QUESTION SECTION:
;fr.                            IN      DNSKEY

;; ANSWER SECTION:
fr.                     3380    IN      DNSKEY  256 3 13 
2yYjxwbnBVaMy1ugHZj07qbYDIMSPZuC0Z9XrsDltV9YtNp8heCsFdGj 
X7HMBAWRzq7XTeiMedFJaR++9fIQ7w==
fr.                     3380    IN      DNSKEY  257 3 13 
qRwN6xREvGGSFLh9cSFollJA3PRjjG7zHkUxc+7GZUi5qwaYZxYlXIBm 
ajzGWjVpwhzlWdnwWYZxPsJ88HfqDA==

;; Query time: 0 msec
;; SERVER: 10.0.128.192#53(10.0.128.192)
;; WHEN: Thu May 01 11:02:46 EDT 2025
;; MSG SIZE  rcvd: 219

But for lip6.fr, I got this:
[root@rh9x64 ~]# dig -t dnskey lip6.fr.

; <<>> DiG 9.16.23-RH <<>> -t dnskey lip6.fr.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 52808
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 72a2e3dcc58505fb0100000068138da9e546cc1a454889d5 (good)
;; QUESTION SECTION:
;lip6.fr.                       IN      DNSKEY

;; Query time: 19 msec
;; SERVER: 10.0.128.192#53(10.0.128.192)
;; WHEN: Thu May 01 11:05:13 EDT 2025
;; MSG SIZE  rcvd: 64

[root@rh9x64 ~]# dig -t NS lip6.fr.

; <<>> DiG 9.16.23-RH <<>> -t NS lip6.fr.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 55374
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: a8d14f2b3f811f9e0100000068138db614ae02e51bc9295d (good)
;; QUESTION SECTION:
;lip6.fr.                       IN      NS

;; Query time: 15 msec
;; SERVER: 10.0.128.192#53(10.0.128.192)
;; WHEN: Thu May 01 11:05:26 EDT 2025
;; MSG SIZE  rcvd: 64

Then, in my logs, I see this:
/var/log/named/auth_servers.log:01-May-2025 11:05:13.842 lame-servers: info: SERVFAIL unexpected RCODE resolving 'lip6.fr/DNSKEY/IN': 193.51.24.1#53
/var/log/named/auth_servers.log:01-May-2025 11:05:13.847 lame-servers: info: SERVFAIL unexpected RCODE resolving 'lip6.fr/DNSKEY/IN': 132.227.60.2#53
/var/log/named/auth_servers.log:01-May-2025 11:05:13.852 lame-servers: info: SERVFAIL unexpected RCODE resolving 'lip6.fr/DNSKEY/IN': 132.227.60.30#53
/var/log/named/auth_servers.log:01-May-2025 11:05:13.870 lame-servers: info: SERVFAIL unexpected RCODE resolving 'isis.lip6.fr/AAAA/IN': 193.51.24.1#53
/var/log/named/auth_servers.log:01-May-2025 11:05:13.870 lame-servers: info: SERVFAIL unexpected RCODE resolving 'osiris.lip6.fr/AAAA/IN': 193.51.24.1#53
/var/log/named/auth_servers.log:01-May-2025 11:05:13.876 lame-servers: info: SERVFAIL unexpected RCODE resolving 'isis.lip6.fr/AAAA/IN': 132.227.60.2#53
/var/log/named/auth_servers.log:01-May-2025 11:05:13.877 lame-servers: info: SERVFAIL unexpected RCODE resolving 'osiris.lip6.fr/AAAA/IN': 132.227.60.2#53
/var/log/named/auth_servers.log:01-May-2025 11:05:13.881 lame-servers: info: SERVFAIL unexpected RCODE resolving 'osiris.lip6.fr/AAAA/IN': 132.227.60.30#53
/var/log/named/auth_servers.log:01-May-2025 11:05:13.882 lame-servers: info: SERVFAIL unexpected RCODE resolving 'isis.lip6.fr/AAAA/IN': 132.227.60.30#53
/var/log/named/auth_servers.log:01-May-2025 11:05:26.662 lame-servers: info: SERVFAIL unexpected RCODE resolving 'lip6.fr/NS/IN': 132.227.60.30#53
/var/log/named/auth_servers.log:01-May-2025 11:05:26.666 lame-servers: info: SERVFAIL unexpected RCODE resolving 'lip6.fr/NS/IN': 132.227.60.2#53
/var/log/named/auth_servers.log:01-May-2025 11:05:26.671 lame-servers: info: SERVFAIL unexpected RCODE resolving 'lip6.fr/NS/IN': 193.51.24.1#53
/var/log/named/auth_servers.log:01-May-2025 11:05:26.682 lame-servers: info: SERVFAIL unexpected RCODE resolving 'osiris.lip6.fr/AAAA/IN': 132.227.60.2#53
/var/log/named/auth_servers.log:01-May-2025 11:05:26.684 lame-servers: info: SERVFAIL unexpected RCODE resolving 'isis.lip6.fr/AAAA/IN': 132.227.60.2#53
/var/log/named/auth_servers.log:01-May-2025 11:05:26.687 lame-servers: info: SERVFAIL unexpected RCODE resolving 'osiris.lip6.fr/AAAA/IN': 132.227.60.30#53
/var/log/named/auth_servers.log:01-May-2025 11:05:26.688 lame-servers: info: SERVFAIL unexpected RCODE resolving 'isis.lip6.fr/AAAA/IN': 132.227.60.30#53
/var/log/named/auth_servers.log:01-May-2025 11:05:26.693 lame-servers: info: SERVFAIL unexpected RCODE resolving 'osiris.lip6.fr/AAAA/IN': 193.51.24.1#53
/var/log/named/auth_servers.log:01-May-2025 11:05:26.694 lame-servers: info: SERVFAIL unexpected RCODE resolving 'isis.lip6.fr/AAAA/IN': 193.51.24.1#53

As a result, anything under lip6.fr ends in SERVFAIL:

# host ftp.lip6.fr
Host ftp.lip6.fr not found: 2(SERVFAIL)

The only way to get back to a working state is to add back some forwarders.
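
One check that might narrow this down is to ask the servers named in the log directly, without recursion, from the same host, and compare the RCODE they return with what named logged. The sketch below is only that: the helper names dig_status and probe_servers are made up, and the result shows what each server answers to dig from this host, not why named got SERVFAIL.

```shell
# Hypothetical helper: extract the RCODE ("status") from dig output on stdin.
dig_status() {
    sed -n 's/.*->>HEADER<<-.*status: \([A-Z0-9]*\),.*/\1/p' | head -n 1
}

# Hypothetical helper: query each listed server directly (no recursion) and
# print the RCODE it returns, or "no-reply" if no header was received.
# Usage: probe_servers NAME TYPE SERVER...
probe_servers() {
    name=$1; qtype=$2; shift 2
    for server in "$@"; do
        status=$(dig +norecurse +time=2 +tries=1 "@$server" "$name" "$qtype" | dig_status)
        printf '%s %s\n' "$server" "${status:-no-reply}"
    done
}
```

With the addresses from the log above, that would be: probe_servers lip6.fr NS 193.51.24.1 132.227.60.2 132.227.60.30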

Any ideas? Am I doing anything wrong? I'm attaching a sanitized copy of my named.conf in case someone could spot something:

===================================================================================
// ACLs removed

options {
        listen-on port 53 { 127.0.0.1; 10.0.0.0/8; 172.16.0.0/12; 192.168.0.0/16; };
        listen-on-v6 port 53 { ::1; ff00::0; fe80::0; 2a01:e0a:bd6:aa50::/64; };

        directory               "/var/named";
        dump-file               "/var/named/data/cache_dump.db";
        statistics-file         "/var/named/data/named_stats.txt";
        memstatistics-file      "/var/named/data/named_mem_stats.txt";
        secroots-file           "/var/named/data/named.secroots";
        recursing-file          "/var/named/data/named.recursing";

        recursion yes;

        dnssec-validation auto;

        managed-keys-directory "/var/named/dynamic";
        geoip-directory "/usr/share/GeoIP";

        pid-file "/run/named/named.pid";
        session-keyfile "/run/named/session.key";

        /* https://fedoraproject.org/wiki/Changes/CryptoPolicy */
        include "/etc/crypto-policies/back-ends/bind.config";

        /* Path to ISC DLV key */
        bindkeys-file "/etc/named.root.key";

        // Re-enabled 2025/04/30
        forwarders {
                1.0.0.1;
                1.1.1.1;
                8.8.8.8;
                8.8.4.4;
        };

        masterfile-format text;
        notify no;
        max-cache-ttl 4233600;
        max-ncache-ttl 86400;
        max-cache-size 512m;

        // disable AAAA queries on ipv4
        filter-aaaa-on-v4 yes;

        /* How often to probe for new interfaces (minutes) */
        interface-interval 1;

        cleaning-interval 0 ;

        rrset-order { order random; };
        response-policy { zone "rpz"; };

        // Allow queries
        allow-query { "any"; };
        allow-query-cache { "any"; };
        allow-recursion { "any"; };
        allow-transfer {
                "bind_dns_servers";
        };
        allow-update {
                "bind_dns_servers";
        };
};

controls {
        inet 127.0.0.1 allow { localhost;
        "bind_dns_servers";
        } keys { rndckey; };
};

logging {
        channel default_log {
                file "/var/log/named/named.log" versions 5 size 32m;
                print-time yes; print-category yes; print-severity yes;
                severity info;
        };
        channel auth_servers_log {
                file "/var/log/named/auth_servers.log" versions 100 size 32m;
                print-time yes; print-category yes; print-severity yes;
                severity info;
        };
        channel dnssec_log {
                file "/var/log/named/dnssec.log" versions 5 size 32m;
                print-time yes; print-category yes; print-severity yes;
                severity info;
        };
        channel zone_transfers_log {
                file "/var/log/named/zone_transfers.log" versions 5 size 32m;
                print-time yes; print-category yes; print-severity yes;
                severity info;
        };
        channel ddns_log {
                file "/var/log/named/ddns.log" versions 5 size 32m;
                print-time yes; print-category yes; print-severity yes;
                severity info;
        };
        channel client_security_log {
                file "/var/log/named/client_security.log" versions 5 size 32m;
                print-time yes; print-category yes; print-severity yes;
                severity info;
        };
        channel rate_limiting_log {
                file "/var/log/named/rate_limiting.log" versions 5 size 32m;
                print-time yes; print-category yes; print-severity yes;
                severity info;
        };
        channel rpz_log {
                file "/var/log/named/rpz.log" versions 5 size 32m;
                print-time yes; print-category yes; print-severity yes;
                severity info;
        };
        channel dnstap_log {
                file "/var/log/named/dnstap.log" versions 5 size 32m;
                print-time yes; print-category yes; print-severity yes;
                severity info;
        };
        //
        // If you have the category ‘queries’ defined, and you don’t want query logging
        // by default, make sure you add option ‘querylog no;’ - then you can toggle
        // query logging on (and off again) using command ‘rndc querylog’
        //
        channel queries_log {
                file "/var/log/named/queries.log" versions 64 size 32m;
                print-time yes; print-category yes; print-severity yes;
                severity info;
        };
        //
        // This channel is dynamic so that when the debug level is increased using
        // rndc while the server is running, extra information will be logged about
        // failing queries.  Other debug information for other categories will be
        // sent to the channel default_debug (which is also dynamic), but without
        // affecting the regular logging.
        //
        channel query-errors_log {
                file "/var/log/named/query-errors.log" versions 5 size 32m;
                print-time yes; print-category yes; print-severity yes;
                severity dynamic;
        };
        //
        // This is the default syslog channel, defined here for clarity.  You don’t
        // have to use it if you prefer to log to your own channels.
        // It sends to syslog’s daemon facility, and sends only logged messages
        // of priority info and higher.
        // (The options to print time, category and severity are non-default.)
        //
        channel default_syslog {
                syslog daemon;
                print-time yes; print-category yes; print-severity yes;
                severity info;
        };
        //
        // This is the default debug output channel, defined here for clarity.  You
        // might want to redefine the output destination if it doesn’t fit with your
        // local system administration plans for logging.  It is also a special
        // channel that only produces output if the debug level is non-zero.
        //
        channel default_debug {
                file "data/named.run";
                print-time yes; print-category yes; print-severity yes;
                severity dynamic;
        };
        //
        // Log routine stuff to syslog and default log:
        //
        category default { default_syslog; default_debug; default_log; };
        category config { default_syslog; default_debug; default_log; };
        category dispatch { default_syslog; default_debug; default_log; };
        category network { default_syslog; default_debug; default_log; };
        category general { default_syslog; default_debug; default_log; };
        //
        // Log messages relating to what we got back from authoritative servers during
        // recursion (if lame-servers and edns-disabled are obscuring other messages
        // they can be sent to their own channel or to null).  Sometimes these log
        // messages will be useful to research why some domains don’t resolve or
        // don’t resolve reliably
        //
        category resolver { auth_servers_log; default_debug; };
        category cname { auth_servers_log; default_debug; };
        category delegation-only { auth_servers_log; default_debug; };
        category lame-servers { auth_servers_log; default_debug; };
        category edns-disabled { auth_servers_log; default_debug; };
        //
        // Log problems with DNSSEC:
        //
        category dnssec { dnssec_log; default_debug; };
        //
        // Log together all messages relating to authoritative zone propagation
        //
        category notify { zone_transfers_log; default_debug; };
        category xfer-in { zone_transfers_log; default_debug; };
        category xfer-out { zone_transfers_log; default_debug; };
        //
        // Log together all messages relating to dynamic updates to DNS zone data:
        //
        category update{ ddns_log; default_debug; };
        category update-security { ddns_log; default_debug; };
        //
        // Log together all messages relating to client access and security.
        // (There is an additional category ‘unmatched’ that is by default sent to
        // null but which can be added here if you want more than the one-line
        // summary that is logged for failures to match a view).
        //
        category client{ client_security_log; default_debug; };
        category security { client_security_log; default_debug; };
        //
        // Log together all messages that are likely to be related to rate-limiting.
        // This includes RRL (Response Rate Limiting) - usually deployed on authoritative
        // servers and fetches-per-server|zone.  Note that it does not include
        // logging of changes for clients-per-query (which are logged in category
        // resolver).  Also note that there may on occasions be other log messages
        // emitted by the database category that don’t relate to rate-limiting
        // behaviour by named.
        //
        category rate-limit { rate_limiting_log; default_debug; };
        category spill { rate_limiting_log; default_debug; };
        category database { rate_limiting_log; default_debug; };
        //
        // Log DNS-RPZ (Response Policy Zone) messages (if you are not using DNS-RPZ
        // then you may want to comment out this category and associated channel)
        //
        category rpz { rpz_log; default_debug; };
        //
        // Log messages relating to the "dnstap" DNS traffic capture system  (if you
        // are not using dnstap, then you may want to comment out this category and
        // associated channel).
        //
        category dnstap { dnstap_log; default_debug; };
        //
        // If you are running a server (for example one of the Internet root
        // nameservers) that is providing RFC 5011 trust anchor updates, then you
        // may be interested in logging trust anchor telemetry reports that your
        // server receives to analyze anchor propagation rates during a key rollover.
        // If this would be useful then firstly, configure the new channel, and then
        // un-comment and the line below to direct the category there instead of to
        // syslog and default log:
        //
        category trust-anchor-telemetry { default_syslog; default_debug; default_log; };
        //
        // If you have the category ‘queries’ defined, and you don’t want query logging
        // by default, make sure you add option ‘querylog no;’ - then you can toggle
        // query logging on (and off again) using command ‘rndc querylog’
        //
        category queries { queries_log; };
        //
        // This logging category will only emit messages at debug levels of 1 or
        // higher - it can be useful to troubleshoot problems where queries are
        // resulting in a SERVFAIL response.
        //
        category query-errors {query-errors_log; };
};

zone "." IN {
        type hint;
        file "named.ca";
};

include "/etc/named.rfc1912.zones";
include "/etc/named.root.key";
include "/etc/named.AD.zones";
include "/etc/named.primary.zones";
include "/etc/named.forwardonly.zones";
include "/etc/rndc.key";
===================================================================================

Thank you for your attention,

,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,
Vincent S. Cojot, Computer Engineering. STEP project. _.,-*~'`^`'~*-,._.,-*~
Ecole Polytechnique de Montreal, Comite Micro-Informatique. _.,-*~'`^`'~*-,.
Linux Xview/OpenLook resources page _.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'
http://step.polymtl.ca/~coyote  _.,-*~'`^`'~*-,._ coy...@nospam4cojot.name

They cannot scare me with their empty spaces
Between stars - on stars where no human race is
I have it in me so much nearer home
To scare myself with my own desert places.       - Robert Frost
