Hi,

I wondered for a while whether this would be more appropriate to post here or as an issue in ISC's gitlab, but concluded that for now this is the best place. The reason is that the "how to reproduce the problem" part is quite fuzzy.
If someone from ISC wants this reported as a gitlab issue as well, I can of course do that.

Context: we are running 4 nodes in an anycast setup, providing our users with a recursive DNS resolver service, and an RPZ service to a subset of these users. We have been using BIND 9.20 for a while and have applied ISC's releases shortly after they were published, so until recently we were running 9.20.6 for this service.

Recently we started receiving reports from some of our users that ... "DNS lookups are unreliable". Here is an example which I managed to catch / reproduce (based on a report for one of the other 3 nodes):

$ dig @osl-res.uninett.no. freebsd.org. a

; <<>> DiG 9.14.7 <<>> @osl-res.uninett.no. freebsd.org. a
; (2 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 51745
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 16c89ea584a0a45c0100000067d2ad42211e91f71ee4fdcc (good)
;; QUESTION SECTION:
;freebsd.org.                   IN      A

;; Query time: 27 msec
;; SERVER: 2001:700:0:102::ca53#53(2001:700:0:102::ca53)
;; WHEN: Thu Mar 13 11:02:42 CET 2025
;; MSG SIZE rcvd: 68

$ dig @osl-res.uninett.no. freebsd.org. a

; <<>> DiG 9.14.7 <<>> @osl-res.uninett.no. freebsd.org. a
; (2 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 2380
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 893098498db1b2330100000067d2ad4b2f511f5ac2cf4c48 (good)
;; QUESTION SECTION:
;freebsd.org.                   IN      A

;; ANSWER SECTION:
freebsd.org.            3600    IN      A       96.47.72.84

;; Query time: 30 msec
;; SERVER: 2001:700:0:102::ca53#53(2001:700:0:102::ca53)
;; WHEN: Thu Mar 13 11:02:51 CET 2025
;; MSG SIZE rcvd: 84

$

The name server in question does not have any connectivity issues that I'm aware of, and ... it really doesn't make a whole lot of sense to me that it would reply with SERVFAIL at one instant, only to respond with a DNSSEC-validated NOERROR reply seconds later. I have looked in the logs for the SERVFAIL for this domain without success; apparently our logging does not capture those.

At the time of the test above, the name server had been running for weeks:

osl-res: {1} ps axu | egrep 'PID|named'
USER   PID %CPU %MEM     VSZ    RSS TTY STAT STARTED        TIME COMMAND
named 6739  114  2.6 1363112 866384 ?   Osl  27Feb25 14435:20.10 /usr/p
osl-res: {2}

This node peaks at around 3000 qps, and rarely if ever serves fewer than 700 qps during a 24-hour cycle. This makes it somewhere between difficult and impossible to provide a precise description of how to reproduce the problem, which is obviously what a proper bug report should have.

The node also implements RFC 9462, "Discovery of Designated Resolvers", pointing clients to the DoT and DoH endpoints this instance serves by publishing _dns.resolver.arpa SVCB records in the DNS view used by the clients. As a consequence, a fair share of the queries (20%? 30%?) arrive over those transports.

For now we have downgraded BIND to 9.18.34 on the two nodes where similar trouble has been reported, and we will in all probability do the same for the remaining two nodes in the cluster. ...which is a shame, really, but having this sort of issue pop up at unpredictable times and exposing our users to it is ... not exactly ideal.

So... what I guess I'm doing with this message is asking whether anyone else has experienced anything resembling this problem, or whether anyone has more clues to share that could guide further debugging.

FWIW, we're running BIND on NetBSD/amd64 10.0 on these nodes.

Best regards,

- Håvard
-- 
Visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe from this list

ISC funds the development of this software with paid support subscriptions.
Contact us at https://www.isc.org/contact/ for more information.

bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users
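Because the SERVFAIL responses are intermittent, one way to put a number on them is to repeat the same query many times and tally the response codes. Below is a minimal sketch of that approach in Python, using only the standard library. It assumes the resolver is reachable over plain UDP on port 53 and is given as an IP literal. The names build_query, parse_rcode and probe are illustrative only. Unlike dig, it sends no EDNS OPT record or cookie, so it only approximates the queries shown in the report.

```python
import random
import socket
import struct
import time
from collections import Counter
from typing import Optional

# Names for the response codes most relevant here (RFC 1035, section 4.1.1).
RCODE_NAMES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def build_query(qname: str, qtype: int = 1, qid: Optional[int] = None) -> bytes:
    """Build a minimal wire-format DNS query (QCLASS IN) with the RD bit set."""
    if qid is None:
        qid = random.randint(0, 0xFFFF)
    # Header: ID, flags (0x0100 = RD), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    encoded = b""
    for label in qname.rstrip(".").split("."):
        if not label:
            continue
        raw = label.encode("ascii")
        if len(raw) > 63:
            raise ValueError(f"label too long: {label!r}")
        encoded += bytes([len(raw)]) + raw
    # Root label terminator, then QTYPE and QCLASS (1 = IN).
    question = encoded + b"\x00" + struct.pack("!HH", qtype, 1)
    return header + question


def parse_rcode(response: bytes, expected_id: int) -> int:
    """Return the 4-bit RCODE from a response header after checking its ID."""
    if len(response) < 12:
        raise ValueError("truncated DNS header")
    rid, flags = struct.unpack("!HH", response[:4])
    if rid != expected_id:
        raise ValueError("response ID mismatch")
    return flags & 0x000F


def probe(server: str, qname: str, count: int = 100,
          timeout: float = 2.0, interval: float = 0.5) -> Counter:
    """Send the same query `count` times over UDP and tally the outcomes."""
    tally: Counter = Counter()
    family = socket.AF_INET6 if ":" in server else socket.AF_INET
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for _ in range(count):
            qid = random.randint(0, 0xFFFF)
            sock.sendto(build_query(qname, qid=qid), (server, 53))
            try:
                data, _addr = sock.recvfrom(4096)
                rcode = parse_rcode(data, qid)
                tally[RCODE_NAMES.get(rcode, f"RCODE{rcode}")] += 1
            except socket.timeout:
                tally["TIMEOUT"] += 1
            except ValueError:
                # A late reply to an earlier query, or a malformed packet.
                tally["BAD_RESPONSE"] += 1
            time.sleep(interval)
    return tally
```

For example, `print(probe("2001:700:0:102::ca53", "freebsd.org.", count=200))` prints a tally of outcomes. A nonzero SERVFAIL count alongside NOERROR answers would match the pattern seen in the dig output. The interval between queries keeps the probe's own load negligible next to the node's normal traffic.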