Re: Massive increase of SERVFAIL after April 28th 2025.
Also don’t use +short if you want to see the NSID. From my corner of the internet I get the following:

% dig +nsid version.bind. txt ch @dns4.p08.nsone.net

; <<>> DiG 9.21.3-dev <<>> +nsid version.bind. txt ch @dns4.p08.nsone.net
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 21568
;; flags: qr rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; NSID: 6e 73 31 64 6e 73 2d 73 79 64 30 34 2d 31 31 31 33 31 2d 35 33 31 38 ("ns1dns-syd04-11131-5318")
;; QUESTION SECTION:
;version.bind. CH TXT

;; ANSWER SECTION:
version.bind. 0 CH TXT "366568643ba5103a1f441fbc3c502ed2eaa0b3d9"

;; Query time: 12 msec
;; SERVER: 198.51.45.72#53(dns4.p08.nsone.net) (UDP)
;; WHEN: Fri May 02 12:10:10 AEST 2025
;; MSG SIZE rcvd: 121

%

> On 2 May 2025, at 04:44, Ondřej Surý wrote:
>
>> dig +short +nsid version.bind. txt ch @dns4.p08.nsone.net
>
> This needs to be this: ^^^
>
> You missed @ and thus you asked your local resolver.
>
> Ondrej
> --
> Ondřej Surý — ISC (He/Him)
>
> My working hours and your working hours may be different. Please do not feel
> obligated to reply outside your normal working hours.
>
>> On 1. 5. 2025, at 20:21, Michael Richardson wrote:
>>
>> dig +short +nsid version.bind. txt ch dns4.p08.nsone.net
>>
>> I get:
>> "9.21.2-1+0~20241120.131+debian12~1.gbpa6576d-Debian"
>
> --
> Visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe from
> this list
>
> ISC funds the development of this software with paid support subscriptions.
> Contact us at https://www.isc.org/contact/ for more information.
>
>
> bind-users mailing list
> bind-users@lists.isc.org
> https://lists.isc.org/mailman/listinfo/bind-users

--
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742 INTERNET: ma...@isc.org
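When NSIDs are collected from several vantage points (for example by running the same +nsid query from different hosts), it can help to decode them in a script. dig already prints the ASCII form in parentheses; the following is only a minimal Python sketch of that hex-to-text step. decode_nsid is a hypothetical helper, not part of dig or BIND. NSID contents are opaque, operator-defined bytes, so the sketch falls back to plain hex when they are not ASCII.

```python
def decode_nsid(hex_octets: str) -> str:
    """Decode an NSID printed by dig as space-separated hex octets."""
    raw = bytes.fromhex(hex_octets)  # fromhex ignores the spaces between octets
    try:
        return raw.decode("ascii")
    except UnicodeDecodeError:
        # Not printable text: return the octets as a compact hex string.
        return raw.hex()


if __name__ == "__main__":
    nsid = "6e 73 31 64 6e 73 2d 73 79 64 30 34 2d 31 31 31 33 31 2d 35 33 31 38"
    print(decode_nsid(nsid))  # ns1dns-syd04-11131-5318
```

Different NSIDs from different hosts for the same anycast address are normal; the interesting case is when a supposedly remote server returns no NSID at all, or one that belongs to your own infrastructure.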
Re: Massive increase of SERVFAIL after April 28th 2025.
Hi Vincent,

Using a conventional resolver (no RPZ, no forwarders, no forward zones) from our Miami cloud:

Tracing to ftp.lip6.fr[a] via 190.185.104.10, maximum of 3 retries
190.185.104.10 (190.185.104.10)
 |\___ g.ext.nic.fr [fr] (2001:0678:004c:::::0001)
 |     |\___ soleil.uvsq.fr [lip6.fr] (193.51.24.1) Got authoritative answer [received type is cname]
 |     |\___ isis.lip6.fr [lip6.fr] (132.227.60.2) Got authoritative answer [received type is cname]
 |      \___ osiris.lip6.fr [lip6.fr] (132.227.60.30) Got authoritative answer [received type is cname]
 |\___ g.ext.nic.fr [fr] (194.0.36.1)
 |     |\___ osiris.lip6.fr [lip6.fr] (132.227.60.30) (cached)
 |     |\___ soleil.uvsq.fr [lip6.fr] (193.51.24.1) (cached)
 |      \___ isis.lip6.fr [lip6.fr] (132.227.60.2) (cached)
 |\___ d.nic.fr [fr] (2001:0678:000c:::::0001)
 |     |\___ isis.lip6.fr [lip6.fr] (132.227.60.2) (cached)
 |     |\___ osiris.lip6.fr [lip6.fr] (132.227.60.30) (cached)
 |      \___ soleil.uvsq.fr [lip6.fr] (193.51.24.1) (cached)
 |\___ d.nic.fr [fr] (194.0.9.1)
 |     |\___ isis.lip6.fr [lip6.fr] (132.227.60.2) (cached)
 |     |\___ soleil.uvsq.fr [lip6.fr] (193.51.24.1) (cached)
 |      \___ osiris.lip6.fr [lip6.fr] (132.227.60.30) (cached)
 |\___ f.ext.nic.fr [fr] (2001:067c:1010:0011::::0053)
 |     |\___ osiris.lip6.fr [lip6.fr] (132.227.60.30) (cached)
 |     |\___ soleil.uvsq.fr [lip6.fr] (193.51.24.1) (cached)
 |      \___ isis.lip6.fr [lip6.fr] (132.227.60.2) (cached)
  \___ f.ext.nic.fr [fr] (194.146.106.46)
       |\___ osiris.lip6.fr [lip6.fr] (132.227.60.30) (cached)
       |\___ isis.lip6.fr [lip6.fr] (132.227.60.2) (cached)
        \___ soleil.uvsq.fr [lip6.fr] (193.51.24.1) (cached)

osiris.lip6.fr (132.227.60.30)  ftp.lip6.fr -> nephtys.lip6.fr
osiris.lip6.fr (132.227.60.30)  nephtys.lip6.fr -> 132.227.74.17
isis.lip6.fr (132.227.60.2)     ftp.lip6.fr -> nephtys.lip6.fr
isis.lip6.fr (132.227.60.2)     nephtys.lip6.fr -> 132.227.74.17
soleil.uvsq.fr (193.51.24.1)    ftp.lip6.fr -> nephtys.lip6.fr
soleil.uvsq.fr (193.51.24.1)    nephtys.lip6.fr -> 132.227.74.17

HTH

Carlos Horowicz
Planisys
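The summary at the end of Carlos's trace is the part that matters for spotting lame or inconsistent servers: every authoritative server for lip6.fr returns the same CNAME and the same address. A minimal Python sketch for checking that automatically, assuming the summary lines keep the "server (ip)  name -> value" shape shown above; answers_by_name and inconsistent are hypothetical helpers, not dnstracer features.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Assumed shape of dnstracer's closing summary lines, e.g.:
#   "osiris.lip6.fr (132.227.60.30)  ftp.lip6.fr -> nephtys.lip6.fr"
SUMMARY_RE = re.compile(r"^(\S+) \(([^)]+)\)\s+(\S+) -> (\S+)$")


def answers_by_name(summary: str) -> dict[str, set[str]]:
    """Map each queried name to the set of distinct answers the servers gave."""
    answers: dict[str, set[str]] = defaultdict(set)
    for line in summary.splitlines():
        m = SUMMARY_RE.match(line.strip())  # tree lines and the header do not match
        if m:
            _server, _address, name, value = m.groups()
            answers[name].add(value)
    return dict(answers)


def inconsistent(summary: str) -> dict[str, set[str]]:
    """Names for which the authoritative servers did not all agree."""
    return {name: values for name, values in answers_by_name(summary).items()
            if len(values) > 1}
```

Fed the six summary lines from the trace above, inconsistent() returns an empty dict, which matches Carlos's reading that lip6.fr itself looks healthy from Miami.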
Re: Massive increase of SERVFAIL after April 28th 2025.
vinc...@cojot.name wrote:
> I've been running bind (since bind 4) in my home/lab for around 3 decades. I
> went that route because I wanted: A) to be abstracted from the ISP's DNS
> servers and B) to have a local DNS cache.

Ditto.

> I run 'bind9.16' on RHEL/Linux using the default bundled root zone
> (/var/named/named.ca) and the default root key (/etc/named.root.key).

And you have DNSSEC validation turned on?

> This has worked well for years until a few days ago (April 28th?) when the
> amount of SERVFAIL started going ballistic and started preventing the
> resolution of a lot of DNS names on the internet to the point where DNS was
> unusable..
> Starting April 28th, I started seeing tons of things like this in the auth
> log:
> 28-Apr-2025 00:13:03.714 lame-servers: info: SERVFAIL unexpected RCODE
> resolving 'github.com/A/IN': 198.51.44.8#53
> 28-Apr-2025 00:13:03.720 lame-servers: info: SERVFAIL unexpected RCODE
> resolving 'github.com/A/IN': 205.251.197.3#53
> 28-Apr-2025 00:13:03.725 lame-servers: info: SERVFAIL unexpected RCODE
> resolving 'github.com/A/IN': 198.51.45.8#53

All of these look like valid DNS servers for github.
Of course, AWS/GITHUB can't be bothered to do DNSSEC.
From my vantage point, they all seem to resolve github.com

My guess is that something is in the way, and it's probably trying to attack
you (or your ISP) with fake replies, but it's doing a bad job.

When I do:
dig +short +nsid version.bind. txt ch dns4.p08.nsone.net

I get:
"9.21.2-1+0~20241120.131+debian12~1.gbpa6576d-Debian"

If you get something different, then that would be consistent with something
else intercepting your traffic.

> I struggled for a couple days to bring my DNS servers back into service and
> the -ONLY- thing which worked was to declare some 'forwarders' (Google +
> CloudFlare). Nothing else brought reliable DNS service back. :-(

But that does suggest that something else is in the way.
Did you forward with Do53, or did you use DoT/DoH?
{No idea if bind can forward over DoH, I never tried}

> - I tried to turn off dnssec completely but that barely made a difference:
> dnssec-enable no;
> dnssec-validation no;

Won't matter, since github doesn't do DNSSEC, so the NXDOMAINs can't be
validated (or rejected as invalid)

> The only way to get back to a working state is to add back some forwarders.
> Any ideas? Am I doing anything wrong? I'm attaching a sanitized copy of my
> named.conf in case someone could spot something:

I think you did everything right.
I think talking to your upstream ISP is in order.

> Thank you for your attention,
> ,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,
> Vincent S. Cojot, Computer Engineering. STEP project. _.,-*~'`^`'~*-,._.,-*~
> Ecole Polytechnique de Montreal, Comite Micro-Informatique. _.,-*~'`^`'~*-,.

Bonjour! Elbows Up.

--
Michael Richardson. o O ( IPv6 IøT consulting )
Sandelman Software Works Inc, Ottawa and Worldwide
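Michael's point is that the addresses in these lame-servers lines are legitimate github.com servers. With hundreds of thousands of such lines per log file (see the grep -c counts in the original report), a tally per query name and server shows whether the errors cluster on a few addresses or hit every server alike. A minimal Python sketch, assuming the "SERVFAIL unexpected RCODE resolving ..." message layout quoted above; LAME_RE and tally_servers are hypothetical, not a BIND tool.

```python
from __future__ import annotations

import re
from collections import Counter

# Assumed layout of BIND's lame-servers message, e.g.:
#   28-Apr-2025 00:13:03.714 lame-servers: info: SERVFAIL unexpected RCODE
#   resolving 'github.com/A/IN': 198.51.44.8#53
LAME_RE = re.compile(
    r"lame-servers: info: (?P<rcode>\S+) unexpected RCODE\s+resolving "
    r"'(?P<name>[^/']*)/(?P<type>[^/']*)/(?P<cls>[^']*)': "
    r"(?P<server>[^#\s]+)#(?P<port>\d+)"
)


def tally_servers(log_text: str, rcode: str = "SERVFAIL") -> Counter:
    """Count how often each (query name, server address) pair logged `rcode`."""
    counts: Counter = Counter()
    for m in LAME_RE.finditer(log_text):  # \s+ also copes with wrapped lines
        if m["rcode"] == rcode:
            counts[(m["name"], m["server"])] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python tally_servers.py /var/log/named/auth_servers.log
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as fh:
            for (name, server), n in tally_servers(fh.read()).most_common(20):
                print(f"{n:8d}  {name:30s}  {server}")
```

If every NS address for a name shows up with similar counts, a problem with one provider's servers is unlikely, and the path between the resolver and the Internet becomes the prime suspect, as Michael suggests.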
Re: Massive increase of SERVFAIL after April 28th 2025.
Rob McEwen via bind-users wrote:
> I strongly suspect that this was caused (even if indirectly?) by the MASSIVE
> and many-hours-long power outages in Europe, mainly in Spain and
> Portugal. That started on April 28, 2025, at approximately 6:33 a.m. Eastern
> Time (ET) - and the majority of it lasted almot 24 hours.

I can't see how this would affect the massive anycast service that github
gets from awsdns and nsone.

Vincent is, I think, in Montreal.
Massive increase of SERVFAIL after April 28th 2025.
Hi everyone,

I've been running bind (since bind 4) in my home/lab for around 3 decades. I went that route because I wanted: A) to be abstracted from the ISP's DNS servers and B) to have a local DNS cache.

Fast forward to today and I have several servers, each with a copy of my zones and the same config files. All of them are NS for my zones, and there are floating IPs (controlled by some clustering software) which float across my servers and to which the DNS clients point. My DNS servers are 1) recursive, 2) authoritative for my zones and 3) provide extra stuff (like being able to point to my company's DNS servers when the VPN is up and reachable, so my bind daemons can accumulate a cache of DNS entries).

I run 'bind9.16' on RHEL/Linux using the default bundled root zone (/var/named/named.ca) and the default root key (/etc/named.root.key).

This has worked well for years until a few days ago (April 28th?), when the amount of SERVFAIL started going ballistic and started preventing the resolution of a lot of DNS names on the internet, to the point where DNS was unusable.
# grep -c SERVFAIL auth_servers.log*
auth_servers.log:34245
auth_servers.log.0:236802
auth_servers.log.1:225409
auth_servers.log.2:226233
auth_servers.log.3:224762
auth_servers.log.4:242299
auth_servers.log.5:214953   < April 28th and beyond
auth_servers.log.6:1207
auth_servers.log.7:281

From April 28th on, I started seeing tons of lines like this in the auth log:

28-Apr-2025 00:13:03.714 lame-servers: info: SERVFAIL unexpected RCODE resolving 'github.com/A/IN': 198.51.44.8#53
28-Apr-2025 00:13:03.720 lame-servers: info: SERVFAIL unexpected RCODE resolving 'github.com/A/IN': 205.251.197.3#53
28-Apr-2025 00:13:03.725 lame-servers: info: SERVFAIL unexpected RCODE resolving 'github.com/A/IN': 198.51.45.8#53
28-Apr-2025 00:13:03.730 lame-servers: info: SERVFAIL unexpected RCODE resolving 'github.com/A/IN': 198.51.44.72#53
28-Apr-2025 00:13:03.735 lame-servers: info: SERVFAIL unexpected RCODE resolving 'github.com/A/IN': 205.251.198.171#53
28-Apr-2025 00:13:03.740 lame-servers: info: SERVFAIL unexpected RCODE resolving 'github.com/A/IN': 205.251.193.165#53
28-Apr-2025 00:13:03.745 lame-servers: info: SERVFAIL unexpected RCODE resolving 'github.com/A/IN': 205.251.194.8#53

I struggled for a couple of days to bring my DNS servers back into service, and the -ONLY- thing which worked was to declare some 'forwarders' (Google + CloudFlare). Nothing else brought reliable DNS service back.

Here is a list of what I tried that did -NOT- yield satisfactory results:
- RHEL only had the 20326 KSK in /etc/named.root.key, so I updated it with the file from upstream (https://raw.githubusercontent.com/isc-projects/bind9/refs/heads/main/bind.keys).
- I tried to turn off DNSSEC completely, but that barely made a difference:
    dnssec-enable no;
    dnssec-validation no;
- I tried to switch off IPv6 resolution:
    query-source-v6 port 0; // disables IPv6 queries
    prefer-ipv4 yes;

In the end, the -only- solution which brought back working DNS resolution was this:

    forwarders { 1.0.0.1; 1.1.1.1; 8.8.8.8; 8.8.4.4; };

I am not a DNS administrator and I have little idea whether I am doing something slightly wrong or very wrong. Does anyone have any idea why all this started happening at the end of April?

I reproduced the issue on multiple RHEL systems across the Internet (RHEL 9.4 and 9.5, in EMEA and Canada).

The symptom looked like this (with ftp.lip6.fr and lip6.fr as examples):

# dig -t dnskey .

; <<>> DiG 9.16.23-RH <<>> -t dnskey .
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 52934
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: d0e97efe7a229ba2010068138cad7cee5edc6fe3e926 (good)
;; QUESTION SECTION:
;. IN DNSKEY

;; ANSWER SECTION:
. 172654 IN DNSKEY 257 3 8 AwEAAaz/tAm8yTn4Mfeh5eyI96WSVexTBAvkMgJzkKTOiW1vkIbzxeF3 +/4RgWOq7HrxRixHlFlExOLAJr5emLvN7SWXgnLh4+B5xQlNVz8Og8kv ArMtNROxVQuCaSnIDdD5LKyWbRd2n9WGe2R8PzgCmr3EgVLrjyBxWezF 0jLHwVN8efS3rCj/EWgvIWgb9tarpVUDK/b58Da+sqqls3eNbuv7pr+e oZG+SrDK6nWeL3c6H5Apxz7LjVc1uTIdsIXxuOLYA4/ilBmSVIzuDWfd RUfhHdY6+cn8HFRm+2hM8AnXGXws9555KrUB5qihylGa8subX2Nn6UwN R1AkUTV74bU=
. 172654 IN DNSKEY 256 3 8 AwEAAbEbGCpGTDrcZTWqWWE72nphyshpRcILdzCVlBGU9Ln1Fui9kkse UOP+g5GLUeVFKdTloeRTA9+EYiQdXgWXmXmuW/nGxZjAikluF/O9NzLV rr5iZnth2xu+F48nrJlAgWWiMNau54NI5sZ3iVQfhFsq2pZmf43RauRP niYMShOLO7EBWWXr5glDSgZGS9fSm6xHwwF+g8D4m8oanjvdCBNxXzSE KS31ibxjLifTfvwCg3y4XXcNW9U6Nu3JmoKUdxqpPPIkBvVQbIz4UO2F waR13uXC03ALP1Yx2QNSS4SZlcIMtAftQR9wtCiuPWQnFv4jkzWqlhp1 Lmf7bcoL9yk=
. 172654 IN DNSKEY 257 3 8 AwEAAa96jeuknZlaeSrvyAJj6ZHv28hhOKkx3rLG
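Since the first fix attempted was refreshing /etc/named.root.key, it can help to check which trust anchors the root DNSKEY RRset actually contains. Trust anchors are referred to by key tag (20326 is the 2017 root KSK mentioned above). The following Python sketch computes a key tag with the algorithm from RFC 4034 Appendix B, applied to the flags-257 key from the dig output above; key_tag and ROOT_KSK are illustrative names, and the sketch is not part of BIND.

```python
import base64
import struct


def key_tag(flags: int, protocol: int, algorithm: int, public_key_b64: str) -> int:
    """DNSKEY key tag per RFC 4034 Appendix B (valid for algorithms other than 1)."""
    # DNSKEY RDATA in wire format: flags (16 bits), protocol, algorithm, public key.
    rdata = struct.pack("!HBB", flags, protocol, algorithm) + base64.b64decode(public_key_b64)
    acc = 0
    for i, byte in enumerate(rdata):
        # Even offsets are the high byte of a 16-bit word, odd offsets the low byte.
        acc += byte if i & 1 else byte << 8
    acc += (acc >> 16) & 0xFFFF  # fold the carry back in
    return acc & 0xFFFF


# The flags-257 key from the "dig -t dnskey ." output above, spaces removed.
ROOT_KSK = (
    "AwEAAaz/tAm8yTn4Mfeh5eyI96WSVexTBAvkMgJzkKTOiW1vkIbzxeF3"
    "+/4RgWOq7HrxRixHlFlExOLAJr5emLvN7SWXgnLh4+B5xQlNVz8Og8kv"
    "ArMtNROxVQuCaSnIDdD5LKyWbRd2n9WGe2R8PzgCmr3EgVLrjyBxWezF"
    "0jLHwVN8efS3rCj/EWgvIWgb9tarpVUDK/b58Da+sqqls3eNbuv7pr+e"
    "oZG+SrDK6nWeL3c6H5Apxz7LjVc1uTIdsIXxuOLYA4/ilBmSVIzuDWfd"
    "RUfhHdY6+cn8HFRm+2hM8AnXGXws9555KrUB5qihylGa8subX2Nn6UwN"
    "R1AkUTV74bU="
)

if __name__ == "__main__":
    print(key_tag(257, 3, 8, ROOT_KSK))
```

If that key is the 2017 KSK, the script prints 20326, the same tag RHEL's named.root.key already contained. The "ad" flag in the dig header above also shows that the resolver could validate the root DNSKEY RRset, which points away from a trust-anchor problem.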
Re: Massive increase of SERVFAIL after April 28th 2025.
From vinc...@cojot.name:
> until a few days ago (April 28th?) when the amount of SERVFAIL started going ballistic and started preventing the resolution of a lot of DNS names on the internet to the point where DNS was unusable

I strongly suspect that this was caused (even if indirectly?) by the MASSIVE and many-hours-long power outages in Europe, mainly in Spain and Portugal. That started on April 28, 2025, at approximately 6:33 a.m. Eastern Time (ET), and the majority of it lasted almost 24 hours.

https://www.france24.com/en/europe/20250430-what-we-know-so-far-about-the-massive-blackout-that-hit-spain-and-portugal

Hopefully, you're not seeing any more of these errors now?

Rob McEwen, invaluement
Re: Massive increase of SERVFAIL after April 28th 2025.
Hi Rob,

Unfortunately, as soon as I remove the 'forwarders' in any of my named servers, the problem comes back. The output in my previous message was captured just a few minutes ago, after I had disabled 'forwarders' in one of my bind servers.

Regards,
Vincent
Re: Massive increase of SERVFAIL after April 28th 2025.
In that case, someone smarter and more knowledgeable on this list will hopefully help you.

But first, one last suggestion: if you find that forwarding to 3rd-party servers works, but turning that off causes issues, you should probably make sure that your "root hints" are updated, purge the cache (rndc flush), then restart BIND. Maybe you've already done that? But if not, it's worth a try before digging deeper. If that doesn't fix this, then hopefully someone else on this list can help you.

Rob McEwen, invaluement
Re: Massive increase of SERVFAIL after April 28th 2025.
> dig +short +nsid version.bind. txt ch @dns4.p08.nsone.net
This needs to be this: ^^^

You missed @ and thus you asked your local resolver.

Ondrej
--
Ondřej Surý — ISC (He/Him)

My working hours and your working hours may be different. Please do not feel
obligated to reply outside your normal working hours.

> On 1. 5. 2025, at 20:21, Michael Richardson wrote:
>
> dig +short +nsid version.bind. txt ch dns4.p08.nsone.net
>
> I get:
> "9.21.2-1+0~20241120.131+debian12~1.gbpa6576d-Debian"
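The distinction Ondřej points out is easy to get wrong when scripting these checks: an argument of the form @server tells dig which server to query, while a bare host name is parsed as one more query name, and both questions then go to the resolvers listed in /etc/resolv.conf. A minimal Python sketch that always builds the @ form; dig_version_query is a hypothetical helper. It also leaves out +short, which, as Mark notes, hides the NSID.

```python
import shlex


def dig_version_query(server: str, show_nsid: bool = True) -> list[str]:
    """Argument vector asking `server` itself for version.bind CH TXT."""
    argv = ["dig"]
    if show_nsid:
        # Do not add +short here: it prints only the answer data, so the
        # NSID from the OPT pseudo-section would not be shown.
        argv.append("+nsid")
    # The leading '@' is what makes dig send the query to `server`; without it
    # the host name is parsed as another query name.
    argv.append("@" + server.removeprefix("@"))
    argv += ["version.bind.", "txt", "ch"]
    return argv


if __name__ == "__main__":
    print(shlex.join(dig_version_query("dns4.p08.nsone.net")))
    # dig +nsid @dns4.p08.nsone.net version.bind. txt ch
```

The list form can be passed straight to subprocess.run without a shell; shlex.join is only used here to print a copy-pasteable command line.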
Re: Massive increase of SERVFAIL after April 28th 2025.
Hi Michael,

Thank you so much for chiming in!

> My guess is that something is in the way, and it's probably trying to attack
> you (or your ISP) with fake replies, but it's doing a bad job.
>
> When I do:
> dig +short +nsid version.bind. txt ch dns4.p08.nsone.net
>
> I get:
> "9.21.2-1+0~20241120.131+debian12~1.gbpa6576d-Debian"

Spot on! Here's what I get:

# dig +short +nsid version.bind. txt ch dns4.p08.nsone.net
"9.16.23-RH"
198.51.45.72

Free.fr is my ISP, but "9.16.23-RH" suspiciously looks like the bind version I'm running on RHEL9:

# rndc status|grep version
version: BIND 9.16.23-RH (Extended Support Version)

> If you get something different, then that would be consistent with something
> else intercepting your traffic.

Could my DNS servers be doing this to themselves? :-(

> But that does suggest that something else is in the way.
> Did you forward with Do53, or did you use DoT/DoH?
> {No idea if bind can forward over DoH, I never tried}
>
>> - I tried to turn off dnssec completely but that barely made a difference:
>> dnssec-enable no;
>> dnssec-validation no;
>
> Won't matter, since github doesn't do DNSSEC, so the NXDOMAINs can't be
> validated (or rejected as invalid)
>
>> The only way to get back to a working state is to add back some forwarders.
>> Any ideas? Am I doing anything wrong? I'm attaching a sanitized copy of my
>> named.conf in case someone could spot something:
>
> I think you did everything right.
> I think talking to your upstream ISP is in order.

Thank you!

Vincent
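The clue here is that the supposedly remote version.bind answer matched the local named exactly. A small Python sketch of that comparison, assuming the rndc status line format shown above; local_bind_version and answered_locally are hypothetical helpers. It is only a heuristic: a remote server could legitimately run the same version string, and many operators (NS1 among them, as Mark's output shows) return an opaque string instead.

```python
from __future__ import annotations

import re

# Matches the line printed by `rndc status | grep version`, e.g.
#   version: BIND 9.16.23-RH (Extended Support Version)
RNDC_VERSION_RE = re.compile(r"^version:\s+BIND\s+(\S+)", re.MULTILINE)


def local_bind_version(rndc_status: str) -> str | None:
    """Return the local named version string, or None if it is not found."""
    m = RNDC_VERSION_RE.search(rndc_status)
    return m.group(1) if m else None


def answered_locally(version_bind_reply: str, rndc_status: str) -> bool:
    """Heuristic: True if a version.bind answer equals our own named's version,
    which suggests the query was answered locally instead of by the target."""
    local = local_bind_version(rndc_status)
    return local is not None and version_bind_reply.strip().strip('"') == local


if __name__ == "__main__":
    status = "version: BIND 9.16.23-RH (Extended Support Version)"
    print(answered_locally('"9.16.23-RH"', status))  # True
    print(answered_locally('"366568643ba5103a1f441fbc3c502ed2eaa0b3d9"', status))  # False
```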
Re: Massive increase of SERVFAIL after April 28th 2025.
Hi Carlos,

First of all, I'd like to say how sorry I was for those affected, as I was watching the events unfold down south.

I've rebuilt dnstracer for RHEL9 and I don't really understand what's going on here. Here's the output for ftp.lip6.fr:

# dnstracer -q cname -s M.GTLD-SERVERS.NET ftp.lip6.fr
Tracing to ftp.lip6.fr[cname] via M.GTLD-SERVERS.NET, maximum of 3 retries
M.GTLD-SERVERS.NET (2001:0501:b1f9:::::0030) Refers backwards

Same output from any of my bind hosts:

# dnstracer -q cname -s 127.0.01 ftp.lip6.fr
Tracing to ftp.lip6.fr[cname] via 127.0.01, maximum of 3 retries
127.0.01 (127.0.0.1) Refers backwards

But interestingly, doing this with www.google.com instead of ftp.lip6.fr -only- works on the bind servers with forwarders configured. On a test bind host without the forwarders, I get this:

# dnstracer -q cname -s 127.0.01 www.google.com
Tracing to www.google.com[cname] via 127.0.01, maximum of 3 retries
127.0.01 (127.0.0.1) Refers backwards

Vincent
Re: Massive increase of SERVFAIL after April 28th 2025.
> /var/log/named/auth_servers.log:01-May-2025 11:05:26.694 lame-servers: info:
> SERVFAIL unexpected RCODE resolving 'isis.lip6.fr//IN': 193.51.24.1#53

Do some queries for these examples from the same system you are running named on, like:

dig @193.51.24.1 isis.lip6.fr
dig @132.227.60.2 osiris.lip6.fr
dig +norec @198.51.44.72 github.com
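Going through many log entries by hand gets tedious. A minimal Python sketch that turns one lame-servers line into the matching dig command, adding +norec (which clears the RD bit, so the server answers from its own data instead of recursing) for every entry rather than only the github.com one. norec_query is a hypothetical helper, and the line format is assumed from the examples quoted in this thread. The empty type field in the 'isis.lip6.fr//IN' line is passed through as-is, so dig falls back to its default type.

```python
from __future__ import annotations

import re
import shlex

# Assumed layout, taken from the lame-servers lines quoted in this thread.
LAME_RE = re.compile(
    r"resolving\s+'(?P<name>[^/']*)/(?P<type>[^/']*)/(?P<cls>[^']*)':\s+"
    r"(?P<server>[0-9A-Fa-f.:]+)#(?P<port>\d+)"
)


def norec_query(log_line: str) -> str | None:
    """Build a non-recursive dig command aimed at the server named in the line."""
    m = LAME_RE.search(log_line)
    if m is None:
        return None
    argv = ["dig", "+norec", "@" + m["server"]]
    if m["port"] != "53":
        argv += ["-p", m["port"]]
    argv.append(m["name"])
    if m["type"]:  # empty in the quoted 'isis.lip6.fr//IN' line
        argv.append(m["type"])
    # The class (IN in every quoted line) is dig's default, so it is omitted.
    return shlex.join(argv)


if __name__ == "__main__":
    line = ("28-Apr-2025 00:13:03.714 lame-servers: info: SERVFAIL unexpected RCODE "
            "resolving 'github.com/A/IN': 198.51.44.8#53")
    print(norec_query(line))  # dig +norec @198.51.44.8 github.com A
```

Running the generated commands from the resolver host itself, as suggested above, is what makes the comparison meaningful: the same query from another network may take a different path.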
Re: Massive increase of SERVFAIL after April 28th 2025.
Hi,

For SERVFAIL to happen, ALL the authoritative servers for the affected domains must have been in datacenters in Spain, Portugal or southern France.

I live in Spain, and at 12:33 CET I lost not only power but basic telephony, cellular telephony and cellular data. Everything. Power generators were only good for keeping power locally at datacenters or hospitals, but they were isolated from each other. The mitigation began at around 2-3pm CET, as they were turning up different power plants one at a time and connecting them to the power network, and it took them more than 12 hours to turn everything up.

So maybe that was the reason, if it coincides with your perception...

dnstracer has helped me find lame delegations.

Carlos Horowicz
Planisys
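Carlos's argument rests on the fact that a resolver only gives up after every authoritative server for a zone has failed. A deliberately simplified Python model of that fallback, ignoring retries, server selection, caching and DNSSEC (any of which can also produce SERVFAIL in a real resolver such as BIND); resolve and the Server type are hypothetical.

```python
from __future__ import annotations

from typing import Callable, Iterable, Optional

# Hypothetical model: a "server" is a callable that returns an answer, or
# None when it times out or replies with SERVFAIL/REFUSED.
Server = Callable[[str], Optional[str]]


def resolve(qname: str, servers: Iterable[Server]) -> str:
    """Ask each authoritative server in turn; fail only if every one fails."""
    for ask in servers:
        answer = ask(qname)
        if answer is not None:
            return answer  # one working server is enough
    return "SERVFAIL"


if __name__ == "__main__":
    def down(qname: str) -> Optional[str]:
        return None  # e.g. a server in a data centre without power

    def up(qname: str) -> Optional[str]:
        return "nephtys.lip6.fr"

    print(resolve("ftp.lip6.fr", [down, down, up]))    # nephtys.lip6.fr
    print(resolve("ftp.lip6.fr", [down, down, down]))  # SERVFAIL
```

Under this model, a zone whose servers are spread across several sites or anycast networks, like github.com's, keeps resolving unless something between the resolver and all of those servers fails.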
Re: Massive increase of SERVFAIL after April 28th 2025.
Hi Rob,

Thank you for your message. Yes, I've already done all that (got the latest root zone, restarted named each time I switched between forwarders and no forwarders, etc.).

I am using lip6.fr as an example because it hosts some mirrors for Fedora Linux, but I am pretty sure it's not the only site.

On the other hand, I think you might be right: on a RHEL9 host in Canada, even with the same configuration as here in EMEA, I can't reproduce the issue anymore:

# dig -t dnskey lip6.fr.

; <<>> DiG 9.16.23-RH <<>> -t dnskey lip6.fr.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 31209
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;lip6.fr. IN DNSKEY

;; AUTHORITY SECTION:
lip6.fr. 3600 IN SOA osiris.lip6.fr. hostmaster.lip6.fr. 2025042900 21600 3600 360 3600

;; Query time: 134 msec
;; SERVER: 213.186.33.99#53(213.186.33.99)
;; WHEN: Thu May 01 15:44:14 UTC 2025
;; MSG SIZE rcvd: 90

,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,
Vincent S. Cojot, Computer Engineering. STEP project. _.,-*~'`^`'~*-,._.,-*~
Ecole Polytechnique de Montreal, Comite Micro-Informatique. _.,-*~'`^`'~*-,.
Linux Xview/OpenLook resources page _.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'
http://step.polymtl.ca/~coyote _.,-*~'`^`'~*-,._ coy...@nospam4cojot.name

They cannot scare me with their empty spaces
Between stars - on stars where no human race is
I have it in me so much nearer home
To scare myself with my own desert places. - Robert Frost
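The header lines are the quickest way to read output like the lip6.fr query above: NOERROR with ANSWER: 0 and an SOA in the authority section is a NODATA response (the name exists but has no DNSKEY, so lip6.fr is unsigned), and the missing "ad" flag, unlike in the root DNSKEY query earlier in the thread, means the answer was not validated as secure. A minimal Python sketch that extracts those fields from dig's text output; summarize and the two regexes are hypothetical and assume dig 9.16-style header lines.

```python
from __future__ import annotations

import re

STATUS_RE = re.compile(r"->>HEADER<<- opcode: (\w+), status: (\w+), id: (\d+)")
COUNTS_RE = re.compile(
    r"flags:([^;]*); QUERY: (\d+), ANSWER: (\d+), AUTHORITY: (\d+), ADDITIONAL: (\d+)"
)


def summarize(dig_output: str) -> dict:
    """Extract rcode, header flags and section counts from dig's text output."""
    status = STATUS_RE.search(dig_output)
    counts = COUNTS_RE.search(dig_output)
    if status is None or counts is None:
        raise ValueError("no dig header found")
    rcode = status.group(2)
    flags = counts.group(1).split()
    answer = int(counts.group(3))
    return {
        "rcode": rcode,
        "flags": flags,
        "answer": answer,
        "authority": int(counts.group(4)),
        # Simplification: NOERROR with an empty answer section is treated as
        # NODATA; a referral from an authoritative server would look the same.
        "nodata": rcode == "NOERROR" and answer == 0,
        # "ad" is only set when the resolver validated the answer as secure.
        "validated": "ad" in flags,
    }
```

Applied to the root DNSKEY output in the original report, the same function reports "validated": True, which is a quick way to confirm that DNSSEC validation itself was working on the affected hosts.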
Re: Massive increase of SERVFAIL after April 28th 2025.
Hi again Carlos,

I really don't understand how it works for you and not for me on a RHEL host in Canada. Here's what I was trying with 8.8.8.8, 1.0.0.1 and 1.1.1.1:

# dnstracer -o -e -s 8.8.8.8 ftp.lip6.fr
Tracing to ftp.lip6.fr[a] via 8.8.8.8, maximum of 3 retries
8.8.8.8 (8.8.8.8)

# dnstracer -o -e -s 1.0.0.1 ftp.lip6.fr
Tracing to ftp.lip6.fr[a] via 1.0.0.1, maximum of 3 retries
1.0.0.1 (1.0.0.1)

# dnstracer -o -s 1.1.1.1 ftp.lip6.fr
Tracing to ftp.lip6.fr[a] via 1.1.1.1, maximum of 3 retries
1.1.1.1 (1.1.1.1)

Vincent
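According to its documentation, dnstracer follows the chain of DNS servers back to the servers that actually know the data. Each step down that chain corresponds to a possible zone cut between the root and the queried name. A minimal Python sketch (a hypothetical helper, not part of dnstracer) that lists those candidate cuts for ftp.lip6.fr:

```python
def candidate_zone_cuts(qname: str) -> list[str]:
    """List every ancestor of qname from the root down, e.g. '.', 'fr.', 'lip6.fr.', ...

    These are only candidates: a resolver or tracer learns which of them are
    real zone cuts from the NS referrals it receives while walking down.
    """
    labels = [label for label in qname.rstrip(".").split(".") if label]
    cuts = ["."]
    # Walk from the rightmost label (the TLD) towards the leftmost one.
    for i in range(len(labels) - 1, -1, -1):
        cuts.append(".".join(labels[i:]) + ".")
    return cuts


for name in candidate_zone_cuts("ftp.lip6.fr"):
    print(name)
```

Whether ftp.lip6.fr is delegated separately from lip6.fr, for instance, can only be learned from the referrals received along the way; the sketch only enumerates where such cuts could be.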
Re: Massive increase of SERVFAIL after April 28th 2025.
On Thu, 1 May 2025, Michael Richardson wrote:
> Rob McEwen via bind-users wrote:
>> I strongly suspect that this was caused (even if indirectly?) by the MASSIVE
>> and many-hours-long power outages in Europe, mainly in Spain and
>> Portugal. That started on April 28, 2025, at approximately 6:33 a.m. Eastern
>> Time (ET) - and the majority of it lasted almost 24 hours.
>
> I can't see how this would affect the massive anycast service that GitHub
> gets from awsdns and nsone.
>
> Vincent is, I think, in Montreal.

I -was- (until the summer of 2024; I'm now in France) :)

Regards,
Re: Massive increase of SERVFAIL after April 28th 2025.
Thank you, Ondřej! I'm getting the same answer from all my hosts:

# dig +short +nsid version.bind. txt ch @dns4.p08.nsone.net
"366568643ba5103a1f441fbc3c502ed2eaa0b3d9"

Vincent

On Thu, 1 May 2025, Ondřej Surý wrote:
>> dig +short +nsid version.bind. txt ch @dns4.p08.nsone.net
> This needs to be this:                 ^^^
>
> You missed @ and thus you asked your local resolver.
>
> Ondrej
>
>> On 1. 5. 2025, at 20:21, Michael Richardson wrote:
>>
>> dig +short +nsid version.bind. txt ch dns4.p08.nsone.net
>>
>> I get:
>> "9.21.2-1+0~20241120.131+debian12~1.gbpa6576d-Debian"
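The +nsid flag makes dig ask the server for its Name Server Identifier (RFC 5001) by attaching an empty NSID option (option code 3) to the EDNS OPT record. With +short, dig prints only the answer data, so the NSID the server returns in the OPT pseudosection is not shown; only the version.bind TXT string appears. The Python sketch below builds the same kind of query in wire format to show where that option sits. The function names are hypothetical, the message ID 0x1234 is arbitrary, the 1232-byte UDP size matches the "udp: 1232" seen in the dig output earlier in the thread, and RD is set because dig sets it by default.

```python
import struct

QTYPE_TXT = 16    # TXT, RFC 1035
QCLASS_CH = 3     # CHAOS class
TYPE_OPT = 41     # EDNS(0) OPT pseudo-record, RFC 6891
OPTION_NSID = 3   # NSID option code, RFC 5001


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in the root label."""
    out = bytearray()
    for label in name.rstrip(".").split("."):
        if not label:
            continue
        raw = label.encode("ascii")
        if len(raw) > 63:
            raise ValueError(f"label too long: {label!r}")
        out.append(len(raw))
        out += raw
    out.append(0)  # root label terminates the name
    return bytes(out)


def build_nsid_query(qname: str = "version.bind.", msg_id: int = 0x1234,
                     udp_size: int = 1232) -> bytes:
    """Build a CH TXT query carrying an empty NSID option (a request for the server's NSID)."""
    # Header: RD set, 1 question, 0 answers, 0 authority, 1 additional (the OPT RR).
    header = struct.pack("!6H", msg_id, 0x0100, 1, 0, 0, 1)
    question = encode_name(qname) + struct.pack("!HH", QTYPE_TXT, QCLASS_CH)
    nsid = struct.pack("!HH", OPTION_NSID, 0)  # option code, zero-length payload
    # OPT RR: root owner name, TYPE, CLASS = UDP payload size,
    # TTL = extended RCODE/version/flags (all zero), RDLENGTH, then the options.
    opt = b"\x00" + struct.pack("!HHIH", TYPE_OPT, udp_size, 0, len(nsid)) + nsid
    return header + question + opt


query = build_nsid_query()
print(len(query), query.hex())
```

Sending these bytes to port 53 of a server such as dns4.p08.nsone.net and decoding the NSID from the reply's OPT record are left out; dig +nsid (without +short) already does both.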
Re: Massive increase of SERVFAIL after April 28th 2025.
Ondřej Surý wrote:
>> dig +short +nsid version.bind. txt ch @dns4.p08.nsone.net
> This needs to be this:                 ^^^
> You missed @ and thus you asked your local resolver.

Yes, you are right. My bad: I actually have a script that does this, but I transcribed it for posting. I get:

obiwan-[~](2.6.6) mcr 10194 % dig +short +nsid version.bind. txt ch @dns4.p08.nsone.net
"366568643ba5103a1f441fbc3c502ed2eaa0b3d9"
Re: Massive increase of SERVFAIL after April 28th 2025.
I don’t think there was anything wrong with your servers. The log messages indicate problems with the authoritative servers.

In theory, authoritative DNS servers should be able to serve content until the zone content expires. They should be able to be powered off, rebooted, and continue to serve the content they had (provided it hasn’t expired in the meantime) by reading it off their local disk drives. This obviously has not happened for the zones you mentioned. The servers appear to have come up without access to the zone content they are supposed to serve, and are hence returning SERVFAIL. Forwarding to the servers you are using gives you indirect access to instances that do have zone content to serve.

Mark
--
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742 INTERNET: ma...@isc.org
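For a secondary server, "until the zone content expires" is governed by the EXPIRE field of the zone's SOA record (RFC 1035): once that many seconds pass without a successful refresh from the primary, the zone is no longer considered authoritative on that secondary. The lip6.fr SOA quoted earlier in the thread (2025042900 21600 3600 360 3600) lists serial, refresh, retry, expire and minimum in that order, so its expire value is 360 seconds. The Python sketch below (hypothetical helper names) reads those timers and applies the rule. It is illustrative only, not a diagnosis of this incident, and real servers may adjust or clamp SOA timers in ways it ignores.

```python
from dataclasses import dataclass


@dataclass
class SoaTimers:
    serial: int
    refresh: int
    retry: int
    expire: int
    minimum: int


def parse_soa_rdata(rdata: str) -> SoaTimers:
    """Parse SOA RDATA in dig's presentation format: MNAME RNAME SERIAL REFRESH RETRY EXPIRE MINIMUM.

    Assumes plain numeric timers (no '1h' or '2w' style units).
    """
    fields = rdata.split()
    if len(fields) != 7:
        raise ValueError("expected 7 SOA RDATA fields")
    return SoaTimers(*(int(value) for value in fields[2:]))


def secondary_expired(soa: SoaTimers, seconds_since_last_refresh: int) -> bool:
    """True once `expire` seconds have passed without a successful refresh from the primary."""
    return seconds_since_last_refresh >= soa.expire


lip6 = parse_soa_rdata("osiris.lip6.fr. hostmaster.lip6.fr. 2025042900 21600 3600 360 3600")
print(lip6.expire)                       # 360
print(secondary_expired(lip6, 10 * 60))  # True: 600 s without a refresh exceeds 360 s
```

With a 6-hour refresh and a 6-minute expire, a secondary in this sketch's model would stop serving the zone long before its next scheduled refresh if it lost contact with its primary.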