Re: Massive increase of SERVFAIL after April 28th 2025.

2025-05-01 Thread Mark Andrews
Also don’t use +short if you want to see the NSID.

From my corner of the internet I get the following.

% dig +nsid version.bind. txt ch @dns4.p08.nsone.net

; <<>> DiG 9.21.3-dev <<>> +nsid version.bind. txt ch @dns4.p08.nsone.net
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 21568
;; flags: qr rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; NSID: 6e 73 31 64 6e 73 2d 73 79 64 30 34 2d 31 31 31 33 31 2d 35 33 31 38 ("ns1dns-syd04-11131-5318")
;; QUESTION SECTION:
;version.bind. CH TXT

;; ANSWER SECTION:
version.bind. 0 CH TXT "366568643ba5103a1f441fbc3c502ed2eaa0b3d9"

;; Query time: 12 msec
;; SERVER: 198.51.45.72#53(dns4.p08.nsone.net) (UDP)
;; WHEN: Fri May 02 12:10:10 AEST 2025
;; MSG SIZE  rcvd: 121

% 
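The hex octets on the NSID line above can be checked by hand. A minimal Python sketch, assuming the operator put ASCII text in the option (NSID is opaque data, so that is not guaranteed); decode_nsid is a name made up for this illustration:

```python
def decode_nsid(hex_octets: str) -> str:
    """Turn dig's space-separated NSID hex octets into readable text."""
    raw = bytes.fromhex(hex_octets.replace(" ", ""))
    # Assumption: the payload is ASCII; anything else shows up as U+FFFD.
    return raw.decode("ascii", errors="replace")


# Octets copied from the dig output above.
nsid = "6e 73 31 64 6e 73 2d 73 79 64 30 34 2d 31 31 31 33 31 2d 35 33 31 38"
print(decode_nsid(nsid))  # prints ns1dns-syd04-11131-5318
```

The decoded string matches the text dig prints in parentheses, which is why +short (which hides the OPT pseudosection) is not useful here.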


> On 2 May 2025, at 04:44, Ondřej Surý  wrote:
> 
>> dig +short +nsid version.bind. txt ch @dns4.p08.nsone.net
> 
> This needs to be this: ^^^
> 
> You missed @ and thus you asked your local resolver.
> 
> Ondrej
> --
> Ondřej Surý — ISC (He/Him)
> 
> My working hours and your working hours may be different. Please do not feel 
> obligated to reply outside your normal working hours.
> 
>> On 1. 5. 2025, at 20:21, Michael Richardson  wrote:
>> 
>> dig +short +nsid version.bind. txt ch dns4.p08.nsone.net
>> 
>> I get:
>> "9.21.2-1+0~20241120.131+debian12~1.gbpa6576d-Debian"
> 

-- 
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742  INTERNET: ma...@isc.org

-- 
Visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe from 
this list

ISC funds the development of this software with paid support subscriptions. 
Contact us at https://www.isc.org/contact/ for more information.


bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users


Re: Massive increase of SERVFAIL after April 28th 2025.

2025-05-01 Thread Carlos Horowicz via bind-users

Hi Vincent

Using a conventional resolver (no RPZ, no forwarders, no forward zones) 
from our Miami cloud:



Tracing to ftp.lip6.fr[a] via 190.185.104.10, maximum of 3 retries
190.185.104.10 (190.185.104.10)
 |\___ g.ext.nic.fr [fr] (2001:0678:004c:::::0001)
 | |\___ soleil.uvsq.fr [lip6.fr] (193.51.24.1) Got authoritative answer [received type is cname]
 | |\___ isis.lip6.fr [lip6.fr] (132.227.60.2) Got authoritative answer [received type is cname]
 |  \___ osiris.lip6.fr [lip6.fr] (132.227.60.30) Got authoritative answer [received type is cname]
 |\___ g.ext.nic.fr [fr] (194.0.36.1)
 | |\___ osiris.lip6.fr [lip6.fr] (132.227.60.30) (cached)
 | |\___ soleil.uvsq.fr [lip6.fr] (193.51.24.1) (cached)
 |  \___ isis.lip6.fr [lip6.fr] (132.227.60.2) (cached)
 |\___ d.nic.fr [fr] (2001:0678:000c:::::0001)
 | |\___ isis.lip6.fr [lip6.fr] (132.227.60.2) (cached)
 | |\___ osiris.lip6.fr [lip6.fr] (132.227.60.30) (cached)
 |  \___ soleil.uvsq.fr [lip6.fr] (193.51.24.1) (cached)
 |\___ d.nic.fr [fr] (194.0.9.1)
 | |\___ isis.lip6.fr [lip6.fr] (132.227.60.2) (cached)
 | |\___ soleil.uvsq.fr [lip6.fr] (193.51.24.1) (cached)
 |  \___ osiris.lip6.fr [lip6.fr] (132.227.60.30) (cached)
 |\___ f.ext.nic.fr [fr] (2001:067c:1010:0011::::0053)
 | |\___ osiris.lip6.fr [lip6.fr] (132.227.60.30) (cached)
 | |\___ soleil.uvsq.fr [lip6.fr] (193.51.24.1) (cached)
 |  \___ isis.lip6.fr [lip6.fr] (132.227.60.2) (cached)
  \___ f.ext.nic.fr [fr] (194.146.106.46)
   |\___ osiris.lip6.fr [lip6.fr] (132.227.60.30) (cached)
   |\___ isis.lip6.fr [lip6.fr] (132.227.60.2) (cached)
    \___ soleil.uvsq.fr [lip6.fr] (193.51.24.1) (cached)

osiris.lip6.fr (132.227.60.30)  ftp.lip6.fr -> nephtys.lip6.fr
osiris.lip6.fr (132.227.60.30)  nephtys.lip6.fr -> 132.227.74.17
isis.lip6.fr (132.227.60.2) ftp.lip6.fr -> nephtys.lip6.fr
isis.lip6.fr (132.227.60.2) nephtys.lip6.fr -> 132.227.74.17
soleil.uvsq.fr (193.51.24.1)    ftp.lip6.fr -> nephtys.lip6.fr
soleil.uvsq.fr (193.51.24.1)    nephtys.lip6.fr -> 132.227.74.17
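
One way to read the summary rows above is to check that every authoritative server returns the same data for each name. A rough Python sketch, assuming the row layout shown in this dnstracer output; answers_by_name is a hypothetical helper, not part of dnstracer:

```python
import re
from collections import defaultdict

# Row layout assumed from the output above: "server (ip)  name -> target"
ROW_RE = re.compile(r"^(\S+) \((\S+)\)\s+(\S+) -> (\S+)$")

SUMMARY = """\
osiris.lip6.fr (132.227.60.30)  ftp.lip6.fr -> nephtys.lip6.fr
osiris.lip6.fr (132.227.60.30)  nephtys.lip6.fr -> 132.227.74.17
isis.lip6.fr (132.227.60.2) ftp.lip6.fr -> nephtys.lip6.fr
isis.lip6.fr (132.227.60.2) nephtys.lip6.fr -> 132.227.74.17
soleil.uvsq.fr (193.51.24.1)    ftp.lip6.fr -> nephtys.lip6.fr
soleil.uvsq.fr (193.51.24.1)    nephtys.lip6.fr -> 132.227.74.17
"""


def answers_by_name(summary: str) -> dict:
    """Map each queried name to the set of distinct answers seen for it."""
    seen = defaultdict(set)
    for line in summary.splitlines():
        m = ROW_RE.match(line.strip())
        if m:
            _server, _ip, name, target = m.groups()
            seen[name].add(target)
    return dict(seen)


result = answers_by_name(SUMMARY)
for name, targets in result.items():
    status = "consistent" if len(targets) == 1 else "MISMATCH"
    print(f"{name}: {sorted(targets)} ({status})")
```

Here all three servers agree, so from this vantage point the lip6.fr delegation looks healthy.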

HTH

Carlos Horowicz
Planisys



Re: Massive increase of SERVFAIL after April 28th 2025.

2025-05-01 Thread Michael Richardson

vinc...@cojot.name wrote:
> I've been running bind (since bind 4) in my home/lab for around 3 decades. I
> went that route because I wanted: A) to be abstracted from the ISP's DNS
> servers and B) to have a local DNS cache.

Ditto.

> I run 'bind9.16' on RHEL/Linux using the default bundled root zone
> (/var/named/named.ca) and the default root key (/etc/named.root.key).

And you have DNSSEC validation turned on?

> This has worked well for years until a few days ago (April 28th?) when the
> amount of SERVFAIL started going ballistic and started preventing the
> resolution of a lot of DNS names on the internet to the point where DNS was
> unusable..

> Starting April 28th, I started seeing tons of things like this in the auth
> log:
> 28-Apr-2025 00:13:03.714 lame-servers: info: SERVFAIL unexpected RCODE
> resolving 'github.com/A/IN': 198.51.44.8#53
> 28-Apr-2025 00:13:03.720 lame-servers: info: SERVFAIL unexpected RCODE
> resolving 'github.com/A/IN': 205.251.197.3#53
> 28-Apr-2025 00:13:03.725 lame-servers: info: SERVFAIL unexpected RCODE
> resolving 'github.com/A/IN': 198.51.45.8#53

All of these look like valid DNS servers for github.
Of course, AWS/GITHUB can't be bothered to do DNSSEC.
From my vantage point, they all seem to resolve github.com

My guess is that something is in the way, and it's probably trying to
attack you (or your ISP) with fake replies, but it's doing a bad job.

When I do:
 dig +short +nsid version.bind. txt ch dns4.p08.nsone.net

I get:
  "9.21.2-1+0~20241120.131+debian12~1.gbpa6576d-Debian"

If you get something different, then that would be consistent with something
else intercepting your traffic.

> I struggled for a couple days to bring my DNS servers back into service and
> the -ONLY- thing which worked was to declare some 'forwarders' (Google +
> CloudFlare). Nothing else brought reliable DNS service back.

:-(
But that does suggest that something else is in the way.
Did you forward with Do53, or did you use DoT/DoH?
{No idea if bind can forward over DoH, I never tried}

> - I tried to turn off dnssec completely but that barely made a difference:

> dnssec-enable no;
> dnssec-validation no;

Won't matter, since github doesn't do DNSSEC, so the NXDOMAINs can't be
validated (or rejected as invalid)

> The only way to get back to a working state is to add back some forwarders.

> Any ideas? Am I doing anything wrong? I'm attaching a sanitized copy of my
> named.conf in case someone could spot something:

I think you did everything right.
I think talking to your upstream ISP is in order.

> Thank you for your attention,

> ,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,
> Vincent S. Cojot, Computer Engineering. STEP project. _.,-*~'`^`'~*-,._.,-*~
> Ecole Polytechnique de Montreal, Comite Micro-Informatique. _.,-*~'`^`'~*-,.

Bonjour!
Elbows Up.



--
Michael Richardson. o O ( IPv6 IøT consulting )
   Sandelman Software Works Inc, Ottawa and Worldwide








Re: Massive increase of SERVFAIL after April 28th 2025.

2025-05-01 Thread Michael Richardson

Rob McEwen via bind-users  wrote:
> I strongly suspect that this was caused (even if indirectly?) by the MASSIVE
> and many-hours-long power outages in Europe, mainly in Spain and
> Portugal. That started on April 28, 2025, at approximately 6:33 a.m. Eastern
> Time (ET) - and the majority of it lasted almost 24 hours.

I can't see how this would affect the massive anycast service that github
gets from awsdns and nsone.

Vincent is, I think, in Montreal.





Massive increase of SERVFAIL after April 28th 2025.

2025-05-01 Thread vincent


Hi everyone,

I've been running bind (since bind 4) in my home/lab for around 3 decades. 
I went that route because I wanted: A) to be abstracted from the ISP's DNS 
servers and B) to have a local DNS cache.


Fast forward to today and I have several servers, each with a copy of my 
zones and the same config files. All are NS for my zones, and there are 
floating IPs (controlled by some clustering software) which float across 
my servers and to which the DNS clients point.


My DNS servers are 1) recursive, 2) authoritative for my zones and 3) 
provide extra stuff (like being able to point to my company's DNS servers 
when the VPN is up and reachable so my bind daemons can accumulate a cache 
of DNS entries).


I run 'bind9.16' on RHEL/Linux using the default bundled root zone 
(/var/named/named.ca) and the default root key (/etc/named.root.key).


This has worked well for years until a few days ago (April 28th?) when 
the amount of SERVFAIL started going ballistic and started preventing the 
resolution of a lot of DNS names on the internet to the point where DNS 
was unusable..


# grep -c SERVFAIL auth_servers.log*
auth_servers.log:34245
auth_servers.log.0:236802
auth_servers.log.1:225409
auth_servers.log.2:226233
auth_servers.log.3:224762
auth_servers.log.4:242299
auth_servers.log.5:214953 < April 28th and beyond
auth_servers.log.6:1207
auth_servers.log.7:281

Starting April 28th, I started seeing tons of things like this in the auth 
log:

28-Apr-2025 00:13:03.714 lame-servers: info: SERVFAIL unexpected RCODE resolving 'github.com/A/IN': 198.51.44.8#53
28-Apr-2025 00:13:03.720 lame-servers: info: SERVFAIL unexpected RCODE resolving 'github.com/A/IN': 205.251.197.3#53
28-Apr-2025 00:13:03.725 lame-servers: info: SERVFAIL unexpected RCODE resolving 'github.com/A/IN': 198.51.45.8#53
28-Apr-2025 00:13:03.730 lame-servers: info: SERVFAIL unexpected RCODE resolving 'github.com/A/IN': 198.51.44.72#53
28-Apr-2025 00:13:03.735 lame-servers: info: SERVFAIL unexpected RCODE resolving 'github.com/A/IN': 205.251.198.171#53
28-Apr-2025 00:13:03.740 lame-servers: info: SERVFAIL unexpected RCODE resolving 'github.com/A/IN': 205.251.193.165#53
28-Apr-2025 00:13:03.745 lame-servers: info: SERVFAIL unexpected RCODE resolving 'github.com/A/IN': 205.251.194.8#53
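
Beyond a raw grep -c, it can help to see which upstream servers and which names account for the SERVFAILs. A minimal Python sketch, assuming each log entry sits on a single line in the format shown above; tally_servfail and the regular expression are written for this illustration and are not part of BIND:

```python
import re
from collections import Counter

# Matches the lame-servers SERVFAIL entries quoted in this message.
LAME_RE = re.compile(
    r"lame-servers: info: SERVFAIL unexpected RCODE\s+"
    r"resolving '(?P<name>[^/']+)/(?P<rtype>[^/']*)/(?P<rclass>[^/']+)': "
    r"(?P<server>[0-9A-Fa-f.:]+)#(?P<port>\d+)"
)


def tally_servfail(lines):
    """Count SERVFAIL entries per upstream server and per queried name."""
    by_server, by_name = Counter(), Counter()
    for line in lines:
        m = LAME_RE.search(line)
        if m:
            by_server[m["server"]] += 1
            by_name[m["name"]] += 1
    return by_server, by_name


sample = [
    "28-Apr-2025 00:13:03.714 lame-servers: info: SERVFAIL unexpected RCODE "
    "resolving 'github.com/A/IN': 198.51.44.8#53",
    "28-Apr-2025 00:13:03.720 lame-servers: info: SERVFAIL unexpected RCODE "
    "resolving 'github.com/A/IN': 205.251.197.3#53",
    "28-Apr-2025 00:13:03.725 lame-servers: info: SERVFAIL unexpected RCODE "
    "resolving 'github.com/A/IN': 198.51.45.8#53",
]
by_server, by_name = tally_servfail(sample)
print(by_name.most_common(5))
print(by_server.most_common(5))
```

If nearly every name and every upstream server shows up in the tally, the problem is more likely on the resolver's side of the network than with any single authoritative operator.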

I struggled for a couple days to bring my DNS servers back into service 
and the -ONLY- thing which worked was to declare some 'forwarders' (Google 
+ CloudFlare). Nothing else brought reliable DNS service back.


Here is a list of what I tried that did -NOT- yield satisfactory results.

- RHEL only had the 20326 KSK in /etc/named.root.key so I updated it with 
the file from upstream :

(https://raw.githubusercontent.com/isc-projects/bind9/refs/heads/main/bind.keys)

- I tried to turn off dnssec completely but that barely made a difference:

dnssec-enable no;
dnssec-validation no;

- I tried to switch off IPV6 resolution:
query-source-v6 port 0; // disables IPv6 queries
prefer-ipv4 yes;

In the end, the -only- solution which brought back working DNS resolution 
was this:

forwarders {
1.0.0.1;
1.1.1.1;
8.8.8.8;
8.8.4.4;
};

I am not a DNS administrator and I have little clue as to whether I am doing 
something slightly wrong or very wrong. Does anyone have any idea why all 
this started happening at the end of April? I reproduced the issue on 
multiple RHEL systems across the Internet (RHEL 9.4 and 9.5, in EMEA and 
Canada).


The symptom looked like this (with ftp.lip6.fr and lip6.fr as examples):
# dig -t dnskey .

; <<>> DiG 9.16.23-RH <<>> -t dnskey .
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 52934
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: d0e97efe7a229ba2010068138cad7cee5edc6fe3e926 (good)
;; QUESTION SECTION:
;.  IN  DNSKEY

;; ANSWER SECTION:
.   172654  IN  DNSKEY  257 3 8 
AwEAAaz/tAm8yTn4Mfeh5eyI96WSVexTBAvkMgJzkKTOiW1vkIbzxeF3 
+/4RgWOq7HrxRixHlFlExOLAJr5emLvN7SWXgnLh4+B5xQlNVz8Og8kv 
ArMtNROxVQuCaSnIDdD5LKyWbRd2n9WGe2R8PzgCmr3EgVLrjyBxWezF 
0jLHwVN8efS3rCj/EWgvIWgb9tarpVUDK/b58Da+sqqls3eNbuv7pr+e 
oZG+SrDK6nWeL3c6H5Apxz7LjVc1uTIdsIXxuOLYA4/ilBmSVIzuDWfd 
RUfhHdY6+cn8HFRm+2hM8AnXGXws9555KrUB5qihylGa8subX2Nn6UwN R1AkUTV74bU=
.   172654  IN  DNSKEY  256 3 8 
AwEAAbEbGCpGTDrcZTWqWWE72nphyshpRcILdzCVlBGU9Ln1Fui9kkse 
UOP+g5GLUeVFKdTloeRTA9+EYiQdXgWXmXmuW/nGxZjAikluF/O9NzLV 
rr5iZnth2xu+F48nrJlAgWWiMNau54NI5sZ3iVQfhFsq2pZmf43RauRP 
niYMShOLO7EBWWXr5glDSgZGS9fSm6xHwwF+g8D4m8oanjvdCBNxXzSE 
KS31ibxjLifTfvwCg3y4XXcNW9U6Nu3JmoKUdxqpPPIkBvVQbIz4UO2F 
waR13uXC03ALP1Yx2QNSS4SZlcIMtAftQR9wtCiuPWQnFv4jkzWqlhp1 Lmf7bcoL9yk=
.   172654  IN  DNSKEY  257 3 8 
AwEAAa96jeuknZlaeSrvyAJj6ZHv28hhOKkx3rLG

Re: Massive increase of SERVFAIL after April 28th 2025.

2025-05-01 Thread Rob McEwen via bind-users

From vinc...@cojot.name
until a few days ago (April 28th?) when the amount of SERVFAIL started going 
ballistic and started preventing the resolution of a lot of DNS names on the 
internet to the point where DNS was unusable


I strongly suspect that this was caused (even if indirectly?) by the 
MASSIVE and many-hours-long power outages in Europe, mainly in Spain and 
Portugal. That started on April 28, 2025, at approximately 6:33 a.m. 
Eastern Time (ET) - and the majority of it lasted almost 24 hours.


https://www.france24.com/en/europe/20250430-what-we-know-so-far-about-the-massive-blackout-that-hit-spain-and-portugal

Hopefully, you're not seeing any more of these errors now?

Rob McEwen, invaluement


Re: Massive increase of SERVFAIL after April 28th 2025.

2025-05-01 Thread vincent



Hi Rob,

Unfortunately, as soon as I remove the 'forwarders' in any of my named 
servers, the problem comes back. The output in my previous message was 
captured just a few minutes ago after I had disabled 'forwarders' in one of 
my bind servers.


Regards,

Vincent




Re: Massive increase of SERVFAIL after April 28th 2025.

2025-05-01 Thread Rob McEwen via bind-users
In that case, someone smarter and more knowledgeable on this list will 
hopefully help you. But first - one last suggestion - if you find that 
forwards to 3rd party servers work - but turning those off causes issues 
- you should probably make sure that your "root hints" are updated, and 
purge any caching (rndc flush), then restart BIND. Maybe you've already 
done that? But if not, it's worth a try before digging deeper.


If that doesn't fix this, then hopefully someone else on this list can 
help you.


Rob McEwen, invaluement





Re: Massive increase of SERVFAIL after April 28th 2025.

2025-05-01 Thread Ondřej Surý
> dig +short +nsid version.bind. txt ch @dns4.p08.nsone.net

This needs to be this: ^^^

You missed @ and thus you asked your local resolver.
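
The difference between the two command lines is easy to overlook. A small Python sketch that only inspects the arguments and does not run dig; dig_server_args is a hypothetical helper written for this illustration:

```python
import shlex


def dig_server_args(command: str) -> list[str]:
    """Return the @server arguments of a dig command line, without the '@'."""
    return [arg[1:] for arg in shlex.split(command) if arg.startswith("@")]


asked = "dig +short +nsid version.bind. txt ch dns4.p08.nsone.net"
fixed = "dig +short +nsid version.bind. txt ch @dns4.p08.nsone.net"

for cmd in (asked, fixed):
    servers = dig_server_args(cmd)
    target = servers[0] if servers else "the locally configured resolver"
    print(f"{cmd!r} -> sent to {target}")
```

Without the @, the last argument is not treated as the server to query, so the version.bind answer comes from whatever resolver the host is configured to use.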

Ondrej
--
Ondřej Surý — ISC (He/Him)

My working hours and your working hours may be different. Please do not feel 
obligated to reply outside your normal working hours.

> On 1. 5. 2025, at 20:21, Michael Richardson  wrote:
> 
> dig +short +nsid version.bind. txt ch dns4.p08.nsone.net
> 
> I get:
>  "9.21.2-1+0~20241120.131+debian12~1.gbpa6576d-Debian"



Re: Massive increase of SERVFAIL after April 28th 2025.

2025-05-01 Thread vincent



Hi Michael,

Thank you so much for chiming in!


> My guess is that something is in the way, and it's probably trying to
> attack you (or your ISP) with fake replies, but it's doing a bad job.
>
> When I do:
> dig +short +nsid version.bind. txt ch dns4.p08.nsone.net
>
> I get:
>  "9.21.2-1+0~20241120.131+debian12~1.gbpa6576d-Debian"


Spot on! Here's what I get:

# dig +short +nsid version.bind. txt ch dns4.p08.nsone.net
"9.16.23-RH"
198.51.45.72

Free.fr is my ISP but "9.16.23-RH" suspiciously looks like the bind 
version I'm running on RHEL9:


# rndc status|grep version
version: BIND 9.16.23-RH (Extended Support Version) 
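
That suspicion can be tested by comparing the version.bind answer with the version the local named reports. A small Python sketch, assuming the two strings are pasted in by hand; both function names are invented for this illustration:

```python
import re


def rndc_version(status_line: str):
    """Extract the version token from an 'rndc status' version line."""
    m = re.search(r"BIND (\S+)", status_line)
    return m.group(1) if m else None


def looks_locally_answered(version_bind_txt: str, status_line: str) -> bool:
    """True when the version.bind TXT answer equals the local named version."""
    return version_bind_txt.strip('"') == rndc_version(status_line)


status = "version: BIND 9.16.23-RH (Extended Support Version)"
print(looks_locally_answered('"9.16.23-RH"', status))
print(looks_locally_answered('"366568643ba5103a1f441fbc3c502ed2eaa0b3d9"', status))
```

A match suggests the reply came from the local named rather than from dns4.p08.nsone.net. The second output line, 198.51.45.72, is consistent with dig treating dns4.p08.nsone.net as an additional name to look up instead of as the server to query.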


> If you get something different, then that would be consistent with something
> else intercepting your traffic.


Could my DNS servers be doing this to themselves?


> :-(
> But that does suggest that something else is in the way.
> Did you forward with Do53, or did you use DoT/DoH?
> {No idea if bind can forward over DoH, I never tried}
>
>> - I tried to turn off dnssec completely but that barely made a difference:
>>
>> dnssec-enable no;
>> dnssec-validation no;
>
> Won't matter, since github doesn't do DNSSEC, so the NXDOMAINs can't be
> validated (or rejected as invalid)
>
>> The only way to get back to a working state is to add back some forwarders.
>>
>> Any ideas? Am I doing anything wrong? I'm attaching a sanitized copy of my
>> named.conf in case someone could spot something:
>
> I think you did everything right.
> I think talking to your upstream ISP is in order.


Thank you!

Vincent


Re: Massive increase of SERVFAIL after April 28th 2025.

2025-05-01 Thread vincent



Hi Carlos,

First of all, I'd like to say how sorry I was for those affected, as I was 
watching the events unfold down south.


I've rebuilt dnstracer for RHEL9 and I don't really understand what's 
going on here.. Here's the output for ftp.lip6.fr:


# dnstracer -q cname -s M.GTLD-SERVERS.NET  ftp.lip6.fr
Tracing to ftp.lip6.fr[cname] via M.GTLD-SERVERS.NET, maximum of 3 retries
M.GTLD-SERVERS.NET (2001:0501:b1f9:::::0030) Refers backwards

Same output from any of my bind hosts:

# dnstracer -q cname -s 127.0.01  ftp.lip6.fr
Tracing to ftp.lip6.fr[cname] via 127.0.01, maximum of 3 retries
127.0.01 (127.0.0.1) Refers backwards

But interestingly, doing this with www.google.com instead of ftp.lip6.fr 
-only- works on the bind servers with forwarders configured. On a test 
bind host without the forwarders, I get this:


# dnstracer -q cname -s 127.0.01  www.google.com
Tracing to www.google.com[cname] via 127.0.01, maximum of 3 retries
127.0.01 (127.0.0.1) Refers backwards

Vincent



Re: Massive increase of SERVFAIL after April 28th 2025.

2025-05-01 Thread Jeremy C. Reed
> /var/log/named/auth_servers.log:01-May-2025 11:05:26.694 lame-servers: info:
> SERVFAIL unexpected RCODE resolving 'isis.lip6.fr//IN': 193.51.24.1#53

Do some queries for these examples, like

dig @193.51.24.1 isis.lip6.fr 

dig @132.227.60.2 osiris.lip6.fr 

dig +norec @198.51.44.72 github.com

from the same system on which you are running named.
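
With many log entries, the dig commands can be generated rather than typed. A rough Python sketch, assuming the lame-servers log format quoted above; dig_command is a made-up helper, and adding +norec to every query is a choice made here so that the server is asked as an authoritative source:

```python
import re

# Pulls name, type, class and server address out of a lame-servers entry.
LOG_RE = re.compile(
    r"resolving '(?P<name>[^/']+)/(?P<rtype>[^/']*)/(?P<rclass>[^/']+)': "
    r"(?P<server>[0-9A-Fa-f.:]+)#\d+"
)


def dig_command(log_line: str):
    """Build a dig command that asks the logged server directly, or None."""
    m = LOG_RE.search(log_line)
    if m is None:
        return None
    parts = ["dig", "+norec", "@" + m["server"], m["name"]]
    if m["rtype"]:  # the record type can be missing in pasted logs
        parts.append(m["rtype"])
    return " ".join(parts)


lines = [
    "01-May-2025 11:05:26.694 lame-servers: info: SERVFAIL unexpected RCODE "
    "resolving 'isis.lip6.fr//IN': 193.51.24.1#53",
    "28-Apr-2025 00:13:03.730 lame-servers: info: SERVFAIL unexpected RCODE "
    "resolving 'github.com/A/IN': 198.51.44.72#53",
]
for line in lines:
    print(dig_command(line))
```

Running the printed commands on the host where named runs shows whether the SERVFAIL really comes from the remote server or from something on the path.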


Re: Massive increase of SERVFAIL after April 28th 2025.

2025-05-01 Thread Carlos Horowicz via bind-users

Hi,

For SERVFAIL to happen, ALL authoritative servers for the affected domains 
must have been in Datacenters in Spain, Portugal or southern France.
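
The reasoning is that a resolver keeps trying the other authoritative servers of a zone and only gives up when none of them returns a usable answer. A deliberately simplified Python sketch of that idea, with made-up server names and a simulated query function; a real resolver has other reasons to return SERVFAIL (for example validation failures) that are not modelled here:

```python
from typing import Callable, Iterable, Optional


def resolve(nameservers: Iterable[str],
            query_one: Callable[[str], Optional[str]]) -> str:
    """Try each authoritative server in turn; give up only when all fail."""
    for server in nameservers:
        answer = query_one(server)
        if answer is not None:  # first usable reply wins
            return answer
    return "SERVFAIL"  # no server produced a usable reply


# Simulated reachability: the names and the address are illustrative only.
UP = {"ns-outside.example": "192.0.2.10"}


def fake_query(server: str) -> Optional[str]:
    return UP.get(server)  # None stands for a timeout or an error


print(resolve(["ns-madrid.example", "ns-outside.example"], fake_query))
print(resolve(["ns-madrid.example", "ns-lisbon.example"], fake_query))
```

Under this model a single reachable server outside the affected area is enough to avoid SERVFAIL, which is why widely distributed anycast services should not have been affected by a regional outage.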


I live in Spain, and at 12:33 CET I lost not only power but basic 
telephony, cellular telephony and cellular data. Everything. Power 
generators were only good for keeping power locally at Datacenters or 
Hospitals, but they were isolated from each other.


The mitigation began at around 2-3pm CET, as they were turning up 
different power plants one at a time and connecting them to the power 
network, and it took them more than 12 hours to turn everything up.


So maybe that was the reason, if it coincides with your perception ... 
dnstracer has eventually helped me find lame delegations.


Carlos Horowicz
Planisys



Re: Massive increase of SERVFAIL after April 28th 2025.

2025-05-01 Thread vincent


Hi Rob,
Thank you for your message. Yes, I've already done all that (got the 
latest root zone, restarted named each time I switched from forwarders 
to non-forwarders, etc.).
I am using lip6.fr as an example because it hosts some mirrors for Fedora 
Linux, but I am pretty sure it's not the only site affected.


On the other hand, I think you might be right.. on a RHEL9 host in Canada, 
even with the same configuration as here in EMEA, I don't reproduce the 
issue anymore:


# dig -t dnskey lip6.fr.

; <<>> DiG 9.16.23-RH <<>> -t dnskey lip6.fr.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 31209
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;lip6.fr.   IN  DNSKEY

;; AUTHORITY SECTION:
lip6.fr.   3600  IN  SOA osiris.lip6.fr. hostmaster.lip6.fr. 2025042900 21600 3600 360 3600

;; Query time: 134 msec
;; SERVER: 213.186.33.99#53(213.186.33.99)
;; WHEN: Thu May 01 15:44:14 UTC 2025
;; MSG SIZE  rcvd: 90


,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,
Vincent S. Cojot, Computer Engineering. STEP project. _.,-*~'`^`'~*-,._.,-*~
Ecole Polytechnique de Montreal, Comite Micro-Informatique. _.,-*~'`^`'~*-,.
Linux Xview/OpenLook resources page _.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'
http://step.polymtl.ca/~coyote  _.,-*~'`^`'~*-,._ coy...@nospam4cojot.name

They cannot scare me with their empty spaces
Between stars - on stars where no human race is
I have it in me so much nearer home
To scare myself with my own desert places.   - Robert Frost



On Thu, 1 May 2025, Rob McEwen wrote:


In that case, someone smarter and more knowledgeable on this list will 
hopefully help you. But first - one last suggestion - if you find that 
forwards to 3rd party servers work - but turning those off causes issues - 
you should probably make sure that your "root hints" are updated, and purge 
any caching (rndc flush), then restart BIND. Maybe you've already done that? 
But if not, it's worth a try before digging deeper.

If that doesn't fix this, then hopefully someone else on this list can help you.

Rob McEwen, invaluement



-- Original Message --
From vinc...@cojot.name
To "Rob McEwen" 
Cc bind-users@lists.isc.org
Date 5/1/2025 11:28:23 AM
Subject Re: Massive increase of SERVFAIL after April 28th 2025.

  Hi Rob,
 
Unfortunately, as soon as I remove the 'forwarders' in any of my named 
servers, the problem comes back. The output in my previous message was 
captured just a few minutes ago, after I had disabled 'forwarders' in one of 
my bind servers.
 
Regards,
 
Vincent
 
 
On Thu, 1 May 2025, Rob McEwen wrote:
 
  From vinc...@cojot.name
until a few days ago (April 28th?) when the amount of SERVFAIL started going 
ballistic and started preventing the resolution of a lot of DNS names on the 
internet to the point where DNS was unusable
 
 
I strongly suspect that this was caused (even if indirectly?) by the MASSIVE 
and many-hours-long power outages in Europe, mainly in Spain and Portugal. 
That started on April 28, 2025, at approximately 6:33 a.m. Eastern Time (ET) - 
and the majority of it lasted almost 24 hours.
 
https://www.france24.com/en/europe/20250430-what-we-know-so-far-about-the-massive-blackout-that-hit-spain-and-portugal
 
Hopefully, you're not seeing any more of these errors now?
 
Rob McEwen, invaluement
 
 
 


-- 
Visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe from 
this list

ISC funds the development of this software with paid support subscriptions. 
Contact us at https://www.isc.org/contact/ for more information.


bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users


Re: Massive increase of SERVFAIL after April 28th 2025.

2025-05-01 Thread vincent



Hi again Carlos,

I really don't understand how it works for you and not for me on a RHEL 
host in Canada. Here's what I was trying with 8.8.8.8 and 1.1.1.1:


# dnstracer -o -e -s 8.8.8.8 ftp.lip6.fr
Tracing to ftp.lip6.fr[a] via 8.8.8.8, maximum of 3 retries
8.8.8.8 (8.8.8.8)

# dnstracer -o -e -s 1.0.0.1 ftp.lip6.fr
Tracing to ftp.lip6.fr[a] via 1.0.0.1, maximum of 3 retries
1.0.0.1 (1.0.0.1)

# dnstracer -o -s 1.1.1.1 ftp.lip6.fr
Tracing to ftp.lip6.fr[a] via 1.1.1.1, maximum of 3 retries
1.1.1.1 (1.1.1.1)

Vincent


Re: Massive increase of SERVFAIL after April 28th 2025.

2025-05-01 Thread vincent



On Thu, 1 May 2025, Michael Richardson wrote:



Rob McEwen via bind-users  wrote:
   > I strongly suspect that this was caused (even if indirectly?) by the
   > MASSIVE and many-hours-long power outages in Europe, mainly in Spain and
   > Portugal. That started on April 28, 2025, at approximately 6:33 a.m.
   > Eastern Time (ET) - and the majority of it lasted almost 24 hours.

I can't see how this would affect the massive anycast service that github
gets from awsdns and nsone.

Vincent is, I think, in Montreal.


I -was- (until the summer of 2024 and I'm now in France) :)
Regards,






Re: Massive increase of SERVFAIL after April 28th 2025.

2025-05-01 Thread vincent


Thank you, Ondřej!

I'm getting the same answer from all my hosts:

# dig +short +nsid version.bind. txt ch @dns4.p08.nsone.net
"366568643ba5103a1f441fbc3c502ed2eaa0b3d9"

Vincent


On Thu, 1 May 2025, Ondřej Surý wrote:


dig +short +nsid version.bind. txt ch @dns4.p08.nsone.net


This needs to be this: ^^^

You missed @ and thus you asked your local resolver.
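
As a minimal sketch of why the missing "@" matters: dig only treats an argument as the server to query when it starts with "@"; otherwise the hostname is not used as the server and the query goes to the resolvers configured on the local system. The helper name split_dig_args below is hypothetical and only mimics that one rule; it is not how dig parses its command line internally.

```python
from typing import List, Optional, Tuple


def split_dig_args(args: List[str]) -> Tuple[Optional[str], List[str]]:
    """Separate the '@server' argument from the other dig-style arguments.

    Returns (server, remaining_args). server is None when no argument
    starts with '@', which is the case where dig falls back to the
    system's configured resolvers.
    """
    server: Optional[str] = None
    rest: List[str] = []
    for arg in args:
        if arg.startswith("@"):
            server = arg[1:]  # strip the '@' prefix
        else:
            rest.append(arg)
    return server, rest


# The two command lines from this thread, without the leading "dig".
without_at = "+short +nsid version.bind. txt ch dns4.p08.nsone.net".split()
with_at = "+short +nsid version.bind. txt ch @dns4.p08.nsone.net".split()

print(split_dig_args(without_at)[0])
print(split_dig_args(with_at)[0])
```

With the first command line no server is selected, which is consistent with the answer coming from a local BIND rather than from the nsone.net server.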

Ondrej
--
Ondřej Surý — ISC (He/Him)

My working hours and your working hours may be different. Please do not feel 
obligated to reply outside your normal working hours.


On 1. 5. 2025, at 20:21, Michael Richardson  wrote:

dig +short +nsid version.bind. txt ch dns4.p08.nsone.net

I get:
 "9.21.2-1+0~20241120.131+debian12~1.gbpa6576d-Debian"




Re: Massive increase of SERVFAIL after April 28th 2025.

2025-05-01 Thread Michael Richardson

Ondřej Surý  wrote:
>> dig +short +nsid version.bind. txt ch @dns4.p08.nsone.net

> This needs to be this: ^^^

> You missed @ and thus you asked your local resolver.

Yes, you are right. My bad.
I actually have a script that does this, but I transcribed it for posting.

I get:

obiwan-[~](2.6.6) mcr 10194 % dig +short +nsid version.bind. txt ch @dns4.p08.nsone.net
"366568643ba5103a1f441fbc3c502ed2eaa0b3d9"





Re: Massive increase of SERVFAIL after April 28th 2025.

2025-05-01 Thread Mark Andrews

I don’t think there was anything wrong with your servers.  The log messages
indicate problems with the authoritative servers.

In theory, authoritative DNS servers should be able to serve content until the
zone content expires.  They should be able to be powered off, then rebooted,
and continue to serve the content they had by reading it off local disc
drives, provided it hasn’t expired in the meantime.  This obviously has not
happened for the zones you mentioned.  The servers appear to have come up
without access to the zone content they are supposed to serve and are hence
returning SERVFAIL.

Forwarding to the servers you are using provides indirect access to instances
that still have the zone content to serve.
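
To illustrate the expiry point, here is a rough sketch of the check a secondary could make, assuming the standard SOA field order (MNAME, RNAME, serial, refresh, retry, expire, minimum). The record values, the function names and the timestamps are hypothetical and chosen only for illustration; this is not how named implements it.

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple


class Soa(NamedTuple):
    mname: str
    rname: str
    serial: int
    refresh: int  # seconds
    retry: int    # seconds
    expire: int   # seconds
    minimum: int  # seconds


def parse_soa_rdata(rdata: str) -> Soa:
    """Parse the RDATA part of an SOA record as printed by dig."""
    fields = rdata.split()
    if len(fields) != 7:
        raise ValueError("expected 7 SOA fields, got %d" % len(fields))
    mname, rname, *numbers = fields
    return Soa(mname, rname, *(int(n) for n in numbers))


def still_servable(soa: Soa, last_refresh: datetime, now: datetime) -> bool:
    """True while the copy held since last_refresh has not yet expired."""
    return now < last_refresh + timedelta(seconds=soa.expire)


# Hypothetical record: the expire timer here is 3600000 seconds.
soa = parse_soa_rdata(
    "ns.example. hostmaster.example. 2025042900 21600 3600 3600000 3600"
)
last_refresh = datetime(2025, 4, 28, 10, 0, tzinfo=timezone.utc)
after_outage = last_refresh + timedelta(days=3)

print(still_servable(soa, last_refresh, after_outage))
```

Under these assumed values, a server that was down for three days would still hold unexpired zone data when it came back, as long as the data had been kept on local disc.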

Mark
-- 
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742  INTERNET: ma...@isc.org
