Recently (2024/9/21) I ran into an issue that might be similar. Some DDoS attacks use complicated lookups to make DNS servers do extra work and slow them down, so some recent DNS server software has tightened the limit on the amount of 'work' it will do on a single query before giving up and returning SERVFAIL. In my case I had spread my NS records over several domains, and each of those domains depended on yet more domains. This was designed to increase resilience by not depending on a single domain. But we began to get random failures, in our case when trying to get an SSL certificate: Let's Encrypt, using Unbound, was verifying every NS record and sometimes gave up with the error message "exceeded the maximum nameserver nxdomains", even though there were no 'nxdomains' in the log. I simplified my NS records and the problem went away.
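To make the "domains depending on yet more domains" point concrete, here is a rough sketch of how one might count the zones a resolver has to work through before it can reach a name. Everything in it is an assumption for illustration: the NS_RECORDS table and its names are made up, the helper names are hypothetical, and it does no real DNS lookups. It only walks a hand-written table, and it says nothing about the actual limits any resolver applies.

```python
from collections import deque

# Hypothetical delegation data: zone -> nameserver host names.
# These names are invented for illustration; they are not from this thread.
NS_RECORDS = {
    "example.com": ["ns1.dns-a.example", "ns2.dns-b.example"],
    "dns-a.example": ["ns1.dns-c.example"],
    "dns-b.example": ["ns1.dns-c.example", "ns1.dns-d.example"],
    "dns-c.example": ["ns1.dns-c.example"],  # nameserver inside its own zone
    "dns-d.example": ["ns1.dns-a.example"],
}


def parent_zone(host, zones):
    """Return the longest zone in `zones` that contains `host`, or None."""
    labels = host.rstrip(".").split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in zones:
            return candidate
    return None


def zones_needed(start, ns_records):
    """Breadth-first walk over every zone that `start` depends on via NS names."""
    seen = {start}
    queue = deque([start])
    while queue:
        zone = queue.popleft()
        for host in ns_records.get(zone, []):
            dep = parent_zone(host, ns_records)
            if dep is not None and dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


if __name__ == "__main__":
    deps = zones_needed("example.com", NS_RECORDS)
    print(len(deps), sorted(deps))
```

With a table like this, one zone pulls in four others. After simplifying so that the nameservers live inside the zone itself, the same walk returns only the zone you started with.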
--
Bob Harold

On Fri, Sep 6, 2024 at 11:36 AM Peter <p...@citylink.dinoex.sub.org> wrote:

> On Fri, Sep 06, 2024 at 08:18:52AM +1000, Mark Andrews wrote:
> ! Well from here all the IPv4 addresses for the tel.t-online.de
> ! servers are not responding.
>
> Wait - which IPv4 addresses? AFAIK that thing doesn't have any
> addresses, it is only used for NAPTR queries.
>
> ! That won’t be helping things. Also the servers are generating invalid
> ! negative responses.
> ! The SOA record in the response is the QNAME rather than the owner of
> ! the zone.
>
> Wow. Interesting.
>
> ! Also waiting
> ! an hour to retry on SERVFAIL is ridiculous.
>
> Yes, agreed. But this device is a piece of physical hardware and
> commercially available from Alcatel; so this is what one encounters in
> the field. (That's why I usually prefer to design or at least compile
> stuff myself, so I can fix things.)
>
> ! What you haven’t shown is the communication between the recursive server
> ! and the authoritative servers.
> !
> ! tcpdump -w trace.pcap port 53 and \( host ns1.edns.t-ipnet.de or
> ! ns2.edns.t-ipnet.de or ns3.edns.t-ipnet.de or ns4.edns.t-ipnet.de or
> ! ns5.edns.t-ipnet.de \)
>
> There is none.
> SERVFAIL is sent before we even get that far:
>
> intra | 31.08.2024 06:12:10.646279 CEST | 31.08.2024 06:12:10.673001 CEST | :: | 2003:180:4:a::1:53 | RESOLVER_RESPONSE | QUESTION | NOERROR | qr aa cd | ns1.edns.t-ipnet.de. IN AAAA
> intra | 31.08.2024 06:12:10.646279 CEST | 31.08.2024 06:12:10.673001 CEST | :: | 2003:180:4:a::1:53 | RESOLVER_RESPONSE | ANSWER | NOERROR | qr aa cd | ns1.edns.t-ipnet.de. 86400 IN AAAA 2003:180:8::53
> intra | 31.08.2024 06:12:10.64797 CEST | 31.08.2024 06:12:10.674063 CEST | :: | 2003:180:4:a::1:53 | RESOLVER_RESPONSE | QUESTION | NOERROR | qr aa cd | ns2.edns.t-ipnet.de. IN AAAA
> intra | 31.08.2024 06:12:10.64797 CEST | 31.08.2024 06:12:10.674063 CEST | :: | 2003:180:4:a::1:53 | RESOLVER_RESPONSE | ANSWER | NOERROR | qr aa cd | ns2.edns.t-ipnet.de. 86400 IN AAAA 2003:180:8:100::53
> intra | 31.08.2024 06:12:10.644626 CEST | 31.08.2024 06:12:10.674381 CEST | :: | 2003:180:4:a::1:53 | RESOLVER_RESPONSE | QUESTION | NOERROR | qr aa cd | ns2.edns.t-ipnet.de. IN A
> intra | 31.08.2024 06:12:10.644626 CEST | 31.08.2024 06:12:10.674381 CEST | :: | 2003:180:4:a::1:53 | RESOLVER_RESPONSE | ANSWER | NOERROR | qr aa cd | ns2.edns.t-ipnet.de. 86400 IN A 212.185.255.217
> intra | 31.08.2024 06:12:10.642914 CEST | 31.08.2024 06:12:10.674887 CEST | :: | 2003:180:4:a::1:53 | RESOLVER_RESPONSE | QUESTION | NOERROR | qr aa cd | ns5.edns.t-ipnet.de. IN AAAA
> intra | 31.08.2024 06:12:10.642914 CEST | 31.08.2024 06:12:10.674887 CEST | :: | 2003:180:4:a::1:53 | RESOLVER_RESPONSE | ANSWER | NOERROR | qr aa cd | ns5.edns.t-ipnet.de. 86400 IN AAAA 2003:180:8:400::53
> intra | 31.08.2024 06:12:10.651237 CEST | 31.08.2024 06:12:10.675469 CEST | :: | 2003:180:4:a::1:53 | RESOLVER_RESPONSE | QUESTION | NOERROR | qr aa cd | ns4.edns.t-ipnet.de. IN A
> intra | 31.08.2024 06:12:10.651237 CEST | 31.08.2024 06:12:10.675469 CEST | :: | 2003:180:4:a::1:53 | RESOLVER_RESPONSE | ANSWER | NOERROR | qr aa cd | ns4.edns.t-ipnet.de. 86400 IN A 212.185.255.233
> intra | 31.08.2024 06:12:10.52171 CEST | 31.08.2024 06:12:10.681361 CEST | | | CLIENT_RESPONSE | QUESTION | SERVFAIL | qr rd ra | _sip._udp.tel.t-online.de. IN SRV
> *intra | 31.08.2024 06:12:10.681672 CEST | 31.08.2024 06:12:10.699011 CEST | :: | 2003:180:8:100::53 | RESOLVER_QUERY | QUESTION | | | ns6.edns.t-ipnet.de. IN A
> *intra | 31.08.2024 06:12:10.684058 CEST | 31.08.2024 06:12:10.698688 CEST | :: | 2003:180:8:100::53 | RESOLVER_QUERY | QUESTION | | | ns6.edns.t-ipnet.de. IN AAAA
> intra | 31.08.2024 06:12:10.649577 CEST | 31.08.2024 06:12:10.684556 CEST | :: | 2003:180:4:a::1:53 | RESOLVER_RESPONSE | QUESTION | NOERROR | qr aa cd | ns1.edns.t-ipnet.de. IN A
> intra | 31.08.2024 06:12:10.649577 CEST | 31.08.2024 06:12:10.684556 CEST | :: | 2003:180:4:a::1:53 | RESOLVER_RESPONSE | ANSWER | NOERROR | qr aa cd | ns1.edns.t-ipnet.de. 86400 IN A 212.185.255.209
>
> ! SERVFAIL is just the way the recursive server tells the client that it
> ! couldn’t get the answer.
>
> Yeah. But I cannot yet see the point where (and why!) things actually
> fail.
>
> ! We have no idea of what else as changed between April and now.
>
> No, we haven't.
> But then also, DTAG aka ASN-3320 is (a nuisance but) something bigger,
> who serve probably 10 million telephony customers - I would
> consider it somehow unlikely that things do severely fail on their side.
>
> ! % dig srv _sip._udp.tel.t-online.de @ns4.edns.t-ipnet.de +nocookie
> ! ;; communications error to 212.185.255.233#53: timed out
> ! ;; communications error to 212.185.255.233#53: timed out
> ! ;; communications error to 212.185.255.233#53: timed out
>
> "Nuisance" above means, horrible peering policy (among other things).
> Depending on where You are uplinked, timeouts are quite normal.
> Here on their customer uplink I don't get these.
>
> And the SERVFAIL seems to appear before even getting that far.
>
> I have a nice example from yesterday, where I got this in the log
> (I've attached the full server activity onto the mail). It just
> ceases resolving, with no obvious reason.
>
> Sep 5 13:21:40 <local1.info> conr named[4456]: resolver: info: loop detected resolving 'ns1.edns.t-ipnet.de/AAAA'
> Sep 5 13:21:40 <local1.info> conr named[4456]: resolver: info: loop detected resolving 'ns2.edns.t-ipnet.de/A'
> Sep 5 13:21:40 <local1.info> conr named[4456]: resolver: info: loop detected resolving 'ns2.edns.t-ipnet.de/AAAA'
> Sep 5 13:21:40 <local1.info> conr named[4456]: resolver: info: loop detected resolving 'ns3.edns.t-ipnet.de/A'
> Sep 5 13:21:40 <local1.info> conr named[4456]: resolver: info: loop detected resolving 'ns3.edns.t-ipnet.de/AAAA'
> Sep 5 13:21:40 <local1.info> conr named[4456]: resolver: info: loop detected resolving 'ns4.edns.t-ipnet.de/A'
> Sep 5 13:21:40 <local1.info> conr named[4456]: resolver: info: loop detected resolving 'ns4.edns.t-ipnet.de/AAAA'
> Sep 5 13:21:40 <local1.info> conr named[4456]: resolver: info: loop detected resolving 'ns5.edns.t-ipnet.de/A'
> Sep 5 13:21:40 <local1.info> conr named[4456]: resolver: info: loop detected resolving 'ns5.edns.t-ipnet.de/AAAA'
> Sep 5 13:21:40 <local1.info> conr named[4456]: query-errors: info: client @0x87d9fc160 192.168.97.23#3099 (tel.t-online.de): view intra: query failed (failure) for tel.t-online.de/IN/NAPTR at query.c:7836
> --
> Visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe
> from this list
>
> ISC funds the development of this software with paid support
> subscriptions. Contact us at https://www.isc.org/contact/ for more
> information.
>
>
> bind-users mailing list
> bind-users@lists.isc.org
> https://lists.isc.org/mailman/listinfo/bind-users
>