I upgraded our DNS servers when the 9.18.28 release came out, and today I ran into a problem. Has anyone else seen it, or does anyone have suggestions for how to debug it?
We have our DNS configured as a hidden primary. The primary has internal and external views and serves an internal and an external copy of one of our domains. The external version is fairly small, while the internal version is significantly larger. We use the same DNSSEC keys to sign both versions of the domain.

Every once in a while, the unsigned and signed versions of the domain get out of sync, which causes this message to appear in our logs (in all of the following log entries I have replaced our domain with example.org):

    25-Jul-2024 10:12:32.202 general: error: zone example.org/IN/internal (signed): receive_secure_serial: not exact

The fix I have always been able to use before is to comment out the DNSSEC options in named.conf, restart named with the zone unsigned, retransfer the unsigned zone to our secondaries, then restore the DNSSEC options, restart named, and let it re-sign the zone. It takes a little while, but everything has usually recovered afterwards.

Today, however, when I tried that, named started signing the zone and then simply hung. It stopped updating its log files, stopped sending notifies, and stopped answering DNS queries of any kind. When I tried to restart named via systemctl, systemd had to kill the process because named would not respond. I was able to back out the DNSSEC changes and restart named, and it then ran fine. I tried again, and named hung again partway through signing the zone. Throughout all of these restarts, the signed version of the external zone continued to work normally.

This is frustrating because when named hangs, there are no error messages in the logs that I can see and no indication of why it failed.
If I run rndc commands locally, I get this error:

    rndc: recv failed: timed out

Remote servers time out as well, and I saw this in some of their transfer logs:

    25-Jul-2024 10:27:01.827 general: info: zone example.org/IN: refresh: skipping zone transfer as primary A.B.C.D#53 (source E.F.G.H#0) is unreachable (cached)

I was able to fix that by sending notifies from the primary after restarting it without DNSSEC, but I really need to get DNSSEC working again.

The configuration for the zone in named.conf is as follows (and yes, I know I need to migrate to dnssec-policy):

    view "internal" {
        ...
        zone "example.org" {
            type primary;
            file "/path/to/internal/example.org";
            key-directory "/path/to/keys";
            auto-dnssec maintain;
            inline-signing yes;
        };
        ...
    };

Does anyone have suggestions for putting named into a debug mode so I can collect more data if it hangs again? I was thinking of turning the DNSSEC options back on but setting "notify no" so that it doesn't try to notify the secondaries, in case the notifies and zone transfers happening during signing were part of the problem.

The system's memory and CPU should be sufficient: it has 2 virtual CPUs and 8 GB of memory, memory usage is nowhere near that limit, and since it has no clients, CPU has never been an issue before. I tried to reproduce the issue on our test server, but it signed the zone without any problems, though it doesn't have as many clients. I don't think the new max-records-per-type or max-types-per-name options are involved, as none of our names have anywhere near that many records.

Thanks,
Brian

--
Brian Sebby (he/him/his) | Lead Systems Engineer
Email: se...@anl.gov | Information Technology Infrastructure
Phone: +1 630.252.9935 | Business Information Services
Cell: +1 630.921.4305 | Argonne National Laboratory
--
bind-users mailing list
bind-users@lists.isc.org