I upgraded our DNS servers when the 9.18.28 release came out, and today I ran 
into a problem.  I wanted to know whether anyone else has seen it or has any 
suggestions about how to debug it.

We have our DNS configured in a hidden primary configuration, where the primary 
has internal and external views and serves an internal and an external copy of 
one of our domains.  The external version is fairly small, while the internal 
version is significantly larger.  We use the same DNSSEC keys to sign both 
versions of the domain.  Every once in a while, we have encountered an issue 
where the unsigned and signed versions of the domain get out of sync, which 
causes this message to appear in our logs (note that I have modified all of the 
following log entries to replace our domain with example.org):

25-Jul-2024 10:12:32.202 general: error: zone example.org/IN/internal (signed): 
receive_secure_serial: not exact

The fix that has always worked for me before is to comment out the 
DNSSEC config options in named.conf, restart named with the zone unsigned, 
retransfer the unsigned zone to our secondaries, and then put back the DNSSEC 
config options, restart named, and let it re-sign the zone.  It takes a little 
while, but afterwards everything has always returned to normal.

Today, however, when I tried to do that, it started to sign the zone – and then 
named just hung.  It stopped updating any of the log files, stopped sending any 
notifies, and stopped returning DNS data of any sort.  When I tried to restart 
named via systemctl, it had to kill the process because named would not respond.  
After I undid the DNSSEC changes and restarted named, it worked again.  
I tried it again, and named hung once again in the middle of signing the 
zone.  Throughout all of these restarts, the signed version of the external 
zone continued to work normally.

This is frustrating because when named hangs, there are no error messages in 
the logs that I can see, and no indication of why it has failed.  If I try 
running rndc commands locally I get this error:

rndc: recv failed: timed out

Remote servers showed a timeout, and I then saw this in some of their transfer 
logs:

25-Jul-2024 10:27:01.827 general: info: zone example.org/IN: refresh: skipping 
zone transfer as primary A.B.C.D#53 (source E.F.G.H#0) is unreachable (cached)

I was able to solve that one by sending notifies from the primary after 
restarting it without DNSSEC, but I really need to get DNSSEC working again.

The configuration for the zone in named.conf is (and yes, I know I need to 
update to dnssec-policy):

view "internal" {
...
        zone "example.org" {
                type primary;
                file "/path/to/internal/example.org";
                key-directory "/path/to/keys";
                auto-dnssec maintain;
                inline-signing yes;
        };
...
};
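
For reference, a dnssec-policy version of this zone might look roughly like the 
sketch below.  The policy name, the key roles, and the algorithm are 
placeholders made up for illustration.  As I understand it, they would have to 
match the existing keys in the key directory, otherwise named would start 
rolling to new keys.

```
// Sketch only: "example-policy" and ecdsap256sha256 are assumptions,
// not our actual settings.
dnssec-policy "example-policy" {
        keys {
                ksk key-directory lifetime unlimited algorithm ecdsap256sha256;
                zsk key-directory lifetime unlimited algorithm ecdsap256sha256;
        };
};

view "internal" {
...
        zone "example.org" {
                type primary;
                file "/path/to/internal/example.org";
                key-directory "/path/to/keys";
                dnssec-policy "example-policy";   // replaces auto-dnssec maintain
                inline-signing yes;
        };
...
};
```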

Does anyone have any suggestions for putting named into a debug mode to try to 
get more data if it hangs again?  I was thinking of turning the DNSSEC options 
back on but setting “notify no” so that it doesn’t try to notify the 
secondaries, in case all of the notifies and zone transfers going on while it 
was signing were part of the problem.
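
To make that concrete, here is a rough sketch of the kind of change I have in 
mind: the zone with "notify no" added, plus a dedicated debug log channel.  The 
channel name, log path, size limits, and debug level are assumptions chosen for 
illustration, and the category lines would need to be merged into our existing 
logging statement rather than added as a second one.

```
// Sketch only: channel name, path, sizes, and debug level are illustrative.
logging {
        channel signing_debug {
                file "/path/to/log/signing-debug.log" versions 5 size 50m;
                severity debug 3;
                print-time yes;
                print-category yes;
                print-severity yes;
        };
        category dnssec   { signing_debug; };
        category notify   { signing_debug; };
        category xfer-out { signing_debug; };
};

view "internal" {
...
        zone "example.org" {
                type primary;
                file "/path/to/internal/example.org";
                key-directory "/path/to/keys";
                auto-dnssec maintain;
                inline-signing yes;
                notify no;      // temporary, only while the zone re-signs
        };
...
};
```

As I read the ARM, a channel with a fixed debug severity only receives debug 
messages once named is in debugging mode, so named would also have to be 
started with -d or switched with "rndc trace" before the signing begins, since 
rndc stops responding once it hangs.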

The memory and CPU resources of the system should be sufficient – it’s got 2 
virtual CPUs and 8GB of memory, but it’s not close to using up the memory, and 
since it doesn’t have clients, the CPU has never been an issue before.  I tried 
replicating this issue on our test server but it managed to sign the zone with 
no problems – though it doesn’t have as many clients.

I don’t think the new max-records-per-type or max-types-per-name options are 
involved as we don’t have any cases where we have that many records with the 
same name.


Thanks,

Brian

--
Brian Sebby (he/him/his)      |  Lead Systems Engineer
Email: se...@anl.gov          |  Information Technology Infrastructure
Phone: +1 630.252.9935        |  Business Information Services
Cell:  +1 630.921.4305        |  Argonne National Laboratory

