Hi Brian,

Here are some hints on what you can do to get more information out of the 
running `named` process:

https://kb.isc.org/docs/aa-00341

Basically, use pstack / eu-stack and/or gcore (or attach gdb). It's important to 
have debugging symbols present; without them it's virtually impossible to debug 
the issue.
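
As a rough sketch of the steps above (not taken from the KB article), the 
following Python helper builds the pstack / eu-stack / gdb and gcore command 
lines for a given PID and runs whichever stack tool is installed. The function 
names and the output file names (named-stacks.<pid>.txt, named-core) are 
hypothetical, chosen only for illustration, and it assumes you run it with 
enough privileges to attach to the `named` process.

```python
from __future__ import annotations

import shutil
import subprocess
import sys


def stack_dump_commands(pid: int) -> list[list[str]]:
    """Candidate commands that print the stacks of all threads of `pid`."""
    return [
        ["pstack", str(pid)],
        ["eu-stack", "-p", str(pid)],
        # Attach gdb non-interactively, print every thread's backtrace, detach.
        ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"],
    ]


def core_dump_command(pid: int, prefix: str) -> list[str]:
    """gcore command line; the core file is written as '<prefix>.<pid>'."""
    return ["gcore", "-o", prefix, str(pid)]


def first_available(commands: list[list[str]]) -> list[str] | None:
    """Return the first command whose executable is found in PATH, if any."""
    for cmd in commands:
        if shutil.which(cmd[0]):
            return cmd
    return None


def main(argv: list[str]) -> int:
    pid = int(argv[1])
    cmd = first_available(stack_dump_commands(pid))
    if cmd is None:
        print("none of pstack, eu-stack or gdb found in PATH", file=sys.stderr)
        return 1
    # Hypothetical output file name, used here only for illustration.
    with open(f"named-stacks.{pid}.txt", "w") as out:
        subprocess.run(cmd, stdout=out, stderr=subprocess.STDOUT, check=False)
    # Optionally also save a core file for later inspection, if gcore exists.
    if shutil.which("gcore"):
        subprocess.run(core_dump_command(pid, "named-core"), check=False)
    return 0


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Run it with the PID of the hung `named` process as the only argument, while the 
process is still hung and before restarting it. The resulting stack file (and 
the core, if one was taken) is the kind of information that is useful to attach 
to an issue.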

I would suggest you file an issue in our GitLab (gitlab.isc.org 
<http://gitlab.isc.org/>) and we can continue there.

Also, please include which BIND 9 version you were running previously.

Ondrej
--
Ondřej Surý (He/Him)
ond...@isc.org

My working hours and your working hours may be different. Please do not feel 
obligated to reply outside your normal working hours.

> On 25. 7. 2024, at 12:02, Sebby, Brian A. via bind-users 
> <bind-users@lists.isc.org> wrote:
> 
> I upgraded our DNS servers when the 9.18.28 release came out, and ran into a 
> problem today that I wanted to know if anyone else had seen or had any 
> suggestions about how to debug.
>  We have our DNS configured in a hidden primary configuration, where the 
> primary has internal and external views and serves an internal and external 
> copy of one of our domains.  The external version is fairly small, while the 
> internal version is significantly larger.  We use the same DNSSEC keys to 
> sign both versions of the domain.  Every once in a while, we have encountered 
> an issue where the unsigned and signed versions of the domain get out of 
> sync, which causes this message to appear in our logs (note that I have 
> modified all of the following log entries to replace our domain with 
> example.org):
>  25-Jul-2024 10:12:32.202 general: error: zone example.org/IN/internal 
> (signed): receive_secure_serial: not exact
>  The solution I’ve always been able to follow previously is to comment out 
> the DNSSEC config options in named.conf, restart named with the zone 
> unsigned, retransfer the unsigned zone to our secondaries, and then put back 
> the DNSSEC config options, restart named, and let it re-sign the zone.  It 
> takes a little bit, but normally everything has then gotten back to normal.
>  Today, however, when I tried to do that, it started to sign the zone – and 
> then named just hung.  It stopped updating any of the log files, stopped 
> sending any notifies, and stopped returning DNS data of any sort.  When I 
> tried to restart named via systemctl it had to kill the process because named 
> would not respond.  I was able to undo the DNSSEC changes, restart named, and 
> it continued to work.  I tried it again, and named hung once again in the 
> middle of signing the zone.  Throughout all of these restarts, the signed 
> version of the external zone continued to work normally.
>  This is frustrating because when named hangs, there are no error messages in 
> the logs that I can see, and no indication of why it has failed.   If I try 
> running rndc commands locally I get this error:
>  rndc: recv failed: timed out
>  Remote servers show a timeout and then I saw this in some of their transfer 
> logs:
>  25-Jul-2024 10:27:01.827 general: info: zone example.org/IN: refresh: 
> skipping zone transfer as primary A.B.C.D#53 (source E.F.G.H#0) is 
> unreachable (cached)
>  I was able to solve that one by sending notifies from the primary after 
> restarting it without DNSSEC, but I really need to get DNSSEC working again.
>  The configuration for the zone in named.conf is (and yes, I know I need to 
> update to dnssec-policy):
>  view "internal" {
> ...
>         zone "example.org" {
>                 type primary;
>                 file "/path/to/internal/example.org";
>                 key-directory "/path/to/keys";
>                 auto-dnssec maintain;
>                 inline-signing yes;
>         };
> ...
> };
>  Does anyone have any suggestions for putting named into a debug mode to try 
> to get more data if it hangs again?  I was thinking of turning the DNSSEC 
> options back on but setting “notify no” so it didn’t try to notify the 
> secondaries in case all of the notifies and zone transfers going on while it 
> was signing was part of the problem.
>  The memory and CPU resources of the system should be sufficient – it’s got 2 
> virtual CPUs and 8GB of memory, but it’s not close to using up the memory, 
> and since it doesn’t have clients, the CPU has never been an issue before.  I 
> tried replicating this issue on our test server but it managed to sign the 
> zone with no problems – though it doesn’t have as many clients.
>  I don’t think the new max-records-per-type or max-types-per-name options are 
> involved as we don’t have any cases where we have that many records with the 
> same name.
>   Thanks,
>  Brian
>  -- 
> Brian Sebby (he/him/his)      |  Lead Systems Engineer
> Email: se...@anl.gov          |  Information Technology Infrastructure
> Phone: +1 630.252.9935        |  Business Information Services
> Cell:  +1 630.921.4305        |  Argonne National Laboratory
> -- 
> Visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe from 
> this list
> 
> ISC funds the development of this software with paid support subscriptions. 
> Contact us at https://www.isc.org/contact/ for more information.
> 
> 
> bind-users mailing list
> bind-users@lists.isc.org
> https://lists.isc.org/mailman/listinfo/bind-users

