Hi,

I'm running BIND 9.20.6 on Artix Linux (an Arch Linux derivative without systemd) with a pretty standard config:

# named -V
BIND 9.20.6 (Stable Release) <id:72cbad0>
running on Linux x86_64 6.13.5-artix1-1 #1 SMP PREEMPT_DYNAMIC Fri, 28 Feb 2025 
10:18:15 +0000
built by make with  '--prefix=/usr' '--sysconfdir=/etc' '--sbindir=/usr/bin' 
'--localstatedir=/var' '--disable-static' '--enable-fixed-rrset' 
'--enable-full-report' '--with-maxminddb' '--with-openssl' '--with-libidn2' 
'--with-json-c' '--with-libxml2' '--with-lmdb' 'CFLAGS=-march=x86-64 
-mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=3 
-Wformat -Werror=format-security         -fstack-clash-protection 
-fcf-protection -flto=auto -DDIG_SIGCHASE' 'LDFLAGS=-Wl,-O1 -Wl,--sort-common 
-Wl,--as-needed -Wl,-z,relro -Wl,-z,now          -Wl,-z,pack-relative-relocs 
-flto=auto'
compiled by GCC 14.2.1 20250207
compiled with OpenSSL version: OpenSSL 3.4.1 11 Feb 2025
linked to OpenSSL version: OpenSSL 3.4.1 11 Feb 2025
compiled with libuv version: 1.50.0
linked to libuv version: 1.50.0
compiled with liburcu version: 0.15.0
compiled with jemalloc version: 5.3.0
compiled with libnghttp2 version: 1.64.0
linked to libnghttp2 version: 1.65.0
compiled with libxml2 version: 2.13.5
linked to libxml2 version: 21306-GITv2.13.6
compiled with json-c version: 0.18
linked to json-c version: 0.18
compiled with zlib version: 1.3.1
linked to zlib version: 1.3.1
linked to maxminddb version: 1.12.2
threads support is enabled
DNSSEC algorithms: RSASHA1 NSEC3RSASHA1 RSASHA256 RSASHA512 ECDSAP256SHA256 ECDSAP384SHA384 ED25519 ED448
DS algorithms: SHA-1 SHA-256 SHA-384
HMAC algorithms: HMAC-MD5 HMAC-SHA1 HMAC-SHA224 HMAC-SHA256 HMAC-SHA384 HMAC-SHA512
TKEY mode 2 support (Diffie-Hellman): no
TKEY mode 3 support (GSS-API): yes

default paths:
  named configuration:  /etc/named.conf
  rndc configuration:   /etc/rndc.conf
  nsupdate session key: /var/run/named/session.key
  named PID file:       /var/run/named/named.pid
  geoip-directory:      /usr/share/GeoIP


# grep '^\s*[^[:space:]#/]' /etc/named.conf
options {
    directory "/var/named";
    pid-file "/run/named/named.pid";
    allow-recursion { 127.0.0.1; 192.168.188.0/24; };
    allow-transfer { none; };
    allow-update { none; };
    version none;
    hostname none;
    server-id none;
};
zone "localhost" IN {
    type master;
    file "localhost.zone";
};
zone "0.0.127.in-addr.arpa" IN {
    type master;
    file "127.0.0.zone";
};
zone "1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa" 
{
    type master;
    file "localhost.ip6.zone";
};

# pgrep -af named
22958 /usr/bin/named -u named -L /var/log/named.log

For a few days (or weeks?) now, it has been acting up. Every few tens of minutes, it crashes with:

10-Mar-2025 20:33:36.995 general: error: uv.c:95:isc__uverr2result(): 
unexpected error:
10-Mar-2025 20:33:36.995 general: error: unable to convert libuv error code in 
start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
10-Mar-2025 20:33:36.995 general: error: uv.c:95:isc__uverr2result(): 
unexpected error:
10-Mar-2025 20:33:36.995 general: error: unable to convert libuv error code in 
start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
10-Mar-2025 20:33:36.995 general: error: uv.c:95:isc__uverr2result(): 
unexpected error:
10-Mar-2025 20:33:36.995 general: error: unable to convert libuv error code in 
start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
10-Mar-2025 20:33:36.995 general: error: uv.c:95:isc__uverr2result(): 
unexpected error:
10-Mar-2025 20:33:36.995 general: error: unable to convert libuv error code in 
start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
10-Mar-2025 20:33:36.995 general: error: uv.c:95:isc__uverr2result(): 
unexpected error:
10-Mar-2025 20:33:36.995 general: error: unable to convert libuv error code in 
start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
10-Mar-2025 20:33:36.995 general: error: uv.c:95:isc__uverr2result(): 
unexpected error:
10-Mar-2025 20:33:36.995 general: error: unable to convert libuv error code in 
start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
10-Mar-2025 20:33:36.995 general: error: uv.c:95:isc__uverr2result(): 
unexpected error:
10-Mar-2025 20:33:36.995 general: error: unable to convert libuv error code in 
start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
10-Mar-2025 20:33:36.995 general: error: uv.c:95:isc__uverr2result(): 
unexpected error:
10-Mar-2025 20:33:36.995 general: error: unable to convert libuv error code in 
start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
10-Mar-2025 20:33:36.995 general: error: uv.c:95:isc__uverr2result(): 
unexpected error:
10-Mar-2025 20:33:36.995 general: error: unable to convert libuv error code in 
start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
10-Mar-2025 20:33:36.995 general: error: uv.c:95:isc__uverr2result(): 
unexpected error:
10-Mar-2025 20:33:36.995 general: error: unable to convert libuv error code in 
start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
10-Mar-2025 20:33:36.995 general: error: uv.c:95:isc__uverr2result(): 
unexpected error:
10-Mar-2025 20:33:36.995 general: error: unable to convert libuv error code in 
start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
10-Mar-2025 20:33:36.996 general: error: uv.c:95:isc__uverr2result(): 
unexpected error:
10-Mar-2025 20:33:36.996 general: error: unable to convert libuv error code in 
start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
10-Mar-2025 20:33:36.996 general: error: uv.c:95:isc__uverr2result(): 
unexpected error:
10-Mar-2025 20:33:36.996 general: error: unable to convert libuv error code in 
start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
10-Mar-2025 20:33:36.996 general: error: unable to convert libuv error code in 
start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
10-Mar-2025 20:33:36.996 general: error: uv.c:95:isc__uverr2result(): 
unexpected error:
10-Mar-2025 20:33:36.996 general: error: unable to convert libuv error code in 
start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
10-Mar-2025 20:33:36.996 general: error: uv.c:95:isc__uverr2result(): 
unexpected error:
10-Mar-2025 20:33:36.996 general: error: unable to convert libuv error code in 
start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
10-Mar-2025 20:33:36.996 general: error: uv.c:95:isc__uverr2result(): 
unexpected error:
10-Mar-2025 20:33:36.996 general: error: unable to convert libuv error code in 
start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
10-Mar-2025 20:33:36.996 general: error: unable to convert libuv error code in 
start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
10-Mar-2025 20:33:36.996 general: error: uv.c:95:isc__uverr2result(): 
unexpected error:
10-Mar-2025 20:33:36.996 general: error: unable to convert libuv error code in 
start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
10-Mar-2025 20:33:36.996 general: error: uv.c:95:isc__uverr2result(): 
unexpected error:
10-Mar-2025 20:33:36.996 general: error: unable to convert libuv error code in 
start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
10-Mar-2025 20:33:36.996 general: error: uv.c:95:isc__uverr2result(): 
unexpected error:
10-Mar-2025 20:33:36.996 general: error: unable to convert libuv error code in 
start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
10-Mar-2025 20:33:36.996 general: error: uv.c:95:isc__uverr2result(): 
unexpected error:
10-Mar-2025 20:33:36.996 general: error: unable to convert libuv error code in 
start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
10-Mar-2025 20:33:36.996 general: error: uv.c:95:isc__uverr2result(): 
unexpected error:
10-Mar-2025 20:33:36.996 general: error: unable to convert libuv error code in 
start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
10-Mar-2025 20:33:36.996 network: error: creating IPv6 interface veth731351f 
failed; interface ignored
10-Mar-2025 20:33:36.996 network: info: listening on IPv6 interface 
vetha808625, fe80::d0cf:5fff:fe3a:1e50%954915#53
10-Mar-2025 20:33:36.998 network: info: listening on IPv6 interface 
veth92035bc, fe80::58f0:c5ff:fecf:4a8d%954971#53
10-Mar-2025 20:33:37.000 network: info: listening on IPv6 interface 
vethb1ef26b, fe80::58e2:d2ff:fe3f:c77f%955141#53
10-Mar-2025 20:33:37.003 network: info: listening on IPv6 interface 
veth0ee3ea4, fe80::44be:c7ff:fefd:83fb%955153#53
10-Mar-2025 20:33:37.005 network: info: listening on IPv6 interface 
veth39e879e, fe80::34fb:98ff:fe9e:d49f%955162#53
10-Mar-2025 20:33:37.007 network: info: listening on IPv6 interface 
veth2f2d6df, fe80::2c2b:e8ff:fe8e:2339%955167#53
10-Mar-2025 20:33:37.010 network: info: listening on IPv6 interface 
vetha0e2b2b, fe80::84fd:7aff:fe72:9c82%955207#53
10-Mar-2025 20:33:37.012 network: info: listening on IPv6 interface 
vethb633142, fe80::58a5:32ff:feaf:bdb2%955208#53
10-Mar-2025 20:33:37.014 network: info: listening on IPv6 interface 
veth232d291, fe80::f442:a2ff:fe0d:18f8%955383#53
10-Mar-2025 20:33:37.017 network: info: listening on IPv6 interface 
vetha87c0e9, fe80::2431:26ff:fe1e:adac%955384#53
10-Mar-2025 20:33:37.021 network: info: listening on IPv6 interface 
vethadab24f, fe80::7d:44ff:fe11:7284%955606#53
10-Mar-2025 20:33:37.024 network: info: listening on IPv6 interface 
vethe9c8381, fe80::1847:42ff:fe98:cd5c%955655#53
10-Mar-2025 20:33:37.026 network: info: listening on IPv6 interface 
veth5f5869a, fe80::ec06:66ff:fe5d:ef74%955668#53
10-Mar-2025 20:33:37.029 network: info: listening on IPv6 interface 
vethe46d2e1, fe80::f48e:14ff:fe94:2efd%955683#53
10-Mar-2025 20:33:37.032 network: info: listening on IPv6 interface 
vethf87bbe4, fe80::6c0b:47ff:fed2:404d%955686#53
10-Mar-2025 20:33:37.035 network: info: listening on IPv6 interface 
veth207c7ca, fe80::f019:b8ff:feda:517d%955692#53
10-Mar-2025 20:33:37.038 network: info: listening on IPv6 interface 
veth1654fa8, fe80::fc83:fcff:fe79:8f01%955718#53
10-Mar-2025 20:33:37.041 network: info: listening on IPv6 interface 
vethe4e528f, fe80::901d:7fff:fe58:ed2%955719#53
10-Mar-2025 20:33:37.041 general: critical: 
netmgr/udp.c:77:isc__nm_udp_lb_socket(): fatal error:
10-Mar-2025 20:33:37.041 general: critical: RUNTIME_CHECK(result == 
ISC_R_SUCCESS) failed
10-Mar-2025 20:33:37.041 general: critical: exiting (due to fatal error in 
library)

As first aid, I added a script that simply restarts the nameserver if it crashes. This showed me two things:

1. After the server has crashed, restarts also fail for the next one or two minutes.

2. The crashes seem to correlate with the other main load on this machine: a couple hundred Docker containers (each of which apparently sets up a network device on the host system) that are started every ten minutes and run for a few minutes (in rare cases longer). Looking at the last digit of the minute in the timestamps of the assertion log lines, there is a clear concentration on the minutes when many containers start(?)/run/stop:

$ grep -F 'RUNTIME_CHECK(result == ISC_R_SUCCESS)' /var/log/named.log | cut -d' ' -f2 | cut -d: -f2 | cut -c2 | sort | uniq -c
   5976 0
  14767 1
  42850 2
  31292 3
    693 4
    204 5
    199 6
    211 7
    226 8
    198 9
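
To make clear what the numbers in the right-hand column mean: the pipeline above reduces each matching log line to the last digit of its minute, i.e. the position within the ten-minute cron cycle. A small worked example on one of the log lines quoted above (the function name minute_digit is made up for this illustration):

```shell
#!/bin/sh
# Worked example: what the cut stages of the pipeline extract from one log line.
# "minute_digit" is a hypothetical helper name, not part of any tool.

minute_digit() {
    # space-separated field 2 is the time,        e.g. 20:33:37.041
    # colon-separated field 2 of that is the minute, e.g. 33
    # character 2 of the minute is its last digit,   e.g. 3
    cut -d' ' -f2 | cut -d: -f2 | cut -c2
}

sample='10-Mar-2025 20:33:37.041 general: critical: RUNTIME_CHECK(result == ISC_R_SUCCESS) failed'
printf '%s\n' "$sample" | minute_digit    # prints 3
```

So a line logged at 20:33 is counted under "3", a line logged at 20:40 under "0", and so on.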

The containers are started via a cronjob:
*/10 * * * *  /home/erich/git/archlinuxewe/build-all-with-docker

In between the crashes, the nameserver seems to run as expected. Also, the Docker containers (which require working name resolution on the host system) do not always fail, so at least some of the time named seems to process the containers' requests successfully.
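
For illustration, a restart wrapper of the kind mentioned above could look roughly like the following. This is only a sketch, not the exact script in use here: the 5-second poll interval, the function names and the "run" argument are assumptions. The named command line is the one shown by pgrep above, and the sketch relies on named daemonizing itself when started without -f or -g.

```shell
#!/bin/sh
# Hypothetical watchdog sketch: start named again whenever it is not running.
# POLL_SECONDS, the function names and the "run" argument are illustrative.

POLL_SECONDS="${POLL_SECONDS:-5}"

named_is_running() {
    # -x matches the exact process name "named"
    pgrep -x named >/dev/null 2>&1
}

start_named() {
    # same command line as shown by "pgrep -af named"
    /usr/bin/named -u named -L /var/log/named.log
}

check_once() {
    # one iteration: do nothing if named is alive, otherwise try to start it
    if named_is_running; then
        return 0
    fi
    start_named
}

watch_named() {
    while :; do
        check_once || echo "$(date '+%F %T') starting named failed" >&2
        sleep "$POLL_SECONDS"
    done
}

# Loop only when called with "run", so the file can also be sourced.
if [ "${1:-}" = "run" ]; then
    watch_named
fi
```

Since a restart may keep failing for a minute or two after a crash, the loop just tries again on the next iteration instead of giving up.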

I hope someone has an idea where I should look. It feels strange that such a "reference" product as BIND should be crashable simply by having a large number of fluctuating network devices.
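
To check the correlation with the network devices more directly, one could log the number of veth devices together with a timestamp and compare that with named.log. A minimal sketch, assuming that the container interfaces all appear under /sys/class/net with names starting with "veth" (as in the log above); the function names, the "run" argument and the one-second interval are made up:

```shell
#!/bin/sh
# Sketch: print a timestamped count of veth devices once per second.
# Assumes container interfaces are named veth* and listed in /sys/class/net.

count_veth_names() {
    # reads interface names (one per line) on stdin and prints how many
    # start with "veth"; grep -c exits non-zero on a count of 0, hence "|| true"
    grep -c '^veth' || true
}

log_veth_count() {
    # timestamp format chosen to resemble the one in named.log
    printf '%s veth devices: %s\n' \
        "$(date '+%d-%b-%Y %H:%M:%S')" \
        "$(ls /sys/class/net | count_veth_names)"
}

# Loop only when called with "run", so the file can also be sourced.
if [ "${1:-}" = "run" ]; then
    while :; do
        log_veth_count
        sleep 1
    done
fi
```

Redirecting the output to a file and looking at the seconds right before a crash should show whether the crashes coincide with devices appearing or disappearing.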

Some side notes, maybe less related to the issue at hand, but I want to mention them here in case they are relevant:

The system is somewhat under load while the containers run, but I would be astonished if this caused BIND to crash: RAM usage goes up to 16 GB of the 128 GB available; CPU usage goes up to 100%, though.

I have a second, similar machine (same distribution, similar BIND setup), but without the "pulsed" load of Docker containers, where named has been running for *looks*up*the*numbers* more than 8 days without crashes (which matches the uptime of that machine).

I wanted to open a bug at gitlab.isc.org, but my account ("deep42thought", under which I reported something a few years ago) got blocked again after being reactivated, because I did not notice the big warning on the login page stating exactly this behaviour and took more than a day to gather the information for the bug. :-( Maybe someone can unblock me? Then I could add 2FA to keep the account active.

Some time ago I tried to get the statistics channel working with

options {
    zone-statistics full;
};
statistics-channels {
    inet 127.0.0.1 port 8053;
};

but this seemed to crash the server back then. Since it was just a toy project, I didn't pursue it any further and removed it from the config quite some time ago.

regards,
Erich
--
Visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe from 
this list

ISC funds the development of this software with paid support subscriptions. 
Contact us at https://www.isc.org/contact/ for more information.

