Hi,
I'm running BIND 9.20.6 on Artix Linux (an Arch Linux derivative
without systemd) with a fairly standard configuration:
# named -V
BIND 9.20.6 (Stable Release) <id:72cbad0>
running on Linux x86_64 6.13.5-artix1-1 #1 SMP PREEMPT_DYNAMIC Fri, 28 Feb 2025
10:18:15 +0000
built by make with '--prefix=/usr' '--sysconfdir=/etc' '--sbindir=/usr/bin'
'--localstatedir=/var' '--disable-static' '--enable-fixed-rrset'
'--enable-full-report' '--with-maxminddb' '--with-openssl' '--with-libidn2'
'--with-json-c' '--with-libxml2' '--with-lmdb' 'CFLAGS=-march=x86-64
-mtune=generic -O2 -pipe -fno-plt -fexceptions -Wp,-D_FORTIFY_SOURCE=3
-Wformat -Werror=format-security -fstack-clash-protection
-fcf-protection -flto=auto -DDIG_SIGCHASE' 'LDFLAGS=-Wl,-O1 -Wl,--sort-common
-Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,-z,pack-relative-relocs
-flto=auto'
compiled by GCC 14.2.1 20250207
compiled with OpenSSL version: OpenSSL 3.4.1 11 Feb 2025
linked to OpenSSL version: OpenSSL 3.4.1 11 Feb 2025
compiled with libuv version: 1.50.0
linked to libuv version: 1.50.0
compiled with liburcu version: 0.15.0
compiled with jemalloc version: 5.3.0
compiled with libnghttp2 version: 1.64.0
linked to libnghttp2 version: 1.65.0
compiled with libxml2 version: 2.13.5
linked to libxml2 version: 21306-GITv2.13.6
compiled with json-c version: 0.18
linked to json-c version: 0.18
compiled with zlib version: 1.3.1
linked to zlib version: 1.3.1
linked to maxminddb version: 1.12.2
threads support is enabled
DNSSEC algorithms: RSASHA1 NSEC3RSASHA1 RSASHA256 RSASHA512
ECDSAP256SHA256 ECDSAP384SHA384 ED25519 ED448
DS algorithms: SHA-1 SHA-256 SHA-384
HMAC algorithms: HMAC-MD5 HMAC-SHA1 HMAC-SHA224 HMAC-SHA256 HMAC-SHA384
HMAC-SHA512
TKEY mode 2 support (Diffie-Hellman): no
TKEY mode 3 support (GSS-API): yes
default paths:
named configuration: /etc/named.conf
rndc configuration: /etc/rndc.conf
nsupdate session key: /var/run/named/session.key
named PID file: /var/run/named/named.pid
geoip-directory: /usr/share/GeoIP
# grep '^\s*[^[:space:]#/]' /etc/named.conf
options {
    directory "/var/named";
    pid-file "/run/named/named.pid";
    allow-recursion { 127.0.0.1; 192.168.188.0/24; };
    allow-transfer { none; };
    allow-update { none; };
    version none;
    hostname none;
    server-id none;
};
zone "localhost" IN {
    type master;
    file "localhost.zone";
};
zone "0.0.127.in-addr.arpa" IN {
    type master;
    file "127.0.0.zone";
};
zone "1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa" {
    type master;
    file "localhost.ip6.zone";
};
# pgrep -af named
22958 /usr/bin/named -u named -L /var/log/named.log
For the past few days (or weeks?) it has been acting up: every few tens
of minutes, it crashes with:
10-Mar-2025 20:33:36.995 general: error: uv.c:95:isc__uverr2result():
unexpected error:
10-Mar-2025 20:33:36.995 general: error: unable to convert libuv error code in
start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
10-Mar-2025 20:33:36.995 general: error: uv.c:95:isc__uverr2result():
unexpected error:
10-Mar-2025 20:33:36.995 general: error: unable to convert libuv error code in
start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
10-Mar-2025 20:33:36.995 general: error: uv.c:95:isc__uverr2result():
unexpected error:
10-Mar-2025 20:33:36.995 general: error: unable to convert libuv error code in
start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
10-Mar-2025 20:33:36.995 general: error: uv.c:95:isc__uverr2result():
unexpected error:
10-Mar-2025 20:33:36.995 general: error: unable to convert libuv error code in
start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
10-Mar-2025 20:33:36.995 general: error: uv.c:95:isc__uverr2result():
unexpected error:
10-Mar-2025 20:33:36.995 general: error: unable to convert libuv error code in
start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
10-Mar-2025 20:33:36.995 general: error: uv.c:95:isc__uverr2result():
unexpected error:
10-Mar-2025 20:33:36.995 general: error: unable to convert libuv error code in
start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
10-Mar-2025 20:33:36.995 general: error: uv.c:95:isc__uverr2result():
unexpected error:
10-Mar-2025 20:33:36.995 general: error: unable to convert libuv error code in
start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
10-Mar-2025 20:33:36.995 general: error: uv.c:95:isc__uverr2result():
unexpected error:
10-Mar-2025 20:33:36.995 general: error: unable to convert libuv error code in
start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
10-Mar-2025 20:33:36.995 general: error: uv.c:95:isc__uverr2result():
unexpected error:
10-Mar-2025 20:33:36.995 general: error: unable to convert libuv error code in
start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
10-Mar-2025 20:33:36.995 general: error: uv.c:95:isc__uverr2result():
unexpected error:
10-Mar-2025 20:33:36.995 general: error: unable to convert libuv error code in
start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
10-Mar-2025 20:33:36.995 general: error: uv.c:95:isc__uverr2result():
unexpected error:
10-Mar-2025 20:33:36.995 general: error: unable to convert libuv error code in
start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
10-Mar-2025 20:33:36.996 general: error: uv.c:95:isc__uverr2result():
unexpected error:
10-Mar-2025 20:33:36.996 general: error: unable to convert libuv error code in
start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
10-Mar-2025 20:33:36.996 general: error: uv.c:95:isc__uverr2result():
unexpected error:
10-Mar-2025 20:33:36.996 general: error: unable to convert libuv error code in
start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
10-Mar-2025 20:33:36.996 general: error: unable to convert libuv error code in
start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
10-Mar-2025 20:33:36.996 general: error: uv.c:95:isc__uverr2result():
unexpected error:
10-Mar-2025 20:33:36.996 general: error: unable to convert libuv error code in
start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
10-Mar-2025 20:33:36.996 general: error: uv.c:95:isc__uverr2result():
unexpected error:
10-Mar-2025 20:33:36.996 general: error: unable to convert libuv error code in
start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
10-Mar-2025 20:33:36.996 general: error: uv.c:95:isc__uverr2result():
unexpected error:
10-Mar-2025 20:33:36.996 general: error: unable to convert libuv error code in
start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
10-Mar-2025 20:33:36.996 general: error: unable to convert libuv error code in
start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
10-Mar-2025 20:33:36.996 general: error: uv.c:95:isc__uverr2result():
unexpected error:
10-Mar-2025 20:33:36.996 general: error: unable to convert libuv error code in
start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
10-Mar-2025 20:33:36.996 general: error: uv.c:95:isc__uverr2result():
unexpected error:
10-Mar-2025 20:33:36.996 general: error: unable to convert libuv error code in
start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
10-Mar-2025 20:33:36.996 general: error: uv.c:95:isc__uverr2result():
unexpected error:
10-Mar-2025 20:33:36.996 general: error: unable to convert libuv error code in
start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
10-Mar-2025 20:33:36.996 general: error: uv.c:95:isc__uverr2result():
unexpected error:
10-Mar-2025 20:33:36.996 general: error: unable to convert libuv error code in
start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
10-Mar-2025 20:33:36.996 general: error: uv.c:95:isc__uverr2result():
unexpected error:
10-Mar-2025 20:33:36.996 general: error: unable to convert libuv error code in
start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
10-Mar-2025 20:33:36.996 network: error: creating IPv6 interface veth731351f
failed; interface ignored
10-Mar-2025 20:33:36.996 network: info: listening on IPv6 interface
vetha808625, fe80::d0cf:5fff:fe3a:1e50%954915#53
10-Mar-2025 20:33:36.998 network: info: listening on IPv6 interface
veth92035bc, fe80::58f0:c5ff:fecf:4a8d%954971#53
10-Mar-2025 20:33:37.000 network: info: listening on IPv6 interface
vethb1ef26b, fe80::58e2:d2ff:fe3f:c77f%955141#53
10-Mar-2025 20:33:37.003 network: info: listening on IPv6 interface
veth0ee3ea4, fe80::44be:c7ff:fefd:83fb%955153#53
10-Mar-2025 20:33:37.005 network: info: listening on IPv6 interface
veth39e879e, fe80::34fb:98ff:fe9e:d49f%955162#53
10-Mar-2025 20:33:37.007 network: info: listening on IPv6 interface
veth2f2d6df, fe80::2c2b:e8ff:fe8e:2339%955167#53
10-Mar-2025 20:33:37.010 network: info: listening on IPv6 interface
vetha0e2b2b, fe80::84fd:7aff:fe72:9c82%955207#53
10-Mar-2025 20:33:37.012 network: info: listening on IPv6 interface
vethb633142, fe80::58a5:32ff:feaf:bdb2%955208#53
10-Mar-2025 20:33:37.014 network: info: listening on IPv6 interface
veth232d291, fe80::f442:a2ff:fe0d:18f8%955383#53
10-Mar-2025 20:33:37.017 network: info: listening on IPv6 interface
vetha87c0e9, fe80::2431:26ff:fe1e:adac%955384#53
10-Mar-2025 20:33:37.021 network: info: listening on IPv6 interface
vethadab24f, fe80::7d:44ff:fe11:7284%955606#53
10-Mar-2025 20:33:37.024 network: info: listening on IPv6 interface
vethe9c8381, fe80::1847:42ff:fe98:cd5c%955655#53
10-Mar-2025 20:33:37.026 network: info: listening on IPv6 interface
veth5f5869a, fe80::ec06:66ff:fe5d:ef74%955668#53
10-Mar-2025 20:33:37.029 network: info: listening on IPv6 interface
vethe46d2e1, fe80::f48e:14ff:fe94:2efd%955683#53
10-Mar-2025 20:33:37.032 network: info: listening on IPv6 interface
vethf87bbe4, fe80::6c0b:47ff:fed2:404d%955686#53
10-Mar-2025 20:33:37.035 network: info: listening on IPv6 interface
veth207c7ca, fe80::f019:b8ff:feda:517d%955692#53
10-Mar-2025 20:33:37.038 network: info: listening on IPv6 interface
veth1654fa8, fe80::fc83:fcff:fe79:8f01%955718#53
10-Mar-2025 20:33:37.041 network: info: listening on IPv6 interface
vethe4e528f, fe80::901d:7fff:fe58:ed2%955719#53
10-Mar-2025 20:33:37.041 general: critical:
netmgr/udp.c:77:isc__nm_udp_lb_socket(): fatal error:
10-Mar-2025 20:33:37.041 general: critical: RUNTIME_CHECK(result ==
ISC_R_SUCCESS) failed
10-Mar-2025 20:33:37.041 general: critical: exiting (due to fatal error in
library)
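For context, -19 is ENODEV ("no such device") on Linux, which would fit an
interface vanishing between the moment named scans the interfaces and the
moment it binds to them. That is my reading, not something I have confirmed.
A minimal sketch to check whether the interfaces named failed on still exist
afterwards; the function names and the optional sysfs directory argument are
hypothetical helpers of mine, and it assumes each log entry sits on a single
line in the actual log file (unlike the wrapped excerpt above):

```shell
#!/bin/sh
# Hypothetical helpers (not part of BIND): list the interfaces that named
# reported as failed and check whether each one still exists on the host.

# Print the unique interface names taken from
# "creating IPv6 interface X failed" log lines.
failed_ifaces() {
    sed -n 's/.*creating IPv6 interface \([^ ]*\) failed.*/\1/p' "$1" | sort -u
}

# For each failed interface, report whether it is still present under the
# given directory (defaults to /sys/class/net).
check_failed_ifaces() {
    log=$1
    netdir=${2:-/sys/class/net}
    failed_ifaces "$log" | while read -r ifname; do
        if [ -e "$netdir/$ifname" ]; then
            echo "$ifname present"
        else
            echo "$ifname gone"
        fi
    done
}
```

Running `check_failed_ifaces /var/log/named.log` shortly after a crash should
show whether the reported veth devices were short-lived container interfaces.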
As first aid, I added a script that simply restarts the name server
whenever it crashes. This showed me two things:
1. After a crash, restart attempts also fail for the next one or two
minutes.
2. The crashes seem to correlate with the other main load on this machine:
a couple of hundred Docker containers (each of which apparently sets up a
network device on the host system) that are started every ten minutes and
run for a few minutes (in rare cases longer). Counting the assertion log
entries by the last digit of the minute shows a clear peak in the minutes
when many containers start(?)/run/stop:
$ grep -F 'RUNTIME_CHECK(result == ISC_R_SUCCESS)' /var/log/named.log | cut -d' ' -f2 | cut -d: -f2 | cut -c2 | sort | uniq -c
5976 0
14767 1
42850 2
31292 3
693 4
204 5
199 6
211 7
226 8
198 9
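The same count can be written a bit more readably with awk. This is only a
sketch: the function name is made up, and it assumes each log entry is on
one line with the time (HH:MM:SS.mmm) in the second space-separated field,
as in the excerpt above:

```shell
#!/bin/sh
# Count fatal RUNTIME_CHECK entries by the last digit of the minute (0-9).
# Prints "count digit" per line, like uniq -c, including digits with no hits.
crash_minute_histogram() {
    grep -F 'RUNTIME_CHECK(result == ISC_R_SUCCESS)' "$1" |
        awk '{
            split($2, t, ":")              # $2 is the time, e.g. 20:33:37.041
            count[substr(t[2], 2, 1)]++    # second character of the minute
        }
        END {
            for (d = 0; d <= 9; d++) printf "%7d %d\n", count[d], d
        }'
}
```

Usage: `crash_minute_histogram /var/log/named.log`. Since the cronjob fires
at minutes ending in 0, a peak at digits 0 to 3 corresponds to the first few
minutes after each container batch starts.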
The containers are started via a cronjob:
*/10 * * * * /home/erich/git/archlinuxewe/build-all-with-docker
Between crashes, the name server seems to run as expected. Also, the
Docker containers (which require working name resolution on the host
system) do not always fail, so at least some of the time named seems to
process the containers' requests successfully.
I hope someone has an idea where I should look. It feels strange that a
"reference" product such as BIND can be crashed simply by a large number
of network devices coming and going.
Some side notes, possibly less related to the issue at hand, which I
include in case they are relevant:
The system is under some load while the containers run, but I would be
astonished if that caused BIND to crash: RAM usage goes up to 16 GB of the
128 GB available, although CPU usage does reach 100%.
I have a second, similar machine (same distribution, similar BIND setup)
without the "pulsed" load of Docker containers, where named has been
running for *looks*up*the*numbers* more than 8 days without a crash
(which matches the uptime of that machine).
I wanted to open a bug at gitlab.isc.org, but my account ("deep42thought",
under which I reported something a few years ago) was blocked again after
being reactivated, because I did not notice the big warning on the login
page describing exactly this behaviour and took more than a day to gather
the information for the bug. :-( Maybe someone can unblock me, so that I
can add 2FA to keep the account active?
Some time ago I tried to get the statistics channel working with
options {
    zone-statistics full;
};
statistics-channels {
    inet 127.0.0.1 port 8053;
};
but this seemed to crash the server at the time. Since it was just a toy
project, I did not pursue it further and removed it from the config quite
some time ago.
regards,
Erich