RE: Sporadic Timeouts after upgrading to bind9.20

2024-09-06 Thread Klaus Darilion via bind-users
Hi Ondřej!

I have been playing around with eu-stack. When I call eu-stack -p 1605200 -v 
(during normal operations), the stack trace looks meaningless to me (see 
below). Do I need a certain parameter, or do I have to install debug symbols?

Thanks
Klaus

# eu-stack -p 1605200 -v
PID 1605200 - process
TID 1605200:
#0  0x7b8ceb529ee0 epoll_pwait - /usr/lib/x86_64-linux-gnu/libc.so.6
#1  0x7b8cec52c9fa - 1 - /usr/lib/x86_64-linux-gnu/libuv.so.1.0.0
#2  0x7b8cec513ce8 - 1 uv_run - /usr/lib/x86_64-linux-gnu/libuv.so.1.0.0
#3  0x7b8cec6708d1 - 1 - 
/usr/lib/x86_64-linux-gnu/libisc-9.20.1-1+ubuntu24.04.1+deb.sury.org+1-Ubuntu.so
#4  0x57c70ed697a6 - 1 main - /usr/sbin/named
#5  0x7b8ceb42a1ca - 1 - /usr/lib/x86_64-linux-gnu/libc.so.6
#6  0x7b8ceb42a28b - 1 __libc_start_main - 
/usr/lib/x86_64-linux-gnu/libc.so.6
#7  0x57c70ed6a175 - 1 _start - /usr/sbin/named
TID 1605201:
#0  0x7b8ceb52725d syscall - /usr/lib/x86_64-linux-gnu/libc.so.6
#1  0x7b8cec0418ec - 1 - /usr/lib/x86_64-linux-gnu/liburcu-cds.so.8.1.0
#2  0x7b8cec041da5 - 1 - /usr/lib/x86_64-linux-gnu/liburcu-cds.so.8.1.0
#3  0x7b8ceb49ca94 - 1 - /usr/lib/x86_64-linux-gnu/libc.so.6
#4  0x7b8ceb529c3c - 1 - /usr/lib/x86_64-linux-gnu/libc.so.6
TID 1605202:
#0  0x7b8ceb529ee0 epoll_pwait - /usr/lib/x86_64-linux-gnu/libc.so.6
#1  0x7b8cec52c9fa - 1 - /usr/lib/x86_64-linux-gnu/libuv.so.1.0.0
#2  0x7b8cec513ce8 - 1 uv_run - /usr/lib/x86_64-linux-gnu/libuv.so.1.0.0
#3  0x7b8cec6708d1 - 1 - 
/usr/lib/x86_64-linux-gnu/libisc-9.20.1-1+ubuntu24.04.1+deb.sury.org+1-Ubuntu.so
#4  0x7b8cec68502a - 1 - 
/usr/lib/x86_64-linux-gnu/libisc-9.20.1-1+ubuntu24.04.1+deb.sury.org+1-Ubuntu.so
#5  0x7b8ceb49ca94 - 1 - /usr/lib/x86_64-linux-gnu/libc.so.6
#6  0x7b8ceb529c3c - 1 - /usr/lib/x86_64-linux-gnu/libc.so.6
TID 1605207:
#0  0x7b8ceb529ee0 epoll_pwait - /usr/lib/x86_64-linux-gnu/libc.so.6
#1  0x7b8cec52c9fa - 1 - /usr/lib/x86_64-linux-gnu/libuv.so.1.0.0
#2  0x7b8cec513ce8 - 1 uv_run - /usr/lib/x86_64-linux-gnu/libuv.so.1.0.0
#3  0x7b8cec6708d1 - 1 - 
/usr/lib/x86_64-linux-gnu/libisc-9.20.1-1+ubuntu24.04.1+deb.sury.org+1-Ubuntu.so
#4  0x7b8cec68502a - 1 - 
/usr/lib/x86_64-linux-gnu/libisc-9.20.1-1+ubuntu24.04.1+deb.sury.org+1-Ubuntu.so
#5  0x7b8ceb49ca94 - 1 - /usr/lib/x86_64-linux-gnu/libc.so.6
#6  0x7b8ceb529c3c - 1 - /usr/lib/x86_64-linux-gnu/libc.so.6
TID 1605208:
#0  0x7b8ceb529ee0 epoll_pwait - /usr/lib/x86_64-linux-gnu/libc.so.6
#1  0x7b8cec52c9fa - 1 - /usr/lib/x86_64-linux-gnu/libuv.so.1.0.0
#2  0x7b8cec513ce8 - 1 uv_run - /usr/lib/x86_64-linux-gnu/libuv.so.1.0.0
#3  0x7b8cec6708d1 - 1 - 
/usr/lib/x86_64-linux-gnu/libisc-9.20.1-1+ubuntu24.04.1+deb.sury.org+1-Ubuntu.so
#4  0x7b8cec68502a - 1 - 
/usr/lib/x86_64-linux-gnu/libisc-9.20.1-1+ubuntu24.04.1+deb.sury.org+1-Ubuntu.so
#5  0x7b8ceb49ca94 - 1 - /usr/lib/x86_64-linux-gnu/libc.so.6
#6  0x7b8ceb529c3c - 1 - /usr/lib/x86_64-linux-gnu/libc.so.6
TID 1605209:
#0  0x7b8ceb529ee0 epoll_pwait - /usr/lib/x86_64-linux-gnu/libc.so.6
#1  0x7b8cec52c9fa - 1 - /usr/lib/x86_64-linux-gnu/libuv.so.1.0.0
#2  0x7b8cec513ce8 - 1 uv_run - /usr/lib/x86_64-linux-gnu/libuv.so.1.0.0
#3  0x7b8cec6708d1 - 1 - 
/usr/lib/x86_64-linux-gnu/libisc-9.20.1-1+ubuntu24.04.1+deb.sury.org+1-Ubuntu.so
#4  0x7b8cec68502a - 1 - 
/usr/lib/x86_64-linux-gnu/libisc-9.20.1-1+ubuntu24.04.1+deb.sury.org+1-Ubuntu.so
#5  0x7b8ceb49ca94 - 1 - /usr/lib/x86_64-linux-gnu/libc.so.6
#6  0x7b8ceb529c3c - 1 - /usr/lib/x86_64-linux-gnu/libc.so.6
TID 1605210:
#0  0x7b8ceb529ee0 epoll_pwait - /usr/lib/x86_64-linux-gnu/libc.so.6
#1  0x7b8cec52c9fa - 1 - /usr/lib/x86_64-linux-gnu/libuv.so.1.0.0
#2  0x7b8cec513ce8 - 1 uv_run - /usr/lib/x86_64-linux-gnu/libuv.so.1.0.0
#3  0x7b8cec6708d1 - 1 - 
/usr/lib/x86_64-linux-gnu/libisc-9.20.1-1+ubuntu24.04.1+deb.sury.org+1-Ubuntu.so
#4  0x7b8cec68502a - 1 - 
/usr/lib/x86_64-linux-gnu/libisc-9.20.1-1+ubuntu24.04.1+deb.sury.org+1-Ubuntu.so
#5  0x7b8ceb49ca94 - 1 - /usr/lib/x86_64-linux-gnu/libc.so.6
#6  0x7b8ceb529c3c - 1 - /usr/lib/x86_64-linux-gnu/libc.so.6
TID 1605211:
#0  0x7b8ceb529ee0 epoll_pwait - /usr/lib/x86_64-linux-gnu/libc.so.6
#1  0x7b8cec52c9fa - 1 - /usr/lib/x86_64-linux-gnu/libuv.so.1.0.0
#2  0x7b8cec513ce8 - 1 uv_run - /usr/lib/x86_64-linux-gnu/libuv.so.1.0.0
#3  0x7b8cec6708d1 - 1 - 
/usr/lib/x86_64-linux-gnu/libisc-9.20.1-1+ubuntu24.04.1+deb.sury.org+1-Ubuntu.so
#4  0x7b8cec68502a - 1 - 
/usr/lib/x86_64-linux-gnu/libisc-9.20.1-1+ubuntu24.04.1+deb.sury.org+1-Ubuntu.so
#5  0x7b8ceb49ca94 - 1 - /usr/lib/x86_64-linux-gnu/libc.so.6
#6  0x7b8ceb529c3c - 1 - /usr/lib/x86_64-linux-gnu/libc.so.6
TID 1605212:
#0  0x7b8ceb529ee0 epoll_pwait - /usr/lib/x86_64-linux-gnu/libc.so.6
#1  0x7b8cec52c9fa - 1 - /usr/lib/x86_64-linux-gnu/libuv.so.1.0.0
#2  0x7b8cec513ce8 - 1 uv_run - /usr/lib/x86_

Re: Sporadic Timeouts after upgrading to bind9.20

2024-09-06 Thread Petr Špaček

On 06. 09. 24 9:04, Klaus Darilion via bind-users wrote:
> I play around with eu-stack. When I call  eu-stack -p 1605200 -v (during 
> normal operations) the stacktrace looks meaningless to me (See below). 
> Do I need a certain parameter or do I have to install debug symbols?


Seems fine to me - you just hit a moment when the threads were not 
doing anything. That's why all of them (except one) were in *wait* 
functions.


It should look more interesting if you put more load on the server.
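
To see at a glance whether the threads are idle, one could tally the innermost frame of each thread in the eu-stack output. The following is a minimal Python sketch and not something posted in this thread; the helper name top_frames and the sample text are illustrative, and the regular expressions assume the line layout shown in the traces here.

```python
import re
from collections import Counter

# Assumed layout, as in the traces in this thread:
#   "TID 1605200:" starts a thread, "#0  0x... symbol - /path" is its innermost frame.
TID_RE = re.compile(r"^TID (\d+):")
FRAME0_RE = re.compile(r"^#0\s+0x[0-9a-f]+\s+(\S+)")


def top_frames(eu_stack_output):
    """Map each thread ID to the symbol of its innermost (#0) frame."""
    result = {}
    tid = None
    for line in eu_stack_output.splitlines():
        m = TID_RE.match(line)
        if m:
            tid = int(m.group(1))
            continue
        m = FRAME0_RE.match(line)
        if m and tid is not None:
            sym = m.group(1)
            # A bare "-" means eu-stack could not resolve a symbol.
            result[tid] = None if sym == "-" else sym
    return result


# Shortened sample copied from the trace above (hypothetical input for the sketch).
SAMPLE = """\
PID 1605200 - process
TID 1605200:
#0  0x7b8ceb529ee0 epoll_pwait - /usr/lib/x86_64-linux-gnu/libc.so.6
#1  0x7b8cec52c9fa - 1 - /usr/lib/x86_64-linux-gnu/libuv.so.1.0.0
TID 1605201:
#0  0x7b8ceb52725d syscall - /usr/lib/x86_64-linux-gnu/libc.so.6
TID 1605202:
#0  0x7b8ceb529ee0 epoll_pwait - /usr/lib/x86_64-linux-gnu/libc.so.6
"""

frames = top_frames(SAMPLE)
summary = Counter(frames.values())
print(summary)
```

If nearly every thread reports epoll_pwait as its innermost frame, the snapshot was most likely taken while the event loops were waiting for work.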

--
Petr Špaček
Internet Systems Consortium
--
Visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe from 
this list

ISC funds the development of this software with paid support subscriptions. 
Contact us at https://www.isc.org/contact/ for more information.


bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users


Re: Sporadic Timeouts after upgrading to bind9.20

2024-09-06 Thread Ondřej Surý
Yup, you need the dbgsym packages:

https://ubuntu.com/server/docs/debug-symbol-packages

https://wiki.ubuntu.com/DebuggingProgramCrash#Installing_dbgsym_packages_from_a_PPA

--
Ondřej Surý — ISC (He/Him)

My working hours and your working hours may be different. Please do not feel 
obligated to reply outside your normal working hours.

> On 6. 9. 2024, at 9:05, Klaus Darilion  wrote:
> 
> I play around with eu-stack. When I call  eu-stack -p 1605200 -v (during 
> normal operations) the stacktrace looks meaningless to me (See below). Do I 
> need a certain parameter or do I have to install debug symbols?


RE: Sporadic Timeouts after upgrading to bind9.20

2024-09-06 Thread Klaus Darilion via bind-users
It just happened again. I have not yet installed the debug symbols.

I query the SOA every second with a 1-second timeout. It happened a few times 
in a row; the traces are below.

I noticed that the timeouts happened while BIND 9 was running an inbound IXFR:
Sep 06 07:20:55 named[1605200]: zone xx/IN: Transfer started.
[ the timeouts occurred in between ]
Sep 06 07:25:56 named[1605200]: 0x7b8a8fdff000: transfer of 'xx/IN' from 
83.136.xx.xx#53: Transfer completed: 166 messages, 28386 records, 3319665 
bytes, 301.177 secs (11022 bytes/sec) (serial 2024090614)

Is BIND applying the IXFR while it is still receiving it? 11 kByte/sec is very 
slow (iperf shows 400 Mbit/s network speed), so it seems that BIND is also slow 
at processing the inbound IXFR.
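
The gap between the logged transfer rate and the network speed can be put into numbers. This is a small Python sketch using only the figures quoted above; it assumes that the iperf figure means 400 * 10**6 bits per second, and the variable names are illustrative.

```python
# Figures taken from the "Transfer completed" log line above.
transferred_bytes = 3319665
elapsed_secs = 301.177

# Effective rate of the IXFR as a whole (receiving plus applying).
ixfr_rate = transferred_bytes / elapsed_secs

# Assumption: iperf's "400Mbit" means 400 * 10**6 bits per second.
link_rate = 400 * 10**6 / 8  # bytes per second

# Time the same payload would need at the measured link speed.
wire_time = transferred_bytes / link_rate

print(f"IXFR rate: {ixfr_rate:.0f} bytes/sec")
print(f"time at link speed: {wire_time:.3f} s")
print(f"slowdown vs. link speed: {link_rate / ixfr_rate:.0f}x")
```

Under these assumptions the payload would cross the wire in well under a second, which fits the suspicion that the time is spent on processing rather than on the network.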

This zone has the following characteristics:
- 2.7 GByte on disk
- ~6 million delegations
- NSEC3 without opt-out

Maybe NSEC3 calculations keep bind9 busy? So far we have noticed this problem 
with 2 zones; both use NSEC3 without opt-out.

Thanks
Klaus


FAILED - timeout (1 sec) or network error querying SOA at port 53024
PID 1605200 - process
TID 1605200:
#0  0x7b8ceb52bfa2 __sendmmsg - /usr/lib/x86_64-linux-gnu/libc.so.6
#1  0x7b8cec527e74 - 1 - /usr/lib/x86_64-linux-gnu/libuv.so.1.0.0
#2  0x7b8cec51522b - 1 uv_udp_send - 
/usr/lib/x86_64-linux-gnu/libuv.so.1.0.0
#3  0x7b8cec65f6ba - 1 isc__nm_udp_send - 
/usr/lib/x86_64-linux-gnu/libisc-9.20.1-1+ubuntu24.04.1+deb.sury.org+1-Ubuntu.so
#4  0x7b8cec5e90ad - 1 - 
/usr/lib/x86_64-linux-gnu/libns-9.20.1-1+ubuntu24.04.1+deb.sury.org+1-Ubuntu.so
#5  0x7b8cec5f0a87 - 1 ns_client_send - 
/usr/lib/x86_64-linux-gnu/libns-9.20.1-1+ubuntu24.04.1+deb.sury.org+1-Ubuntu.so
#6  0x7b8cec5f1909 - 1 - 
/usr/lib/x86_64-linux-gnu/libns-9.20.1-1+ubuntu24.04.1+deb.sury.org+1-Ubuntu.so
#7  0x7b8cec60ff60 - 1 ns_query_done - 
/usr/lib/x86_64-linux-gnu/libns-9.20.1-1+ubuntu24.04.1+deb.sury.org+1-Ubuntu.so
#8  0x7b8cec6015c2 - 1 - 
/usr/lib/x86_64-linux-gnu/libns-9.20.1-1+ubuntu24.04.1+deb.sury.org+1-Ubuntu.so
#9  0x7b8cec601f27 - 1 ns__query_start - 
/usr/lib/x86_64-linux-gnu/libns-9.20.1-1+ubuntu24.04.1+deb.sury.org+1-Ubuntu.so
#10 0x7b8cec60273e - 1 - 
/usr/lib/x86_64-linux-gnu/libns-9.20.1-1+ubuntu24.04.1+deb.sury.org+1-Ubuntu.so
#11 0x7b8cec5f222b - 1 - 
/usr/lib/x86_64-linux-gnu/libns-9.20.1-1+ubuntu24.04.1+deb.sury.org+1-Ubuntu.so
#12 0x7b8cec5f2e1f - 1 ns_client_request - 
/usr/lib/x86_64-linux-gnu/libns-9.20.1-1+ubuntu24.04.1+deb.sury.org+1-Ubuntu.so
#13 0x7b8cec64a478 - 1 isc__nm_readcb - 
/usr/lib/x86_64-linux-gnu/libisc-9.20.1-1+ubuntu24.04.1+deb.sury.org+1-Ubuntu.so
#14 0x7b8cec65f2b9 - 1 isc__nm_udp_read_cb - 
/usr/lib/x86_64-linux-gnu/libisc-9.20.1-1+ubuntu24.04.1+deb.sury.org+1-Ubuntu.so
#15 0x7b8cec52b802 - 1 - /usr/lib/x86_64-linux-gnu/libuv.so.1.0.0
#16 0x7b8cec52b9b3 - 1 - /usr/lib/x86_64-linux-gnu/libuv.so.1.0.0
#17 0x7b8cec52cbdb - 1 - /usr/lib/x86_64-linux-gnu/libuv.so.1.0.0
#18 0x7b8cec513ce8 - 1 uv_run - /usr/lib/x86_64-linux-gnu/libuv.so.1.0.0
#19 0x7b8cec6708d1 - 1 - 
/usr/lib/x86_64-linux-gnu/libisc-9.20.1-1+ubuntu24.04.1+deb.sury.org+1-Ubuntu.so
#20 0x57c70ed697a6 - 1 main - /usr/sbin/named
#21 0x7b8ceb42a1ca - 1 - /usr/lib/x86_64-linux-gnu/libc.so.6
#22 0x7b8ceb42a28b - 1 __libc_start_main - 
/usr/lib/x86_64-linux-gnu/libc.so.6
#23 0x57c70ed6a175 - 1 _start - /usr/sbin/named
TID 1605201:
#0  0x7b8ceb52725d syscall - /usr/lib/x86_64-linux-gnu/libc.so.6
#1  0x7b8cec0418ec - 1 - /usr/lib/x86_64-linux-gnu/liburcu-cds.so.8.1.0
#2  0x7b8cec041da5 - 1 - /usr/lib/x86_64-linux-gnu/liburcu-cds.so.8.1.0
#3  0x7b8ceb49ca94 - 1 - /usr/lib/x86_64-linux-gnu/libc.so.6
#4  0x7b8ceb529c3c - 1 - /usr/lib/x86_64-linux-gnu/libc.so.6
TID 1605202:
#0  0x7b8cec04029b cds_lfht_next - 
/usr/lib/x86_64-linux-gnu/liburcu-cds.so.8.1.0
#1  0x7b8cebeb3edf - 1 - 
/usr/lib/x86_64-linux-gnu/libdns-9.20.1-1+ubuntu24.04.1+deb.sury.org+1-Ubuntu.so
#2  0x7b8cebebb181 - 1 - 
/usr/lib/x86_64-linux-gnu/libdns-9.20.1-1+ubuntu24.04.1+deb.sury.org+1-Ubuntu.so
#3  0x7b8cebe45222 - 1 dns__db_closeversion - 
/usr/lib/x86_64-linux-gnu/libdns-9.20.1-1+ubuntu24.04.1+deb.sury.org+1-Ubuntu.so
#4  0x7b8cebf5e125 - 1 - 
/usr/lib/x86_64-linux-gnu/libdns-9.20.1-1+ubuntu24.04.1+deb.sury.org+1-Ubuntu.so
#5  0x7b8cec683e56 - 1 - 
/usr/lib/x86_64-linux-gnu/libisc-9.20.1-1+ubuntu24.04.1+deb.sury.org+1-Ubuntu.so
#6  0x7b8cec5172d9 - 1 - /usr/lib/x86_64-linux-gnu/libuv.so.1.0.0
#7  0x7b8cec50e513 - 1 - /usr/lib/x86_64-linux-gnu/libuv.so.1.0.0
#8  0x7b8cec52cbdb - 1 - /usr/lib/x86_64-linux-gnu/libuv.so.1.0.0
#9  0x7b8cec513ce8 - 1 uv_run - /usr/lib/x86_64-linux-gnu/libuv.so.1.0.0
#10 0x7b8cec6708d1 - 1 - 
/usr/lib/x86_64-linux-gnu/libisc-9.20.1-1+ubuntu24.04.1+deb.sury.org+1-Ubuntu.so
#11 0x7b8cec68502a - 1 - 
/usr/lib/x86_64-linux-gnu/libisc-9.20.1-1+ubuntu24.04.1+deb.sury.org+1-Ubuntu.so
#12 0x7b8ceb49ca94

Re: Sporadic Timeouts after upgrading to bind9.20

2024-09-06 Thread Ondřej Surý
Yes, just replace RPZ with “processing the incoming transfers”.

Sounds like 12 should work in your case.

We should have a fix ready in a couple of weeks.

Ondrej
--
Ondřej Surý — ISC (He/Him)

My working hours and your working hours may be different. Please do not feel 
obligated to reply outside your normal working hours.

On 6. 9. 2024, at 9:52, Klaus Darilion wrote:

Thanks for the link. In our case we do not use RPZ, and it is an authoritative server. Should we still use the same workaround?
That bind9 process hosts 4 zones, and the server has 8 vCPUs.
So, shall I try UV_THREADPOOL_SIZE=12?
 
Current behavior is:
# grep Threads /proc/1605200/status
Threads:    18
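
The same check can be scripted, for example to compare the thread count before and after changing UV_THREADPOOL_SIZE. This is a minimal Python sketch, assuming a Linux /proc filesystem; the function names thread_count and read_thread_count are hypothetical and not part of BIND or libuv.

```python
def thread_count(status_text):
    """Return the value of the "Threads:" field from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            # The field looks like "Threads:\t18"; int() ignores the padding.
            return int(line.split(":", 1)[1])
    return None


def read_thread_count(pid):
    """Read the thread count of a live process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return thread_count(f.read())


# Example with text shaped like the grep output above.
print(thread_count("Name:\tnamed\nThreads:\t18\n"))
```

Calling read_thread_count(1605200) on the server would correspond to the grep command shown above.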
 
Thanks
Klaus
 
 

-- 
Klaus Darilion, Head of Operations
nic.at GmbH, Jakob-Haringer-Straße 8/V
5020 Salzburg, Austria

 



From: Ondřej Surý
Sent: Friday, September 6, 2024 9:44 AM
To: Klaus Darilion
Cc: Petr Špaček; bind-users@lists.isc.org; Klaus Darilion via bind-users
Subject: Re: Sporadic Timeouts after upgrading to bind9.20

Ah, you’ve confirmed my suspicions: https://gitlab.isc.org/isc-projects/bind9/-/issues/4898

See https://gitlab.isc.org/isc-projects/bind9/-/issues/4898#note_487237 for workaround.

Ondrej
--
Ondřej Surý — ISC (He/Him)

RE: Sporadic Timeouts after upgrading to bind9.20

2024-09-06 Thread Klaus Darilion via bind-users
As there was just another IXFR, for the record, here is another trace, this 
time with debug symbols installed.

Thanks
Klaus

PID 1605200 - process
TID 1605200:
#0  0x7b8ceb529ee0 epoll_pwait - /usr/lib/x86_64-linux-gnu/libc.so.6
#1  0x7b8cec52c9fa - 1 - /usr/lib/x86_64-linux-gnu/libuv.so.1.0.0
#2  0x7b8cec513ce8 - 1 uv_run - /usr/lib/x86_64-linux-gnu/libuv.so.1.0.0
#3  0x7b8cec6708d1 - 1 loop_thread - 
/usr/lib/x86_64-linux-gnu/libisc-9.20.1-1+ubuntu24.04.1+deb.sury.org+1-Ubuntu.so
/build/bind9-eFXXmL/bind9-9.20.1/lib/isc/loop.c:288:6
#4  0x57c70ed697a6 - 1 main - /usr/sbin/named

/usr/src/bind9-1:9.20.1-1+ubuntu24.04.1+deb.sury.org+1/bin/named/main.c:1575:2
#5  0x7b8ceb42a1ca - 1 - /usr/lib/x86_64-linux-gnu/libc.so.6
#6  0x7b8ceb42a28b - 1 __libc_start_main - 
/usr/lib/x86_64-linux-gnu/libc.so.6
#7  0x57c70ed6a175 - 1 _start - /usr/sbin/named
TID 1605201:
#0  0x7b8ceb52725d syscall - /usr/lib/x86_64-linux-gnu/libc.so.6
#1  0x7b8cec0418ec - 1 - /usr/lib/x86_64-linux-gnu/liburcu-cds.so.8.1.0
#2  0x7b8cec041da5 - 1 - /usr/lib/x86_64-linux-gnu/liburcu-cds.so.8.1.0
#3  0x7b8ceb49ca94 - 1 - /usr/lib/x86_64-linux-gnu/libc.so.6
#4  0x7b8ceb529c3c - 1 - /usr/lib/x86_64-linux-gnu/libc.so.6
TID 1605202:
#0  0x7b8cec04029b cds_lfht_next - 
/usr/lib/x86_64-linux-gnu/liburcu-cds.so.8.1.0
#1  0x7b8cebeb3edf - 1 free_gluetable - 
/usr/lib/x86_64-linux-gnu/libdns-9.20.1-1+ubuntu24.04.1+deb.sury.org+1-Ubuntu.so
/build/bind9-eFXXmL/bind9-9.20.1/lib/dns/qpzone.c:392:2
#2  0x7b8cebebb181 - 1 closeversion - 
/usr/lib/x86_64-linux-gnu/libdns-9.20.1-1+ubuntu24.04.1+deb.sury.org+1-Ubuntu.so
/build/bind9-eFXXmL/bind9-9.20.1/lib/dns/qpzone.c:1450:3
#3  0x7b8cebe45222 - 1 dns__db_closeversion - 
/usr/lib/x86_64-linux-gnu/libdns-9.20.1-1+ubuntu24.04.1+deb.sury.org+1-Ubuntu.so
/build/bind9-eFXXmL/bind9-9.20.1/lib/dns/db.c:415:14
#4  0x7b8cebf5e125 - 1 ixfr_apply_done - 
/usr/lib/x86_64-linux-gnu/libdns-9.20.1-1+ubuntu24.04.1+deb.sury.org+1-Ubuntu.so
/build/bind9-eFXXmL/bind9-9.20.1/lib/dns/xfrin.c:588:3
#5  0x7b8cec683e56 - 1 isc__after_work_cb - 
/usr/lib/x86_64-linux-gnu/libisc-9.20.1-1+ubuntu24.04.1+deb.sury.org+1-Ubuntu.so
/build/bind9-eFXXmL/bind9-9.20.1/lib/isc/work.c:42:2
#6  0x7b8cec5172d9 - 1 - /usr/lib/x86_64-linux-gnu/libuv.so.1.0.0
#7  0x7b8cec50e513 - 1 - /usr/lib/x86_64-linux-gnu/libuv.so.1.0.0
#8  0x7b8cec52cbdb - 1 - /usr/lib/x86_64-linux-gnu/libuv.so.1.0.0
#9  0x7b8cec513ce8 - 1 uv_run - /usr/lib/x86_64-linux-gnu/libuv.so.1.0.0
#10 0x7b8cec6708d1 - 1 loop_thread - 
/usr/lib/x86_64-linux-gnu/libisc-9.20.1-1+ubuntu24.04.1+deb.sury.org+1-Ubuntu.so
/build/bind9-eFXXmL/bind9-9.20.1/lib/isc/loop.c:288:6
#11 0x7b8cec68502a - 1 thread_body - 
/usr/lib/x86_64-linux-gnu/libisc-9.20.1-1+ubuntu24.04.1+deb.sury.org+1-Ubuntu.so
/build/bind9-eFXXmL/bind9-9.20.1/lib/isc/thread.c:85:8
#12 0x7b8cec68502a - 1 thread_run - 
/usr/lib/x86_64-linux-gnu/libisc-9.20.1-1+ubuntu24.04.1+deb.sury.org+1-Ubuntu.so
/build/bind9-eFXXmL/bind9-9.20.1/lib/isc/thread.c:100:14
#13 0x7b8ceb49ca94 - 1 - /usr/lib/x86_64-linux-gnu/libc.so.6
#14 0x7b8ceb529c3c - 1 - /usr/lib/x86_64-linux-gnu/libc.so.6
TID 1605207:
#0  0x7b8ceb529ee0 epoll_pwait - /usr/lib/x86_64-linux-gnu/libc.so.6
#1  0x7b8cec52c9fa - 1 - /usr/lib/x86_64-linux-gnu/libuv.so.1.0.0
#2  0x7b8cec513ce8 - 1 uv_run - /usr/lib/x86_64-linux-gnu/libuv.so.1.0.0
#3  0x7b8cec6708d1 - 1 loop_thread - 
/usr/lib/x86_64-linux-gnu/libisc-9.20.1-1+ubuntu24.04.1+deb.sury.org+1-Ubuntu.so
/build/bind9-eFXXmL/bind9-9.20.1/lib/isc/loop.c:288:6
#4  0x7b8cec68502a - 1 thread_body - 
/usr/lib/x86_64-linux-gnu/libisc-9.20.1-1+ubuntu24.04.1+deb.sury.org+1-Ubuntu.so
/build/bind9-eFXXmL/bind9-9.20.1/lib/isc/thread.c:85:8
#5  0x7b8cec68502a - 1 thread_run - 
/usr/lib/x86_64-linux-gnu/libisc-9.20.1-1+ubuntu24.04.1+deb.sury.org+1-Ubuntu.so
/build/bind9-eFXXmL/bind9-9.20.1/lib/isc/thread.c:100:14
#6  0x7b8ceb49ca94 - 1 - /usr/lib/x86_64-linux-gnu/libc.so.6
#7  0x7b8ceb529c3c - 1 - /usr/lib/x86_64-linux-gnu/libc.so.6
TID 1605208:
#0  0x7b8ceb529ee0 epoll_pwait - /usr/lib/x86_64-linux-gnu/libc.so.6
#1  0x7b8cec52c9fa - 1 - /usr/lib/x86_64-linux-gnu/libuv.so.1.0.0
#2  0x7b8cec513ce8 - 1 uv_run - /usr/lib/x86_64-linux-gnu/libuv.so.1.0.0
#3  0x7b8cec6708d1 - 1 loop_thread - 
/usr/lib/x86_64-linux-gnu/libisc-9.20.1-1+ubuntu24.04.1+deb.sury.org+1-Ubuntu.so
/build/bind9-eFXXmL/bind9-9.20.1/lib/isc/loop.c:288:6
#4  0x7b8cec68502a - 1 thread_body - 
/usr/lib/x86_64-linux-gnu/libisc-9.20.1-1+ubuntu24.04.1+deb.sury.org+1-Ubuntu.so
/build/bind9-eFXXmL/bind9-9.20.1/lib/isc/thread.c:85:8
#5  0x7b8cec68502a - 1 thread_run - 
/usr/lib/x86_64-linux-gnu/libisc-9.20.1-1+ubuntu24.04.1+deb.sury.org+1-Ubuntu.so
/build/bind9-eFXXmL/bind9-9.20.1/lib/isc/thread.c:100:14
#6  0x7b8ceb49ca94 - 1 - /usr/lib/x86_64

RE: Sporadic Timeouts after upgrading to bind9.20

2024-09-06 Thread Klaus Darilion via bind-users
From: Ondřej Surý 
Sent: Friday, September 6, 2024 4:10 PM
To: Klaus Darilion 
Cc: Klaus Darilion via bind-users 
Subject: Re: Sporadic Timeouts after upgrading to bind9.20

Hmm, what is the churn in the zones? How often is there an IXFR, and how large 
are those changes?


Every 30 minutes. See logs:

12:20:55 zone xx/IN: notify from 83.136.34.20#55138: serial 2024090624
12:20:55 zone xx/IN: notify from 2a02:850:9::5#42346: serial 2024090624: 
refresh in progress, refresh check queued
12:20:55 zone xx/IN: Transfer started.
12:20:55 0x7b38529d: transfer of 'xx/IN' from 83.136.34.20#53: connected 
using 83.136.34.20#53 TSIG rcode0-distribution
12:20:55 zone xx/IN: notify from 2a00:dd80:9:69::4#34743: serial 2024090624: 
refresh in progress, refresh check queued
12:20:55 zone xx/IN: notify from 185.40.233.229#44310: serial 2024090624: 
refresh in progress, refresh check queued
12:20:56 zone xx/IN: transferred serial 2024090624: TSIG 'rcode0-distribution'
12:20:56 0x7b38529d: transfer of 'xx/IN' from 83.136.34.20#53: Transfer 
status: success
12:20:56 0x7b38529d: transfer of 'xx/IN' from 83.136.34.20#53: Transfer 
completed: 132 messages, 23102 records, 2632938 bytes, 1.121 secs (2348740 
bytes/sec) (serial 2024090624)
12:20:56 zone xx/IN: notify from 83.136.34.4#48518: zone is up to date
12:20:56 zone xx/IN: notify from 2a02:850:8::5#41419: zone is up to date


12:50:57 zone xx/IN: notify from 83.136.34.20#47046: serial 2024090625
12:50:57 zone xx/IN: notify from 2a02:850:9::5#52732: serial 2024090625: 
refresh in progress, refresh check queued
12:50:57 zone xx/IN: Transfer started.
12:50:57 0x7b35fcfb: transfer of 'xx/IN' from 83.136.34.20#53: connected 
using 83.136.34.20#53 TSIG rcode0-distribution
12:50:58 zone xx/IN: notify from 2a00:dd80:9:69::4#56984: serial 2024090625: 
refresh in progress, refresh check queued
12:50:58 zone xx/IN: notify from 185.40.233.229#41328: serial 2024090625: 
refresh in progress, refresh check queued
12:50:59 zone xx/IN: notify from 83.136.34.4#55745: zone is up to date
12:50:59 zone xx/IN: notify from 2a02:850:8::5#58001: zone is up to date
12:51:11 zone xx/IN: transferred serial 2024090625: TSIG 'rcode0-distribution'
12:51:11 0x7b35fcfb: transfer of 'xx/IN' from 83.136.34.20#53: Transfer 
status: success
12:51:11 0x7b35fcfb: transfer of 'xx/IN' from 83.136.34.20#53: Transfer 
completed: 130 messages, 22833 records, 2586324 bytes, 13.500 secs (191579 
bytes/sec) (serial 2024090625)


13:21:56 zone xx/IN: notify from 2a00:dd80:9:69::4#49911: serial 2024090626
13:21:56 zone xx/IN: notify from 185.40.233.229#34402: serial 2024090626: 
refresh in progress, refresh check queued
13:21:56 zone xx/IN: Transfer started.
13:21:56 0x7b35fda6c000: transfer of 'xx/IN' from 185.40.233.229#53: connected 
using 185.40.233.229#53 TSIG rcode0-distribution
13:21:57 zone xx/IN: notify from 2a02:850:9::5#36734: serial 2024090626: 
refresh in progress, refresh check queued
13:21:57 zone xx/IN: notify from 83.136.34.20#48874: serial 2024090626: refresh 
in progress, refresh check queued
13:21:58 zone xx/IN: notify from 83.136.34.4#53682: zone is up to date
13:22:08 zone xx/IN: transferred serial 2024090626: TSIG 'rcode0-distribution'
13:22:08 0x7b35fda6c000: transfer of 'xx/IN' from 185.40.233.229#53: Transfer 
status: success
13:22:08 0x7b35fda6c000: transfer of 'xx/IN' from 185.40.233.229#53: Transfer 
completed: 132 messages, 23222 records, 2631441 bytes, 11.390 secs (231030 
bytes/sec) (serial 2024090626)
13:22:08 zone xx/IN: notify from 2a02:850:8::5#51426: zone is up to date
13:22:08 zone xx/IN: notify from 2a02:850:8::5#51426: zone is up to date


13:50:54 zone xx/IN: notify from 83.136.34.20#36630: serial 2024090627
13:50:54 zone xx/IN: notify from 2a02:850:9::5#57691: serial 2024090627: 
refresh in progress, refresh check queued
13:50:54 zone xx/IN: Transfer started.
13:50:54 0x7b35fd943000: transfer of 'xx/IN' from 83.136.34.20#53: connected 
using 83.136.34.20#53 TSIG rcode0-distribution
13:50:54 zone xx/IN: notify from 2a00:dd80:9:69::4#53297: serial 2024090627: 
refresh in progress, refresh check queued
13:50:54 zone xx/IN: notify from 185.40.233.229#35120: serial 2024090627: 
refresh in progress, refresh check queued
13:50:56 zone xx/IN: notify from 83.136.34.4#52671: zone is up to date
13:50:56 zone xx/IN: notify from 2a02:850:8::5#32936: zone is up to date
13:51:16 zone xx/IN: transferred serial 2024090627: TSIG 'rcode0-distribution'
13:51:16 0x7b35fd943000: transfer of 'xx/IN' from 83.136.34.20#53: Transfer 
status: success
13:51:16 0x7b35fd943000: transfer of 'xx/IN' from 83.136.34.20#53: Transfer 
completed: 121 messages, 21340 records, 2416628 bytes, 21.794 secs (110885 
bytes/sec) (serial 2024090627)
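
To compare transfer durations across many such log excerpts, the "Transfer completed" lines can be extracted mechanically. The following Python sketch assumes the log wording shown above and collapses the line wrapping introduced by mail clients; the function name parse_completed is illustrative.

```python
import re

# Pattern modelled on the "Transfer completed" lines quoted in this thread.
COMPLETED_RE = re.compile(
    r"Transfer completed: (\d+) messages, (\d+) records, (\d+) bytes, "
    r"([\d.]+) secs \((\d+) bytes/sec\) \(serial (\d+)\)"
)


def parse_completed(log_text):
    """Return one dict per completed transfer found in the log text."""
    # Mail wrapping split the log lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    return [
        {
            "messages": int(m[1]),
            "records": int(m[2]),
            "bytes": int(m[3]),
            "secs": float(m[4]),
            "rate": int(m[5]),
            "serial": int(m[6]),
        }
        for m in COMPLETED_RE.finditer(flat)
    ]


# Two wrapped entries copied from the logs above.
SAMPLE = """\
12:20:56 0x7b38529d: transfer of 'xx/IN' from 83.136.34.20#53: Transfer
completed: 132 messages, 23102 records, 2632938 bytes, 1.121 secs (2348740
bytes/sec) (serial 2024090624)
13:51:16 0x7b35fd943000: transfer of 'xx/IN' from 83.136.34.20#53: Transfer
completed: 121 messages, 21340 records, 2416628 bytes, 21.794 secs (110885
bytes/sec) (serial 2024090627)
"""

for entry in parse_completed(SAMPLE):
    print(entry["serial"], entry["secs"], entry["rate"])
```

Sorting the result by "secs" would show which of the half-hourly transfers took unusually long despite a similar payload size.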




Re: bind918 malfunction?

2024-09-06 Thread Peter
This one was accidentally not sent to the list, sorry!

On Thu, Sep 05, 2024 at 08:04:37PM +0200, Ondřej Surý wrote:
! I’m on my phone, so this is a long shot, but you can try disabling the qname 
minimization.

Thank you for the suggestion; I can try this at some point. However,
I'd rather figure out what exactly is going wrong. Looking
through the entire dialogue, it appears as if the resolver just stopped
mid-flight and returned SERVFAIL for no obvious reason.

After figuring out why my database had apparently started to quote
query strings ;) I looked into the day's data and found something
interesting: once again the resolver apparently gives up
mid-flight, but this time there are some error messages:

Sep  5 17:18:33  conr named[4456]: resolver: info: loop detected 
resolving 'ns1.edns.t-ipnet.de/A'
Sep  5 17:18:33  conr named[4456]: resolver: info: loop detected 
resolving 'ns1.edns.t-ipnet.de/'
Sep  5 17:18:33  conr named[4456]: resolver: info: loop detected 
resolving 'ns2.edns.t-ipnet.de/A'
Sep  5 17:18:33  conr named[4456]: resolver: info: loop detected 
resolving 'ns2.edns.t-ipnet.de/'
Sep  5 17:18:33  conr named[4456]: resolver: info: loop detected 
resolving 'ns3.edns.t-ipnet.de/A'
Sep  5 17:18:33  conr named[4456]: resolver: info: loop detected 
resolving 'ns3.edns.t-ipnet.de/'
Sep  5 17:18:33  conr named[4456]: resolver: info: loop detected 
resolving 'ns4.edns.t-ipnet.de/A'
Sep  5 17:18:33  conr named[4456]: resolver: info: loop detected 
resolving 'ns4.edns.t-ipnet.de/'
Sep  5 17:18:33  conr named[4456]: resolver: info: loop detected 
resolving 'ns5.edns.t-ipnet.de/A'
Sep  5 17:18:33  conr named[4456]: resolver: info: loop detected 
resolving 'ns5.edns.t-ipnet.de/'
Sep  5 17:18:34  conr named[4456]: query-errors: info:
  client @0x885bd4160 192.168.97.23#3102 (_sip._udp.tel.t-online.de):
  view intra: query failed (failure) for _sip._udp.tel.t-online.de/IN/SRV at 
query.c:7836
Sep  5 17:18:34  conr named[4456]: query-errors: info:
  client @0x87d9fc160 192.168.97.23#3102 (_sip._tcp.tel.t-online.de):
  view intra: query failed (failure) for _sip._tcp.tel.t-online.de/IN/SRV at 
query.c:7836
Sep  5 17:18:34  conr named[4456]: query-errors: info:
  client @0x840265160 192.168.97.23#3102 (_sips._tcp.tel.t-online.de):
  view intra: query failed (failure) for _sips._tcp.tel.t-online.de/IN/SRV at 
query.c:7836

Looking further, there are more of these "loop detected" messages, apparently
for random nameservers all around the world. They also appear in the logs from
9.16, but they didn't seem to do any harm back then.
Not sure what to make of this - it doesn't look good.


cheerio,
PMc


Re: bind918 malfunction?

2024-09-06 Thread Bob Harold
Recently (2024/9/21) I ran into an issue that might be similar.  Due to
DDoS attacks that use complicated lookups to make DNS servers do extra
work and slow them down, some recent DNS server software has tightened the
limit on the amount of 'work' that it will do on a single query before giving
up and returning SERVFAIL.  In my case I had spread out my NS records over
several domains, and each of those domains depended on yet more domains.  This
was designed to increase resilience by not depending on a single domain.  But
we began to get random failures.  In our case, when we tried to get an SSL
certificate, Let's Encrypt (using Unbound) was verifying every NS record and
sometimes gave up with the error message "exceeded the maximum nameserver
nxdomains", even though there were no 'nxdomains' in the log.  I simplified
my NS records and the problem went away.

-- 
Bob Harold


On Fri, Sep 6, 2024 at 11:36 AM Peter  wrote:

> On Fri, Sep 06, 2024 at 08:18:52AM +1000, Mark Andrews wrote:
> ! Well from here all the IPv4 addresses for the tel.t-online.de
> ! servers are not responding.
>
> Wait - which IPv4 addresses? AFAIK that thing doesn't have any
> addresses, it is only used for NAPTR queries.
>
> ! That won’t be helping things.   Also the servers are generating invalid
> ! negative responses.
> ! The SOA record in the response is the QNAME rather than the owner of
>
> Wow. Interesting.
>
> ! Also waiting
> ! an hour to retry on SERVFAIL is ridiculous.
>
> Yes, agreed. But this device is a piece of physical hardware and
> commercially available from Alcatel; so this is what one encounters in
> the field. (That's why I usually prefer to design or at least compile
> stuff myself, so I can fix things.)
>
> ! What you haven’t shown is the communication between the recursive server
> ! and the authoritative servers.
> !
> ! tcpdump -w trace.pcap port 53 and \( host ns1.edns.t-ipnet.de or
> ! ns2.edns.t-ipnet.de or ns3.edns.t-ipnet.de or ns4.edns.t-ipnet.de or
> ! ns5.edns.t-ipnet.de \)
>
> There is none.
> SERVFAIL is sent before we even get that far:
>
>  intra | 31.08.2024 06:12:10.646279 CEST | 31.08.2024 06:12:10.673001 CEST | :: | 2003:180:4:a::1:53 | RESOLVER_RESPONSE | QUESTION | NOERROR | qr aa cd | ns1.edns.t-ipnet.de. IN AAAA
>  intra | 31.08.2024 06:12:10.646279 CEST | 31.08.2024 06:12:10.673001 CEST | :: | 2003:180:4:a::1:53 | RESOLVER_RESPONSE | ANSWER | NOERROR | qr aa cd | ns1.edns.t-ipnet.de. 86400 IN AAAA 2003:180:8::53
>  intra | 31.08.2024 06:12:10.64797 CEST | 31.08.2024 06:12:10.674063 CEST | :: | 2003:180:4:a::1:53 | RESOLVER_RESPONSE | QUESTION | NOERROR | qr aa cd | ns2.edns.t-ipnet.de. IN AAAA
>  intra | 31.08.2024 06:12:10.64797 CEST | 31.08.2024 06:12:10.674063 CEST | :: | 2003:180:4:a::1:53 | RESOLVER_RESPONSE | ANSWER | NOERROR | qr aa cd | ns2.edns.t-ipnet.de. 86400 IN AAAA 2003:180:8:100::53
>  intra | 31.08.2024 06:12:10.644626 CEST | 31.08.2024 06:12:10.674381 CEST | :: | 2003:180:4:a::1:53 | RESOLVER_RESPONSE | QUESTION | NOERROR | qr aa cd | ns2.edns.t-ipnet.de. IN A
>  intra | 31.08.2024 06:12:10.644626 CEST | 31.08.2024 06:12:10.674381 CEST | :: | 2003:180:4:a::1:53 | RESOLVER_RESPONSE | ANSWER | NOERROR | qr aa cd | ns2.edns.t-ipnet.de. 86400 IN A 212.185.255.217
>  intra | 31.08.2024 06:12:10.642914 CEST | 31.08.2024 06:12:10.674887 CEST | :: | 2003:180:4:a::1:53 | RESOLVER_RESPONSE | QUESTION | NOERROR | qr aa cd | ns5.edns.t-ipnet.de. IN AAAA
>  intra | 31.08.2024 06:12:10.642914 CEST | 31.08.2024 06:12:10.674887 CEST | :: | 2003:180:4:a::1:53 | RESOLVER_RESPONSE | ANSWER | NOERROR | qr aa cd | ns5.edns.t-ipnet.de. 86400 IN AAAA 2003:180:8:400::53
>  intra | 31.08.2024 06:12:10.651237 CEST | 31.08.2024 06:12:10.675469 CEST | :: | 2003:180:4:a::1:53 | RESOLVER_RESPONSE | QUESTION | NOERROR | qr aa cd | ns4.edns.t-ipnet.de. IN A
>  intra | 31.08.2024 06:12:10.651237 CEST | 31.08.2024 06:12:10.675469 CEST | :: | 2003:180:4:a::1:53 | RESOLVER_RESPONSE | ANSWER | NOERROR | qr aa cd | ns4.edns.t-ipnet.de. 86400 IN A 212.185.255.233
>  intra | 31.08.2024 06:12:10.52171 CEST | 31.08.2024 06:12:10.681361 CEST | | | CLIENT_RESPONSE | QUESTION | SERVFAIL | qr rd ra | _sip._udp.tel.t-online.de. IN SRV
> *intra | 31.08.2024 06:12:10.681672 CEST | 31.08.2024 06:12:10.699011 CEST | :: | 2003:180:8:100::53 | RESOLVER_QUERY | QUESTION | | | ns6.edns.t-ipnet.de. IN A
> *intra | 31.08.2024 06:12:10.684058 CEST | 31.08.2024 06:12:10.698688 CEST | :: | 2003:180:8:100::53 | RESOLVER_QUERY | QUESTION | | | ns6.edns.t-ipnet.de. IN AAAA
>  intra | 31.08.2024 06:12:10.649577 CEST | 31.08.2024 06:12:10.684556 CEST | :: | 2003:180:4:a::1:53 | RESOLVER_RESPONSE | QUESTION | NOERROR | qr aa cd | ns1.edn

Re: bind918 malfunction?

2024-09-06 Thread Peter
On Fri, Sep 06, 2024 at 12:55:20PM -0400, Bob Harold wrote:
! Recently (2024/9/21) I ran into an issue that might be similar.  Due to
! DDoS attacks that use complicated lookups to make DNS servers do extra
! work, to slow them down, some recent DNS server software has tightened the
! amount of 'work' that it will do on a single query before giving up and
! returning SERVFAIL.  In my case I had spread out my NS records over several
! domains, and each of those domains depended on yet more domains.  This was
! designed to increase resilience by not depending on a single domain.  But
! we began to get random failures, in our case when trying to get an SSL
! certificate: Let's Encrypt, using Unbound, was verifying every NS record and
! sometimes gave up with the error message "exceeded the maximum nameserver
! nxdomains", even though there were no 'nxdomains' in the log.  I simplified
! my NS records and the problem went away.

Thank You,

  I am on this track now, also. I found that in two cases there were
precisely 31 resolver queries before the SERVFAIL, and I wondered why
this would be the same number. Then I found in the release notes
something about limiting query count to 32.

  If this is indeed the issue, then we need an error message that
actually tells us what the problem is.

  I am currently analyzing issues that appeared /after/ upgrade to 9.18
and /before/ 9.18.29 - and these are a lot rarer, and most look like
genuine weirdness/outages/maintenance.

cheerio,
PMc
-- 
Visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe from 
this list

ISC funds the development of this software with paid support subscriptions. 
Contact us at https://www.isc.org/contact/ for more information.


bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users


Re: bind918 malfunction?

2024-09-06 Thread Ondřej Surý
Try running `named -d 9 (plus other existing args)` to see why there are
31+ queries. There must be something wonky going on.

--
Ondřej Surý — ISC (He/Him)

My working hours and your working hours may be different. Please do not feel 
obligated to reply outside your normal working hours.

> On 6. 9. 2024, at 20:00, Peter  wrote:
> 
> precisely 31 resolver queries before the SERVFAIL


Re: bind918 malfunction?

2024-09-06 Thread Peter
On Fri, Sep 06, 2024 at 08:05:18PM +0200, Ondřej Surý wrote:
! Try running `named -d 9 (plus other existing args)` to see why there
! are 31+ queries. There must be something wonky going on.
! 

Alright. "-d 9" does nothing.

Changing the named.conf does something:
channel named_log {
        syslog local0;
        // severity info;
        severity debug 99;
        print-category yes;
        print-severity yes;
};
channel activity_log {
        syslog local1;
        // severity info;
        severity debug 99;
        print-category yes;
        print-severity yes;
};
category client          { activity_log; };
category config          { named_log; };
category database        { named_log; };
category dnssec          { activity_log; };
category general         { named_log; };
category network         { named_log; };
category notify          { named_log; };
category queries         { null; };
category query-errors    { activity_log; };
category resolver        { activity_log; };
category security        { named_log; };
category update          { named_log; };
category update-security { named_log; };
category xfer-in         { named_log; };
category xfer-out        { named_log; };
category unmatched       { named_log; };
category default         { named_log; };

Not sure if this is all we need. "queries" got me bored, as I
have dnstap anyway. If we need it, I must try to repeat - it
does not always happen. Here it did:

20:31:59# host ns4.edns.t-ipnet.de
Host ns4.edns.t-ipnet.de not found: 2(SERVFAIL)

Apparently the 99 is now a bit more than you wanted.
But - voila!

Sep  6 20:31:59  pole named[71152]: resolver: debug 3: exceeded 
max queries resolving 'ns1.edns.t-ipnet.de/' (querycount=33, maxqueries=32)
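
For anyone who wants to watch for this in syslog, here is a minimal Python
sketch that pulls the counters out of a line like the one above. The regular
expression is my own guess at the message layout, based only on this one
sample; it is not taken from the BIND sources, and the function name is
made up for illustration.

```python
import re

# The log line from the message above, joined back into one line.
LINE = ("Sep  6 20:31:59  pole named[71152]: resolver: debug 3: exceeded "
        "max queries resolving 'ns1.edns.t-ipnet.de/' "
        "(querycount=33, maxqueries=32)")

# Assumed layout: 'name/type' followed by the two counters in parentheses.
PATTERN = re.compile(
    r"exceeded max queries resolving '(?P<name>[^'/]*)/(?P<rrtype>[^']*)' "
    r"\(querycount=(?P<querycount>\d+), maxqueries=(?P<maxqueries>\d+)\)"
)

def parse_exceeded(line):
    """Return the parsed fields as a dict, or None if the line does not match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return {
        "name": m["name"],
        "rrtype": m["rrtype"],  # empty in the sample, the type was lost
        "querycount": int(m["querycount"]),
        "maxqueries": int(m["maxqueries"]),
    }

if __name__ == "__main__":
    print(parse_exceeded(LINE))
```

Feeding it the syslog stream and counting hits per name should show which
lookups run into the limit most often.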

cheerio,
PMc

Re: bind918 malfunction?

2024-09-06 Thread Ondřej Surý
Now the question remains - why? I don’t really see a reason for this behavior 
from where I tested it, so what is the traffic between your recursor and the 
Internet during the time this happens?

Ondřej
--
Ondřej Surý — ISC (He/Him)

My working hours and your working hours may be different. Please do not feel 
obligated to reply outside your normal working hours.

> On 6. 9. 2024, at 20:54, Peter  wrote:
> 
> Sep  6 20:31:59  pole named[71152]: resolver: debug 3: exceeded 
> max queries resolving 'ns1.edns.t-ipnet.de/' (querycount=33, 
> maxqueries=32)


Re: bind918 malfunction?

2024-09-06 Thread Ondřej Surý
Ok, so according to zonemaster: 
https://zonemaster.net/en/result/7fc39ff8fc1766ac all the nameservers are in 
the same zone. I am guessing that any intermittent failure can cause a lot of 
outgoing queries.

Anyway - since you are hitting the 32 limit, perhaps bumping the limit to 100 
(the value before) would help in your case? I am guessing the resolver is being 
used for a limited set of clients and the chance of this specific abuse is 
quite low.

https://bind9.readthedocs.io/en/v9.18.29/notes.html#notes-for-bind-9-18-29

Ondrej
--
Ondřej Surý — ISC (He/Him)

My working hours and your working hours may be different. Please do not feel 
obligated to reply outside your normal working hours.

> On 6. 9. 2024, at 21:13, Ondřej Surý  wrote:
> Now the question remains - why? I don’t really see a reason for this 
> behavior from where I tested it, so what is the traffic between your recursor 
> and the Internet during the time this happens?

Re: bind918 malfunction?

2024-09-06 Thread Bob Harold
The original zone has NS records in two domains:
t-ipnet.de. 82632 IN NS dns20.dns.t-ipnet.de.
t-ipnet.de. 82632 IN NS dns02.dns.t-ipnet.de.
t-ipnet.de. 82632 IN NS dns00.dns.t-ipnet.de.
t-ipnet.de. 82632 IN NS pns.dtag.de.
t-ipnet.de. 82632 IN NS dns50.dns.t-ipnet.de.

And dtag.de has:
dtag.de. 61568 IN NS pns.dtag.de.
dtag.de. 61568 IN NS ns1.telekom.net.

And telekom.net. has:
telekom.net. 3600 IN NS dns2.telekom.de.
telekom.net. 3600 IN NS pns.dtag.de.
telekom.net. 3600 IN NS dns1.telekom.de.
telekom.net. 3600 IN NS ns1.telekom.net.

And telekom.de. has:
telekom.de. 3600 IN NS ns1.telekom.net.
telekom.de. 3600 IN NS dns1.telekom.de.
telekom.de. 3600 IN NS dns2.telekom.de.
telekom.de. 3600 IN NS pns.dtag.de.

This is the type of NS record 'tree' that I also had, that caused me
problems.
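
To make the 'tree' visible, here is a small Python sketch that walks the NS
sets listed above and collects every zone a resolver may have to look into
when it starts at t-ipnet.de. It only uses the records quoted in this mail;
the helper names are made up, and matching a nameserver to its zone by
longest suffix is a simplification, not how a real resolver finds zone cuts.

```python
from collections import deque

# NS sets copied from the listing above (zone -> nameserver names).
NS_RECORDS = {
    "t-ipnet.de.": ["dns20.dns.t-ipnet.de.", "dns02.dns.t-ipnet.de.",
                    "dns00.dns.t-ipnet.de.", "pns.dtag.de.",
                    "dns50.dns.t-ipnet.de."],
    "dtag.de.": ["pns.dtag.de.", "ns1.telekom.net."],
    "telekom.net.": ["dns2.telekom.de.", "pns.dtag.de.",
                     "dns1.telekom.de.", "ns1.telekom.net."],
    "telekom.de.": ["ns1.telekom.net.", "dns1.telekom.de.",
                    "dns2.telekom.de.", "pns.dtag.de."],
}

def enclosing_zone(hostname, zones):
    """Return the longest known zone that hostname falls under, or None."""
    matches = [z for z in zones if hostname == z or hostname.endswith("." + z)]
    return max(matches, key=len, default=None)

def zone_dependencies(start):
    """Breadth-first walk over the zones reachable through NS names."""
    seen = [start]
    queue = deque([start])
    while queue:
        zone = queue.popleft()
        for ns in NS_RECORDS.get(zone, []):
            parent = enclosing_zone(ns, NS_RECORDS)
            if parent is not None and parent not in seen:
                seen.append(parent)
                queue.append(parent)
    return seen

if __name__ == "__main__":
    print(zone_dependencies("t-ipnet.de."))
```

With the records above, the walk from t-ipnet.de. reaches all four zones, and
each of them points back at the others.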

-- 
Bob Harold


On Fri, Sep 6, 2024 at 3:27 PM Ondřej Surý  wrote:

> Ok, so according to zonemaster:
> https://zonemaster.net/en/result/7fc39ff8fc1766ac all the nameservers are
> in the same zone. I am guessing that any intermittent failure can cause a
> lot of outgoing queries.
>
> Anyway - since you are hitting the 32 limit, perhaps bumping the limit to
> 100 (the value before) would help in your case? I am guessing the resolver
> is being used for a limited set of clients and the chance of this specific
> abuse is quite low.
>
> https://bind9.readthedocs.io/en/v9.18.29/notes.html#notes-for-bind-9-18-29
>
> Ondrej

Re: bind918 malfunction?

2024-09-06 Thread Peter
On Fri, Sep 06, 2024 at 09:12:51PM +0200, Ondřej Surý wrote:
! Now the question remains - why? I don’t really see a reason for this
! behavior from where I tested it, so what is the traffic between your
! recursor and the Internet during the time this happens?

Well, I can see why - but I don't know if this is all as intended:

* Query comes along "tel.t-online.de. IN NAPTR"
* RESOLVER_QUERY goes to 2003:180:4:10a::2:53 (I don't know where that
  comes from. Cache?)
* Response gives *six* authoritative: ns1...ns6.edns.t-ipnet.de.
* 12 queries (v4/v6) go to the root servers
* we get 12 identical answers, containing *six* servers for "de."
* 12 queries go to z.nic.de.
* we get 12 identical answers, containing five servers for "t-ipnet.de."
* 6 queries go to dns20.dns.t-ipnet.de.
* we get 6 answers and cannot use them, because 1+12+12+6 = 31
* SERVFAIL
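
The tally from the list above, written out as a short Python sketch. The stage
counts are the ones I listed; the limit of 32 is the maxqueries value from the
log line. How named actually counts internally is an assumption on my side, so
this only reproduces the arithmetic.

```python
# Query counts per stage, taken from the list above.
STAGES = [
    ("first RESOLVER_QUERY to 2003:180:4:10a::2", 1),
    ("root servers", 12),
    ("z.nic.de.", 12),
    ("dns20.dns.t-ipnet.de.", 6),
]

MAX_QUERIES = 32  # "maxqueries=32" as reported in the log

def running_totals(stages):
    """Return (stage name, cumulative query count) for each stage."""
    total = 0
    totals = []
    for name, count in stages:
        total += count
        totals.append((name, total))
    return totals

totals = running_totals(STAGES)
remaining = MAX_QUERIES - totals[-1][1]

if __name__ == "__main__":
    for name, total in totals:
        print(f"{total:3d} queries after {name}")
    print(f"budget left before the limit: {remaining}")
```

So after the fourth stage there is a budget of one query left, which is not
enough for another round of lookups.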

You have the details in the attachment on one of my mails.

It looks logically correct to me. If this is not how it is supposed
to work, then tell me. I didn't build it ;) - I just tried to
understand it and then make it work nicely.
And I did not hack that part! - only the dnstap, to get to the
information that is needed for this kind of analysis (microseconds and
view). 

cheerio,
PMc

RE: Sporadic Timeouts after upgrading to bind9.20

2024-09-06 Thread Klaus Darilion via bind-users

From: Ondřej Surý 
Sent: Friday, September 6, 2024 4:08 PM
To: Klaus Darilion 
Cc: Petr Špaček ; bind-users@lists.isc.org; Klaus Darilion via 
bind-users 
Subject: Re: Sporadic Timeouts after upgrading to bind9.20

> Are you running with options { reuseport no; };  ?
>
> You might want to try that.

After setting reuseport no; (and UV_THREADPOOL_SIZE=12) I have not seen any
timeouts anymore.

> Anyway, this:
>
> TID 8917:
> #0  0x7b385aa6daa9 cds_lfht_destroy -
> /usr/lib/x86_64-linux-gnu/liburcu-cds.so.8.1.0
>
> caught my eye. Are the zones you are hosting particularly large on GLUE?

I don't know, and I have not checked yet. One of the affected zones is .ch.
You could download the zone from https://zonedata.switch.ch/ (they are using
NSEC, not NSEC3 as I wrote before).

> Also if you have more eu-stack, can you confirm this is the pattern now?

After setting reuseport no; I do not have stack traces any more. But if that
would help you, I can undo the workaround next week to collect traces.

Thanks
Klaus



RE: Sporadic Timeouts after upgrading to bind9.20

2024-09-06 Thread Klaus Darilion via bind-users
Correcting myself: even with { reuseport no; }; and UV_THREADPOOL_SIZE=12
timeouts still happen, but the situation has improved a lot.
Regards
Klaus
