named UDP retransmit timeouts ?

2021-07-23 Thread Jason Vas Dias


Good day bind experts -

 Please can anyone advise on the best way to tune named's
 UDP timeout settings for a caching-only local resolver
 on a slow network link? I can't find anything in the
 Bv9ARM document that specifically describes how named
 implements UDP retransmits. Could someone please point me
 at the right pages or places to look, if there are any,
 besides the source code, which I am reading now?

 My problem is that at home my whole internet connection goes through
 one 100M CAT-6 ethernet cable to a GSM 3G/4G modem (90% 3G WCDMA).
 No more than about 128 kilobytes/sec of download bandwidth, and less
 upload bandwidth, seems to be available. Whenever my browser downloads
 something large (like a JavaScript blob), DNS requests start timing
 out and the browser keeps re-issuing its requests. A similar nasty
 feedback loop occurs when the GSM modem's DHCP lease expires and it
 has to set up its NAT for the ethernet link again: all UDP requests
 time out for about 10 seconds, building up quite a backlog.

 I have tried playing around with named.conf settings:
   resolver-retry-interval 8;
   resolver-retry-time 32;
   max-retry-time 32;
 but they don't seem to help - I still get a 'DNS freeze'
 situation for about 10-30 seconds when the GSM modem
 renegotiates its DHCP lease, during a yum / dnf 'update',
 during large browser downloads or stream playing ...

 My Linux v5.12.17 (Fedora 34) x86_64 box runs named 9.6.18
 from the Fedora RPM. It hosts a Windows 10 VM, which is quite a
 chatty DNS user, and runs a hostapd instance that a local network
 of 3 Android mobile phones uses as its default data connection.
 The phones also use the laptop's DNS server, and they send SIP voice
 traffic through my company's SIP server, which I maintain. So the
 Linux box does NAT for the Windows VM and for the Android mobile
 clients, and its named instance serves authoritative zones for my
 localhost, local VMs and DMZ Android mobile phone units.
 ALL hosts, including the Windows host, use BIND named running
 on the Linux laptop gateway, which is the default route endpoint
 for all hosts, and which has a 'forwarders { ... };' clause in
 named.conf containing my cellular network provider's DNS server IP
 addresses. These remote cellular DNS servers can respond very slowly
 at peak internet usage times.

 It is nice to be able to see all packets from the Android phones
 with tcpdump, and to be able to receive the voice traffic that they
 send to our cloud SIP server (which I can see being NAT-ed). The
 traffic that the SIP server sends back gets NAT-ed to the Windows VM
 Dispatcher and audio playback GUI running on the laptop, which I
 also maintain.

 My BIND named server also implements an RBL blacklist, kindly made
 available as a hosts file at https://someonewhocares.org/hosts ,
 which I convert to a Response Policy Zone file. DNSSEC is also
 enabled by default.
 
 My named.conf has a clause:

 allow-query { localhost; 192.168.W.0/24; 192.168.M.0/24;
   192.168.V.0/24;
 };
 where W is Windows VM network, M is mobile device network,
 and V is my corporate L2TP/IPSEC VPN network, also doing NAT,
 and one 'localhost-resolver' "View" with
 match-clients { /* same as above */ } ;
 and
 recursion yes;

 This setup works great on a normal office LAN, where there
 are multiple hops to the internet available, but not on my
 slow single ethernet connection at home to the whole internet,
 through a modem that must periodically renegotiate a DHCP lease.
 When the modem renegotiates its DHCP lease every hour, I typically
 have to restart named and hostapd.

 I just want named to notice that the response times from
 the forwarders are increasing, and to increase its
 number of UDP retransmit attempts and its timeout (the time
 between attempts) accordingly, and vice versa
 (decrease them back to the defaults when forwarder responsiveness
 improves).
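
The adaptive behaviour asked for here can be sketched in Python. This is only an illustration of the idea, not something named is known to implement or expose through configuration: it borrows the smoothed round-trip estimator that TCP uses for its retransmission timer (RFC 6298). The class name, the method names and the 0.8 s / 10 s bounds are assumptions made up for the example.

```python
class AdaptiveTimeout:
    """Derive a retransmit timeout from observed forwarder response times.

    Hypothetical helper: the estimator follows RFC 6298 (TCP), it is not
    taken from the BIND source code.
    """

    def __init__(self, initial=1.0, minimum=0.8, maximum=10.0):
        self.srtt = None      # smoothed response time, seconds
        self.rttvar = None    # smoothed deviation of the response time
        self.timeout = initial
        self.minimum = minimum
        self.maximum = maximum

    def on_response(self, rtt):
        """Feed one measured response time; return the new timeout."""
        if self.srtt is None:
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # RFC 6298 order: update the deviation with the old SRTT first.
            self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - rtt)
            self.srtt = 0.875 * self.srtt + 0.125 * rtt
        wanted = self.srtt + 4 * self.rttvar
        self.timeout = min(self.maximum, max(self.minimum, wanted))
        return self.timeout

    def on_timeout(self):
        """No answer arrived in time: back off exponentially, up to the cap."""
        self.timeout = min(self.maximum, self.timeout * 2)
        return self.timeout


estimator = AdaptiveTimeout()
estimator.on_response(0.1)   # fast forwarder: timeout stays at the floor
estimator.on_response(2.0)   # forwarder slows down: timeout grows
print(f"timeout after a slow response: {estimator.timeout:.2f} s")
```

Slow answers raise the timeout, lost answers double it up to the cap, and a run of fast answers brings it back down to the floor, which is the "and vice versa" part of the request.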
 
 Before I start hacking the named udp.c server code, could
 anyone please advise whether there are configuration
 settings that adjust named's UDP retransmit timeout and number
 of attempts for slow networks?

 I can't believe there aren't any.

Thanks in advance for any informative replies,
Best Regards,
Jason Vas Dias

___
Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe 
from this list

ISC funds the development of this software with paid support subscriptions. 
Contact us at https://www.isc.org/contact/ for more information.


bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users


ITS THE NUMBER OF CORES/THREADS

2021-07-23 Thread Peter via bind-users
So after ALL that, it was down to the number of cores/threads: with
anything more than 7 cores/threads, 9.16.19 WILL NOT RUN (tested in a
virtual PC).


Man, what A BUG



RE: ITS THE NUMBER OF CORES/THREADS

2021-07-23 Thread Richard T.A. Neal
Hi Peter,

I’ve run a few tests based on your observations regarding the number of vCPU 
cores and my own findings are that it is specifically 8 vCPUs and 12 vCPUs 
which exhibit this behaviour. I haven’t been able to test beyond 12 vCPUs 
because that’s my hardware limit.

With 1-7 vCPUs, or with 9-11 vCPUs BIND 9.16.19 starts just fine. But with 
either 8 vCPUs or 12 vCPUs the ISC BIND service fails to start.

If I try and start the service manually I get:

/ / / / /
Windows could not start the ISC BIND service on Local Computer.
Error 1067: The process terminated unexpectedly.
/ / / / /

And if I look in the Windows Application Event Log I see named logging all of 
the following, and then it stops right after logging that it’s using 1 UDP 
listener per interface:

/ / / / /
BIND 9 is maintained by Internet Systems Consortium,
Inc. (ISC), a non-profit 501(c)(3) public-benefit
corporation.  Support and training for BIND 9 are
available at https://www.isc.org/support

found 8 CPUs, using 8 worker threads
using 1 UDP listener per interface
/ / / / /

You should file a bug report on ISC’s BIND bug tracker if you’d like them to 
investigate it. It doesn’t impact me I’m afraid because my Windows BIND servers 
are all 2 vCPU virtual machines – I find that’s plenty enough resource to run 
BIND for me.

https://gitlab.isc.org/isc-projects/bind9/-/issues

Best,

Richard.



Re: ITS THE NUMBER OF CORES/THREADS

2021-07-23 Thread Peter via bind-users
Well, I reported it and we'll see what happens. My main BIND is not in a
virtual machine; I guess I could disable Hyper-Threading as a workaround...


Re: ITS THE NUMBER OF CORES/THREADS

2021-07-23 Thread Ondřej Surý
Thanks, having such a simple reproducer is helpful.

Can you check whether adding `-n 8` vs `-n 7` has the same effect?

Ondřej
--
Ondřej Surý — ISC (He/Him)

My working hours and your working hours may be different. Please do not feel 
obligated to reply outside your normal working hours.



Re: named UDP retransmit timeouts ?

2021-07-23 Thread Tony Finch
Jason Vas Dias wrote:
>
>  Please can anyone advise the best way to optimize named's
>  UDP timeout settings for caching-only local resolver usage
>  over a slow network link - I can't seem to find any in the
>  Bv9ARM document specifically describing how named
>  implements UDP re-transmits - please could someone
>  point me at the right pages or place to look, besides
>  the source code, which I am reading now, if there are any ?

I remember being surprised a while back that the retry intervals
and timeouts were more hard-coded than I expected. (But, be warned! I have
not refreshed my memory.)

The rough idea is that there's a certain amount of co-design between the
libc stub resolver (which back in the day came from BIND) and the
recursive server. IIRC, the libc resolver has a query timeout of 10s and
retries three times (so the overall timeout is about half a minute), and
named's resolver has a timeout of about 3s and also retries 3 times, which
neatly fits inside libc's 10s timeout.

At least that's what my memory tells me, but it may be wrong.
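
The arithmetic behind that recollection can be written out as a short Python check. The four figures are the ones recalled in this message, not verified defaults of any libc or BIND version, and the function name is made up for the example.

```python
# Figures as recalled in the message above; they are NOT verified defaults.
STUB_TIMEOUT_S = 10   # libc stub resolver: timeout per attempt
STUB_ATTEMPTS = 3     # libc stub resolver: number of attempts
NAMED_TIMEOUT_S = 3   # named's resolver: timeout per attempt
NAMED_ATTEMPTS = 3    # named's resolver: number of attempts


def total_budget(timeout_s, attempts):
    """Worst-case time spent if every single attempt times out."""
    return timeout_s * attempts


stub_total = total_budget(STUB_TIMEOUT_S, STUB_ATTEMPTS)
named_total = total_budget(NAMED_TIMEOUT_S, NAMED_ATTEMPTS)

# named should give up before one stub attempt expires.
fits = named_total <= STUB_TIMEOUT_S

print(f"stub resolver gives up after {stub_total} s")
print(f"named gives up after {named_total} s, fits in one stub attempt: {fits}")
```

Under these assumed figures, raising named's per-attempt timeout much beyond 3 s would push its total past a single stub attempt, so the client would already be retrying while named was still waiting.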

But, I think you will not be successful fixing your problems by tweaking
DNS software. One of the problems with DNS as a protocol is that its
transport layer is very simple and very stupid, so if the underlying
network has problems, the DNS isn't able to fight its way through.

>  My problem is that at home my whole internet goes through
>  one 100M CAT-6 ethernet cable to a GSM 3G/4G modem (90% 3G WCDMA) ,
>  it seems no more than about 128 kilobyte/sec download & less upload
>  bandwidth is available, whenever my browser decides to download
>  something large (like a JavaScript blob) , then DNS requests
>  start timing out, the browser keeps re-issuing its requests,
>  and similar nasty feedback situations occur when the GSM
>  modem's DHCP lease expires and it has to re-setup its NAT for
>  the ethernet link, so all UDP requests time out for about
>  10 seconds, building up quite a backlog.

Ugh, that sounds horrible.

I think the basic problem is that TCP is very aggressive about filling up
whatever bandwidth it thinks might be available, but the DNS is not, and
TCP's congestion control algorithms will happily overwhelm a comparatively
reticent protocol like the DNS.

You probably also have buffer bloat, which makes these problems worse.
(check out https://www.bufferbloat.net/ for LOTS of information)

I am lucky enough that I haven't needed to deal with your problems myself,
so the best I can do is give you a few hints, but no specific advice. The
main idea is to prevent your TCP flows from overwhelming your uplink,
and/or from interfering with DNS traffic. You can (with the right
know-how) do this with some stunt network configuration on your Linux
gateway.

* Use traffic classification and priority queueing to ensure that DNS
  packets can jump ahead of everything else. This probably won't be enough
  by itself because of buffer bloat.

* You can use traffic shaping to ensure that the aggregate traffic from
  your Linux box never tries to over-fill your uplink. Years and years
  ago a friend of mine did this to avoid buffer bloat in their cable
  modem.

* Configure FQ-CoDel on your Linux gateway. This is a queueing algorithm
  specifically designed to avoid buffer bloat and to make TCP back off
  before everything becomes terrible.
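
To illustrate the first bullet, here is a small Python model of one queue draining onto a slow link, comparing plain FIFO with strict priority for DNS. The 128 KiB/s rate comes from the original post; the packet sizes, the backlog and the function name are assumptions for the example. It models only a queue on the gateway, so it says nothing about buffers inside the modem, which is why priority queueing alone may not be enough.

```python
LINK_BYTES_PER_SEC = 128 * 1024  # "about 128 kilobyte/sec" from the original post


def finish_times(packets, dns_first):
    """Return {name: seconds until that packet has been fully sent}.

    packets is a list of (name, size_bytes, is_dns) tuples, all already
    queued at time 0. With dns_first=True, DNS packets are sent before
    everything else (strict priority); otherwise the queue is plain FIFO.
    """
    if dns_first:
        # sorted() is stable, so packets keep their order within each class.
        order = sorted(packets, key=lambda p: not p[2])
    else:
        order = list(packets)
    sent_bytes = 0
    done = {}
    for name, size, _is_dns in order:
        sent_bytes += size
        done[name] = sent_bytes / LINK_BYTES_PER_SEC
    return done


# Hypothetical backlog: 64 full-size TCP segments, then one small DNS query.
queue = [(f"bulk{i}", 1500, False) for i in range(64)] + [("dns", 100, True)]

fifo = finish_times(queue, dns_first=False)
prio = finish_times(queue, dns_first=True)
print(f"FIFO:     DNS query sent after {fifo['dns']:.3f} s")
print(f"Priority: DNS query sent after {prio['dns']:.4f} s")
```

With this backlog the DNS query waits roughly 0.73 s behind the bulk data in the FIFO case and under a millisecond when it is allowed to jump the queue, while the bulk transfer finishes at the same time either way.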

That's approximately everything I know about tackling your problem, so I
hope it points you in the right direction...

Tony.
-- 
f.anthony.n.finch  https://dotat.at/
Biscay: Cyclonic in far north, otherwise westerly or southwesterly, 4
to 6, occasionally 7 in north. Slight or moderate becoming moderate or
rough. Squally thundery showers. Good, occasionally poor.



Re: ITS THE NUMBER OF CORES/THREADS

2021-07-23 Thread Peter via bind-users
Yes, I went into Services, put -n 7 in the start parameters, and 9.16.19
started. However, Windows does not save the parameter (at least I think
it's a bug), so you have to enter -n 7 manually each time to start BIND.





Re: ITS THE NUMBER OF CORES/THREADS

2021-07-23 Thread Peter via bind-users

Update on how to get BIND to run with start parameters on Windows:

Make a folder in C:\ called named.

Make a file called named.bat.

In the .bat file add:

sc start named -n 7

In Services > ISC BIND, open the Recovery tab.

For "First failure", select "Run a Program".

Check "Enable actions for stops with errors".

Under "Run program", browse to named.bat.

Apply, and now start the service.



Re: ITS THE NUMBER OF CORES/THREADS

2021-07-23 Thread Hoi lam Poon
Sorry, get lost



Re: ITS THE NUMBER OF CORES/THREADS

2021-07-23 Thread Hoi lam Poon
Take your chat somewhere else. Get lost.



Re: ITS THE NUMBER OF CORES/THREADS

2021-07-23 Thread Hoi lam Poon
Garbage
