named UDP retransmit timeouts ?
Good day BIND experts,

Please can anyone advise on the best way to optimize named's UDP timeout settings for caching-only local resolver usage over a slow network link? I can't find anything in the Bv9ARM document that specifically describes how named implements UDP re-transmits. Could someone point me at the right pages or place to look, if there are any, besides the source code, which I am reading now?

My problem is that at home my whole internet connection goes through one 100M CAT-6 ethernet cable to a GSM 3G/4G modem (90% 3G WCDMA). It seems that no more than about 128 kilobyte/sec of download bandwidth, and less upload bandwidth, is available. Whenever my browser decides to download something large (like a JavaScript blob), DNS requests start timing out and the browser keeps re-issuing its requests. Similar nasty feedback situations occur when the GSM modem's DHCP lease expires and it has to re-setup its NAT for the ethernet link: all UDP requests then time out for about 10 seconds, building up quite a backlog.

I have tried playing around with named.conf settings:

  resolver-retry-interval 8;
  resolver-retry-time 32;
  max-retry-time 32;

but they don't seem to help. I still get a 'DNS freeze' for about 10-30 seconds when the GSM modem renegotiates its DHCP lease, during a yum / dnf 'update', and during large browser downloads or stream playing.
My Linux v5.12.17 (Fedora-34) x86_64 box, a laptop, runs named 9.6.18 from the Fedora RPM. It hosts a Windows 10 VM, which is quite a chatty DNS user, and runs a hostapd instance that a local network of 3 Android mobile phones use as their default data connection. The phones also use the laptop's DNS server, and send SIP voice traffic through my company's SIP server, which I maintain. So the Linux box does NAT for the Windows VM and for the Android mobile clients.

The laptop's named instance serves authoritative zones for my localhost, local VMs and DMZ Android mobile phone units. ALL hosts, including the Windows host, use BIND named running on the Linux laptop gateway, which is the default route endpoint for all hosts. Its named.conf has a 'forwarders { ... };' clause containing my cellular network provider's DNS server IP addresses. These remote cellular DNS servers can respond very slowly at peak internet usage times.

It is nice to be able to see all packets from the Android phones with tcpdump, and to be able to see the voice traffic that they send to our cloud SIP server being NAT-ed, along with the replies the SIP server sends back, which get NAT-ed to the Windows VM Dispatcher and audio playback GUI running on the laptop, which I also maintain.

My BIND named server also implements an RBL blacklist, kindly made available as a hosts file at https://someonewhocares.org/hosts , which I convert to a Response Policy Zone file. DNSSEC is also enabled by default.
My named.conf has a clause:

  allow-query { localhost; 192.168.W.0/24; 192.168.M.0/24; 192.168.V.0/24; };

where W is the Windows VM network, M is the mobile device network, and V is my corporate L2TP/IPSEC VPN network, also doing NAT. It also has one 'localhost-resolver' view with:

  match-clients { /* same as above */ };
  recursion yes;

This setup works great on a normal office LAN, where there are multiple hops to the internet available, but not on my slow home connection, where a single ethernet link carries everything to the internet through a modem that must periodically renegotiate a DHCP lease. When the modem renegotiates its DHCP lease every hour, I typically have to restart named and hostapd.

I just want named to notice that the response times to the forwarders are increasing, and to increase its number of UDP re-transmit attempts and its timeout (the time between attempts) accordingly, and vice versa (decrease them back to defaults when forwarder responsiveness improves).

Before I start hacking the named udp.c server code, please could anyone advise whether there are configuration settings to adjust named's UDP re-transmit timeout and number-of-attempts strategy for slow networks? I can't believe there aren't any.

Thanks in advance for any informative replies,
Best Regards,
Jason Vas Dias
___
Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe from this list

ISC funds the development of this software with paid support subscriptions. Contact us at https://www.isc.org/contact/ for more information.

bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users
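For illustration only: the adaptive behaviour asked for in the message above (lengthen the timeout while the forwarders respond slowly, shorten it again when they recover) resembles the smoothed round-trip estimator that TCP uses (RFC 6298). The Python sketch below is a minimal version of that estimator. It is not how named works and it is not a named.conf feature; the class name, the 0.8 s floor and the 10 s ceiling are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AdaptiveTimeout:
    """RFC 6298 style retransmit timeout estimator (illustrative only)."""

    min_timeout: float = 0.8      # seconds; hypothetical floor
    max_timeout: float = 10.0     # seconds; hypothetical ceiling
    srtt: Optional[float] = None  # smoothed round-trip time
    rttvar: float = 0.0           # smoothed round-trip variation

    def observe(self, rtt: float) -> None:
        """Feed in one measured forwarder response time, in seconds."""
        if self.srtt is None:
            # First sample seeds both estimates.
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # Update the variation first, using the previous SRTT.
            self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - rtt)
            self.srtt = 0.875 * self.srtt + 0.125 * rtt

    def timeout(self) -> float:
        """Current retransmit timeout, clamped to [min, max]."""
        if self.srtt is None:
            return self.min_timeout
        rto = self.srtt + 4 * self.rttvar
        return max(self.min_timeout, min(self.max_timeout, rto))


if __name__ == "__main__":
    est = AdaptiveTimeout()
    for rtt in (0.05, 0.06, 0.05):   # forwarders answering quickly
        est.observe(rtt)
    print("timeout on a quiet link:", est.timeout())
    for rtt in (2.0, 2.5, 3.0):      # link congested, answers slow down
        est.observe(rtt)
    print("timeout on a congested link:", est.timeout())
```

With fast samples the timeout stays at the floor; once multi-second samples arrive it grows to several seconds, and it decays again as fast samples return. A retry count could be derived from the same estimate, but that is left out here.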
ITS THE NUMBER OF CORES/THREADS
So after ALL that, it was down to the number of cores/threads: with anything more than 7 cores/threads, 9.16.19 WILL NOT RUN (tested in a virtual PC). Man, what A BUG.
RE: ITS THE NUMBER OF CORES/THREADS
Hi Peter,

I’ve run a few tests based on your observations regarding the number of vCPU cores, and my own finding is that it is specifically 8 vCPUs and 12 vCPUs which exhibit this behaviour. I haven’t been able to test beyond 12 vCPUs because that’s my hardware limit.

With 1-7 vCPUs, or with 9-11 vCPUs, BIND 9.16.19 starts just fine. But with either 8 vCPUs or 12 vCPUs the ISC BIND service fails to start. If I try to start the service manually I get:

/ / / / /
Windows could not start the ISC BIND service on Local Computer.
Error 1067: The process terminated unexpectedly.
/ / / / /

And if I look in the Windows Application Event Log I see named logging all of the following, and then it stops right after logging that it’s using 1 UDP listener per interface:

/ / / / /
BIND 9 is maintained by Internet Systems Consortium, Inc. (ISC), a non-profit 501(c)(3) public-benefit corporation.
Support and training for BIND 9 are available at https://www.isc.org/support
found 8 CPUs, using 8 worker threads
using 1 UDP listener per interface
/ / / / /

You should file a bug report on ISC’s BIND bug tracker if you’d like them to investigate it:

https://gitlab.isc.org/isc-projects/bind9/-/issues

It doesn’t impact me, I’m afraid, because my Windows BIND servers are all 2 vCPU virtual machines; I find that’s plenty of resource to run BIND for me.

Best,
Richard.

From: bind-users On Behalf Of Peter via bind-users
Sent: 23 July 2021 5:44 pm
To: bind-users@lists.isc.org
Subject: ITS THE NUMBER OF CORES/THREADS

So after ALL that, it was down to the number of cores/threads: with anything more than 7 cores/threads, 9.16.19 WILL NOT RUN (tested in a virtual PC). Man, what A BUG.
Re: ITS THE NUMBER OF CORES/THREADS
Well, I reported it and we'll see what happens. My main BIND is not in a virtual machine; I guess I could disable Hyper-Threading as a workaround...
Re: ITS THE NUMBER OF CORES/THREADS
Thanks, having such a simple reproducer is helpful.

Can you check whether adding `-n 8` vs `-n 7` has the same effect?

Ondřej
--
Ondřej Surý - ISC (He/Him)

My working hours and your working hours may be different. Please do not feel obligated to reply outside your normal working hours.

> On 23. 7. 2021, at 20:31, Peter via bind-users wrote:
>
> Well, I reported it and we'll see what happens. My main BIND is not in a virtual
> machine; I guess I could disable Hyper-Threading as a workaround...
Re: named UDP retransmit timeouts ?
Jason Vas Dias wrote:
>
> Please can anyone advise the best way to optimize named's
> UDP timeout settings for caching-only local resolver usage
> over a slow network link - I can't seem to find any in the
> Bv9ARM document specifically describing how named
> implements UDP re-transmits - please could someone
> point me at the right pages or place to look, besides
> the source code, which I am reading now, if there are any ?

I remember being surprised a while back that the retry intervals and timeouts were more hard-coded than I expected. (But, be warned! I have not refreshed my memory.)

The rough idea is that there's a certain amount of co-design between the libc stub resolver (which back in the day came from BIND) and the recursive server. IIRC, the libc resolver has a query timeout of 10s and retries three times (so the overall timeout is about half a minute), and named's resolver has a timeout of about 3s and also retries 3 times, which neatly fits inside libc's 10s timeout. At least that's what my memory tells me, but it may be wrong.

But, I think you will not be successful fixing your problems by tweaking DNS software. One of the problems with DNS as a protocol is that its transport layer is very simple and very stupid, so if the underlying network has problems, the DNS isn't able to fight its way through.

> My problem is that at home my whole internet goes through
> one 100M CAT-6 ethernet cable to a GSM 3G/4G modem (90% 3G WCDMA) ,
> it seems no more than about 128 kilobyte/sec download & less upload
> bandwidth is available, whenever my browser decides to download
> something large (like a JavaScript blob) , then DNS requests
> start timing out, the browser keeps re-issuing its requests,
> and similar nasty feedback situations occur when the GSM
> modem's DHCP lease expires and it has to re-setup its NAT for
> the ethernet link, so all UDP requests time out for about
> 10 seconds, building up quite a backlog.

Ugh, that sounds horrible.
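The timeout nesting recalled earlier in this message can be written out as a small calculation. This Python sketch uses only the figures quoted from memory above (10s and 3 tries for libc, about 3s and 3 tries for named); they are not checked against any libc or BIND release, and the function and constant names are made up for the example.

```python
# Figures as recalled in the message above; NOT verified defaults.
LIBC_TIMEOUT_S = 10.0   # stub resolver wait per attempt
LIBC_ATTEMPTS = 3
NAMED_TIMEOUT_S = 3.0   # named wait per upstream attempt
NAMED_ATTEMPTS = 3


def total_wait(timeout_s: float, attempts: int) -> float:
    """Worst-case time spent if every attempt times out."""
    return timeout_s * attempts


# named's whole retry cycle should finish inside one libc attempt.
named_budget = total_wait(NAMED_TIMEOUT_S, NAMED_ATTEMPTS)
libc_budget = total_wait(LIBC_TIMEOUT_S, LIBC_ATTEMPTS)
fits = named_budget <= LIBC_TIMEOUT_S

print(f"named gives up after about {named_budget:.0f}s")
print(f"libc gives up after about {libc_budget:.0f}s")
print("named's retries fit inside one libc attempt:", fits)
```

The point of the nesting is that raising the server-side timeout past the client-side one gains little: the stub resolver would already have retried or given up.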
I think the basic problem is that TCP is very aggressive about filling up whatever bandwidth it thinks might be available, but the DNS is not, and TCP's congestion control algorithms will happily overwhelm a comparatively reticent protocol like the DNS. You probably also have buffer bloat, which makes these problems worse. (Check out https://www.bufferbloat.net/ for LOTS of information.)

I am lucky enough that I haven't needed to deal with your problems myself, so the best I can do is give you a few hints, but no specific advice.

The main idea is to prevent your TCP flows from overwhelming your uplink, and/or from interfering with DNS traffic. You can (with the right know-how) do this with some stunt network configuration on your Linux gateway.

  * Use traffic classification and priority queueing to ensure that DNS packets can jump ahead of everything else. This probably won't be enough by itself because of buffer bloat.

  * You can use traffic shaping to ensure that the aggregate traffic from your Linux box never tries to over-fill your uplink. Years and years ago a friend of mine did this to avoid buffer bloat in their cable modem.

  * Configure FQ-CoDel on your Linux gateway. This is a queueing algorithm specifically designed to avoid buffer bloat and to make TCP back off before everything becomes terrible.

That's approximately everything I know about tackling your problem, so I hope it points you in the right direction...

Tony.
--
f.anthony.n.finch https://dotat.at/
Biscay: Cyclonic in far north, otherwise westerly or southwesterly, 4 to 6, occasionally 7 in north. Slight or moderate becoming moderate or rough. Squally thundery showers. Good, occasionally poor.
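The shaping and FQ-CoDel hints in the message above could look roughly like this on a Linux gateway. The Python sketch only builds and prints tc command lines for review; it runs nothing. The interface name eth0, the 1000 kbit/s uplink figure and the 90% headroom are placeholders, not measurements from this thread. It covers egress (upload) only; inbound traffic would need a different arrangement.

```python
from typing import List


def shaping_commands(dev: str, uplink_kbit: int,
                     headroom_percent: int = 90) -> List[str]:
    """Build tc commands that cap egress below the uplink rate and
    attach fq_codel underneath, so the queue forms on this host
    (where fq_codel manages it) rather than inside the modem."""
    rate = uplink_kbit * headroom_percent // 100
    return [
        # Root HTB qdisc; unclassified traffic goes to class 1:10.
        f"tc qdisc replace dev {dev} root handle 1: htb default 10",
        # Single class limited to slightly less than the real uplink.
        f"tc class add dev {dev} parent 1: classid 1:10 "
        f"htb rate {rate}kbit ceil {rate}kbit",
        # fq_codel as the leaf queue of that class.
        f"tc qdisc add dev {dev} parent 1:10 fq_codel",
    ]


if __name__ == "__main__":
    # Hypothetical values: replace with the real interface and a
    # measured uplink rate before trying any of this as root.
    for cmd in shaping_commands("eth0", 1000):
        print(cmd)
```

Because fq_codel keeps a separate queue per flow, small sparse flows such as DNS queries tend not to sit behind a bulk download, which may reduce the need for explicit DNS prioritisation.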
Re: ITS THE NUMBER OF CORES/THREADS
Yes, I went into Services and put -n 7 in the start parameters, and 9.16.19 started. However, a bug in Windows (at least I think it is a bug) means it does not save the parameter, so you have to put in -n 7 manually to start BIND.

On 23/07/2021 7:53 pm, Ondřej Surý wrote:
> Thanks, having such a simple reproducer is helpful.
>
> Can you check whether adding `-n 8` vs `-n 7` has the same effect?
Re: ITS THE NUMBER OF CORES/THREADS
Update on how to get BIND to run with parameters on Windows:

  1. Make a folder in C:\ called "named".
  2. In it, make a file called named.bat.
  3. In the bat file, add:
       sc start named -n 7
  4. In Services > ISC BIND, open the Recovery tab.
  5. For "First failure", select "Run a Program".
  6. Check "Enable actions for stops with errors".
  7. Under "Run program", browse for named.bat.
  8. Apply, and now start the service.
Re: ITS THE NUMBER OF CORES/THREADS
Sorry, get lost

2021-07-23 11:31 GMT-07:00, Peter via bind-users:
> Well, I reported it and we'll see what happens. My main BIND is not in a
> virtual machine; I guess I could disable Hyper-Threading as a workaround...
Re: ITS THE NUMBER OF CORES/THREADS
Take your chatting somewhere else. Get lost.

2021-07-23 11:31 GMT-07:00, Peter via bind-users:
> Well, I reported it and we'll see what happens. My main BIND is not in a
> virtual machine; I guess I could disable Hyper-Threading as a workaround...
Re: ITS THE NUMBER OF CORES/THREADS
Garbage

2021-07-23 11:31 GMT-07:00, Peter via bind-users:
> Well, I reported it and we'll see what happens. My main BIND is not in a
> virtual machine; I guess I could disable Hyper-Threading as a workaround...