Possible memory leak on BIND 9.10.1-P1 running on FreeBSD 10.1-RELEASE-p4 - part 2
Downgraded to BIND 9.9.6 and the leak is gone, using the same named.conf, the same hardware, and the same environment. It is highly likely that there really is a memory leak in BIND 9.10.

--
Regards,
Daniel Ryšlink
System Administrator
Dial Telecom a. s.
Křižíkova 36a/237, 186 00 Praha 3, Czech Republic
Tel.: +420.226204627
daniel.rysl...@dialtelecom.cz
www.dialtelecom.cz
Dial Telecom, a.s. Simply connect

--- Forwarded Message ---
Message-ID: <54c2b2f1.2080...@dialtelecom.cz>
Date: Fri, 23 Jan 2015 21:45:37 +0100
From: Daniel Ryšlink
To: bind-users@lists.isc.org
Subject: Possible memory leak on BIND 9.10.1-P1 running on FreeBSD 10.1-RELEASE-p4

Hello,

Detailed information about the resolver can be found in the tgz archive at: http://www.mujweb.cz/nakamura/dns/leakinfo.tgz

leak.png - munin graph of memory allocation from the last few days
named.conf - BIND config in canonical form (output of named-checkconf -p)
dmidecode.txt - information about the server hardware
named.stats - log of "rndc stats" dumps created by munin-node every five minutes

Basically, the symptom is that the named process slowly allocates more and more memory until it runs out of swap and crashes. The interesting thing is that inactive memory is not recycled and reused: at the moment named crashes there is still a lot of Inactive memory. There are no significant peaks in network traffic or query rates.

The problems appeared after upgrading to FreeBSD 10.1 and BIND 9.10. Before that, the same server ran without problems for several years on BIND 9.9.x and FreeBSD 8.x, and everything was quite stable.

The server operates behind an OpenBSD pf firewall that restricts access to TCP/UDP port 53 to the defined IP ranges of our clients.

Things that I tried:
- installing the latest openssl from ports to avoid the problem in the advisory from 14.01.2015
- removing all unnecessary compile options (like IDN, rate limiting) and recompiling BIND from ports
- tweaking the max-cache-size, tcp-clients and recursive-clients options

Any insights into the problem are highly appreciated, since I am at my wit's end.

Thank you in advance.

--
Regards,
Daniel Ryšlink

___
Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe from this list

bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users
BIND response time is relatively high
Hi,

I noticed that at peak hours, BIND response time is relatively high for some servers: a non-cached query takes over 700 ms. I set some kernel parameters to tune the network and sockets on Red Hat 6, and set some global options to tune BIND by modifying the cache settings, but the cache never fills up to the limit I set, and performance is no better.

Some of the settings are below.

Kernel sysctl:
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1
net.core.rmem_max = 33554432
net.core.wmem_max = 33554432
net.ipv4.tcp_max_syn_backlog = 16384
net.ipv4.tcp_max_orphans = 40
net.core.somaxconn = 4096
net.ipv4.tcp_rmem = 4096 87380 33554432
net.ipv4.tcp_wmem = 4096 65536 33554432

BIND named.conf global options:
cleaning-interval 1440;
max-cache-ttl 2419200;
max-ncache-ttl 86400;
max-cache-size 5120m;

Server specs:
Memory is 8 GB. Memory usage does not exceed 20%, or 1.7 GB, even though the cache is limited to 5 GB as shown in the settings above. I would be happy to see memory utilization climb far higher.

Could you please suggest what to try?

Thanks
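To make the quoted named.conf values easier to reason about, here is a minimal Python sketch that converts them into human-readable units. The units are an assumption taken from general BIND 9 documentation (cleaning-interval in minutes, the two TTL limits in seconds, size suffixes as multiples of 1024); the "1.7GB" figure from the post is treated as GiB for the comparison.

```python
from datetime import timedelta


def parse_size(text):
    """Convert a named.conf size such as '5120m' into bytes.

    Assumes k/m/g suffixes are multiples of 1024.
    """
    multipliers = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    text = text.strip().lower()
    if text and text[-1] in multipliers:
        return int(text[:-1]) * multipliers[text[-1]]
    return int(text)


# Values copied from the post; units are assumptions (see lead-in).
cleaning_interval = timedelta(minutes=1440)
max_cache_ttl = timedelta(seconds=2419200)
max_ncache_ttl = timedelta(seconds=86400)
max_cache_size = parse_size("5120m")

# "1.7GB" observed usage from the post, treated here as GiB.
observed_usage = 1.7 * 1024 ** 3

print("cleaning-interval:", cleaning_interval)
print("max-cache-ttl:    ", max_cache_ttl)
print("max-ncache-ttl:   ", max_ncache_ttl)
print("max-cache-size:   ", max_cache_size, "bytes")
print("usage vs. limit:   {:.0%}".format(observed_usage / max_cache_size))
```

Read this way, max-cache-size is only an upper bound: a cache that stays well below it is not by itself evidence of a misconfiguration.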
RE: BIND response time is relatively high
The parameter that is glaringly missing from your list is “recursive-clients”. Do you have that set at the default value (1000), or have you bumped it up higher? Since you say that this happens at “peak hours”, recursive-clients is the prime suspect, because it governs how many *simultaneous* recursive requests can be handled. Note that you should see some indication in your logs if you’re running into a recursive-clients limit. You can also see the current number of recursive clients, in real time, via the “rndc status” display. Check whether you’re at or close to your limit.

If that doesn’t pan out, I’d also look at this from the networking standpoint. Are the servers that are experiencing high response times on congested links? High-latency links? Were there any anomalies at the time of the high response times, e.g. high packet loss? Ultimately, you might have to mimic resolution of some of the “slow” queries using a command-line tool like “dig” (with recursion turned off; the +trace option may be useful here) and/or take packet captures to identify the source of the slowness. It’s quite possible that your “peak hours” are peak hours for your Internet providers as well, so the network characteristics of your connections may be less than acceptable at those times.

- Kevin

From: bind-users-boun...@lists.isc.org [mailto:bind-users-boun...@lists.isc.org] On Behalf Of alaa m zidan
Sent: Monday, January 26, 2015 3:26 PM
To: bind-users@lists.isc.org
Subject: BIND response time is relatively high
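As a rough illustration of Kevin's suggestion to check "rndc status" against the recursive-clients limit, the following Python sketch extracts the recursive-clients counters from the status text. The exact line format ("recursive clients: current/soft/hard") and the sample values are assumptions for illustration; verify them against the output of your own BIND version before relying on this.

```python
import re
import subprocess


def recursive_client_usage(status_text):
    """Return (current, soft, hard, fraction_of_hard) or None if not found.

    Assumes a line shaped like 'recursive clients: 870/900/1000'.
    """
    match = re.search(
        r"^recursive clients:\s*(\d+)/(\d+)/(\d+)", status_text, re.MULTILINE
    )
    if match is None:
        return None
    current, soft, hard = (int(group) for group in match.groups())
    fraction = current / hard if hard else 0.0
    return current, soft, hard, fraction


def read_rndc_status():
    """Run 'rndc status' and return its text (requires a working rndc setup)."""
    result = subprocess.run(
        ["rndc", "status"], capture_output=True, text=True, check=True
    )
    return result.stdout


# Hypothetical sample output, not taken from the thread.
SAMPLE = "server is up and running\nrecursive clients: 870/900/1000\ntcp clients: 3/100\n"
usage = recursive_client_usage(SAMPLE)
if usage is not None and usage[3] >= 0.8:
    print("recursive clients at {:.0%} of the limit".format(usage[3]))
```

Sampling this every few seconds during peak hours would show whether the counter actually approaches the limit when response times rise.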
Re: BIND response time is relatively high
At Mon, 26 Jan 2015 21:50:37 +, Darcy Kevin (FCA) wrote:
> The parameter that is glaringly missing from your list is
> “recursive-clients”. Do you have that set at default value (1000) or
> have you bumped it up higher? Since you say that this happens at “peak
> hours”, recursive-clients is the prime suspect,

Besides what Kevin suggests, it may be worth checking for swapping and/or IO wait using 'top'.

/Niall
is this "normal" if not what to do about it?
My experimental zone (the family site) klam.ca has a KSK and a ZSK. There appear to be time differences between the records reported by dig and the source records on file. In the case of the ZSK the inactive date-time differs by a few hours, but in the KSK's case it differs by 3 months.

Is this a problem, and if so, what do I do about it?

The following dates were found in the source record and the output from dig.

         source record    dig
KSK -A   20150126163534   20150117012252
KSK -I   20150225173534   20150225144654

ZSK -A   20150126163534   20150126173534
ZSK -I   20150527015028   20150225173534

--
John Allen
KLaM
--
If you are out to describe the truth, leave elegance to the tailor.
Re: is this "normal" if not what to do about it?
Oops!! I swapped the ZSK and KSK in the table.

On January 26, 2015 9:09:40 PM John wrote:
>          source record    dig
> KSK -A   20150126163534   20150117012252
> KSK -I   20150225173534   20150225144654
>
> ZSK -A   20150126163534   20150126173534
> ZSK -I   20150527015028   20150225173534

--
John Allen
KLaM
--
If you are out to describe the truth, leave elegance to the tailor.
Re: is this "normal" if not what to do about it?
Activation/inactivation is when named starts/stops signing with the key. Inception/expiration is the time period for which the signature is valid. These are different time periods and are basically unrelated.

The inception time will be earlier than the inactivation time. The expiration time will be later than the activation time.

After a key is activated it will be used to sign records that change. The sig-validity-interval (less a small random factor) determines how long a signature is valid.

Mark

In message <14b292cc598.2743.14d9188d684ea7b6d1324cf92af46...@klam.ca>, John writes:
> oops!! I swapped the ZSK and KSK in the table.

--
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742
INTERNET: ma...@isc.org
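To illustrate the distinction Mark draws, here is a small Python sketch that checks the two orderings he states (inception before inactivation, expiration after activation) and computes the signature validity period. It uses one row pair from John's table; treating the "dig" column as RRSIG inception and expiration, and treating all timestamps as UTC, are assumptions made for this example.

```python
from datetime import datetime, timedelta, timezone


def parse_stamp(stamp):
    """Parse a YYYYMMDDHHMMSS timestamp, assumed to be in UTC."""
    return datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def compare_key_and_signature(activate, inactive, inception, expiration):
    """Relate key timing metadata to a signature's validity window."""
    act, inact, inc, exp = (
        parse_stamp(s) for s in (activate, inactive, inception, expiration)
    )
    return {
        "validity": exp - inc,
        "inception_before_inactivation": inc < inact,
        "expiration_after_activation": exp > act,
    }


# Values from the table: -A / -I from the key file, the other two from dig.
report = compare_key_and_signature(
    activate="20150126163534",
    inactive="20150527015028",
    inception="20150126173534",
    expiration="20150225173534",
)
for name, value in report.items():
    print(name, "=", value)
```

For this row the signature window is 30 days long, which is unrelated to the roughly four-month span between the key's activation and inactivation dates; that is the difference the original question ran into.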
Re: Possible memory leak on BIND 9.10.1-P1 running on FreeBSD 10.1-RELEASE-p4 - part 2
Hi Daniel

On Mon, Jan 26, 2015 at 02:56:44PM +0100, Daniel Ryšlink wrote:
> Downgraded to BIND 9.9.6, the leak is gone, using the same named.conf,
> same HW, same environment.
>
> It is highly likely there is really a memory leak problem in Bind
> 9.10.

Because many of these reports are on FreeBSD 10 in a resolver configuration, we are checking to see if we are able to reproduce it there.

Meanwhile, please can you enable statistics-channels in named.conf and send us a dump of the XML statistics along with process sizes reported by ps when named grows very large?

Mukund
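One way to collect what Mukund asks for is sketched below in Python: it saves the XML from the statistics channel to a timestamped file and records the process sizes from ps. The listening address and port (127.0.0.1:8053), the output file name, and the helper names are hypothetical; the sketch assumes a statistics-channels block along the lines of "statistics-channels { inet 127.0.0.1 port 8053 allow { 127.0.0.1; }; };" is already present in named.conf.

```python
import subprocess
import time
import urllib.request

# Hypothetical statistics-channel address; adjust to your named.conf.
STATS_URL = "http://127.0.0.1:8053/"


def parse_ps_sizes(ps_output):
    """Parse the output of 'ps -o rss= -o vsz= -p PID' (sizes as ps reports them)."""
    rss, vsz = ps_output.split()[:2]
    return {"rss": int(rss), "vsz": int(vsz)}


def snapshot(pid, url=STATS_URL):
    """Save the XML statistics to a file and return named's process sizes."""
    xml = urllib.request.urlopen(url, timeout=10).read()
    stamp = time.strftime("%Y%m%dT%H%M%S")
    with open("named-stats-{}.xml".format(stamp), "wb") as handle:
        handle.write(xml)
    ps_output = subprocess.run(
        ["ps", "-o", "rss=", "-o", "vsz=", "-p", str(pid)],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_ps_sizes(ps_output)
```

Calling snapshot() with the PID of named once the process has grown large yields the pair of artifacts requested: the XML dump on disk and the matching RSS/VSZ figures.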
Question about bind-dlz performance limit
I'm using bind-dlz (BIND version 9.10) with MySQL to store zone data. Following the official DLZ documentation, I compiled with the argument "--enable-threads=no".

I use dnstop and netstat to monitor performance, and I find there is a performance bottleneck in bind-dlz. Once the QPS increases to 500-600, the response time of each DNS query becomes very large, up to 600-1000 msec, and the Recv-Q of the UDP socket reaches about 124928 (the default /proc/sys/net/core/rmem_default value).

# dnstop eth1 -R -Q
Queries: 519 new, 9326898 Replies: 508 new, 6612496 total

# watch -n 1 netstat -lnp
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
udp 124813 0 121.199.47.200:53 0.0.0.0:* 31867/named

I think 500-600 QPS is too low. Is this the normal performance of bind-dlz? How can I optimize it?
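The figures in the post can be put side by side with a short Python sketch: how full the UDP receive queue is relative to rmem_default, and what fraction of counted queries received a counted reply. The column positions assumed for the netstat line and the helper name are illustrative; the numbers are copied from the post, and the reply ratio only means something if both dnstop totals cover the same period.

```python
def recv_queue_fill(netstat_line, rmem_bytes):
    """Return Recv-Q as a fraction of the socket receive buffer size.

    Assumes the second whitespace-separated field is Recv-Q, as in
    'netstat -lnp' output.
    """
    fields = netstat_line.split()
    recv_q = int(fields[1])
    return recv_q / rmem_bytes


NETSTAT_LINE = "udp 124813 0 121.199.47.200:53 0.0.0.0:* 31867/named"
RMEM_DEFAULT = 124928  # value of /proc/sys/net/core/rmem_default in the post

fill = recv_queue_fill(NETSTAT_LINE, RMEM_DEFAULT)

# Totals from the dnstop header line in the post.
total_queries = 9326898
total_replies = 6612496
reply_ratio = total_replies / total_queries

print("receive queue fill: {:.1%}".format(fill))
print("replies per query:  {:.1%}".format(reply_ratio))
```

A receive queue that sits almost exactly at the buffer size suggests named is not draining the socket as fast as queries arrive, which fits the long response times described.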
Re: Question about bind-dlz performance limit
On Tue, Jan 27, 2015 at 02:50:33PM +0800, WXR wrote:
> I'm using bind-dlz(bind version 9.10) with mysql to store zone data.
> According to the dlz official documents I use the compile
> arguments "--enable-threads=no".

If you're on 9.10, the documentation you're using is somewhat out of date.

Rebuild with threads; do *not* use "--with-dlz-mysql". Instead, try the dynamically-linkable MySQL DLZ module in contrib/dlz/modules/mysql; it has thread support. There's no documentation to speak of, but there's a "testing" directory with a sample configuration you can work from.

I would expect to see better performance, though still not very good. (DLZ at its best is still quite slow.)

--
Evan Hunt -- e...@isc.org
Internet Systems Consortium, Inc.