Possible memory leak on BIND 9.10.1-P1 running on FreeBSD 10.1-RELEASE-p4 - part 2

2015-01-26 Thread Daniel Ryšlink
Downgraded to BIND 9.9.6 and the leak is gone, using the same named.conf,
same hardware, same environment.


It is highly likely that there really is a memory leak in BIND 9.10.

--
Best regards,
Daniel Ryšlink
System Administrator

Dial Telecom a. s.
Křižíkova 36a/237
186 00 Praha 3, Česká Republika
Tel.:+420.226204627
daniel.rysl...@dialtelecom.cz
---
www.dialtelecom.cz
Dial Telecom, a.s.
Jednoduše se připojte
---



 Forwarded Message 
Message-ID: <54c2b2f1.2080...@dialtelecom.cz>
Date: Fri, 23 Jan 2015 21:45:37 +0100
From: Daniel Ryšlink
To: bind-users@lists.isc.org
Subject: Possible memory leak on BIND 9.10.1-P1 running on FreeBSD 10.1-RELEASE-p4



Hello,

Detailed information about the resolver can be found in the tgz archive at:

http://www.mujweb.cz/nakamura/dns/leakinfo.tgz

leak.png - munin graph of memory allocation over the last few days
named.conf - BIND config in canonical form (output of named-checkconf -p)
dmidecode.txt - information about the server hardware
named.stats - log of "rndc stats" dumps created by munin-node every five minutes

Basically, the named process slowly allocates more and more memory until
it runs out of swap and crashes. The interesting thing is that inactive
memory is not recycled and reused: at the moment named crashes there is
still a lot of Inactive memory.
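
One way to put a number on the growth described above is to fit a straight
line to periodic samples of the process size. The sketch below is only an
illustration: the function name growth_rate_kb_per_hour and the sample
values are hypothetical and are not taken from the munin data in the archive.

```python
def growth_rate_kb_per_hour(samples):
    """Least-squares slope of (hours, rss_kb) samples, in KB per hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    num = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must cover more than one point in time")
    return num / den


# Hypothetical samples: (hours since start, resident size in KB).
samples = [(0, 1_000_000), (1, 1_050_000), (2, 1_100_000), (3, 1_150_000)]
rate = growth_rate_kb_per_hour(samples)
print(f"named grows by about {rate:.0f} KB per hour")
```

A steady positive slope under a flat query rate would support the leak
hypothesis; a curve that levels off would point to a cache still filling.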

There are no significant peaks in network traffic or query rates.

The problems appeared after upgrading to FreeBSD 10.1 and BIND 9.10.
Before that, the same server ran without problems for several years on
BIND 9.9.x and FreeBSD 8.x; everything was quite stable.

The server operates behind an OpenBSD pf firewall that restricts access
to TCP/UDP port 53 to only defined IP ranges of our clients.

Things that I tried:
- installing the latest openssl from ports to avoid the problem in the
advisory from 14.01.2015
- removing all unnecessary compile options (like IDN, rate limiting) and
recompiling BIND from ports
- tweaking the max-cache-size, tcp-clients and recursive-clients options

Any insights into the problem are highly appreciated, since I am at my
wit's end.

Thank you in advance.




___
Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe 
from this list

bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users

BIND response time is relatively high

2015-01-26 Thread alaa m zidan
Hi,

I noticed that at peak hours, BIND response time is relatively high for some
servers: a non-cached query takes over 700 ms.
I set some kernel parameters to tune the network and sockets for Red Hat 6 and
set some global options to tune BIND by modifying the cache settings, but the
cache never fills up to the limit I set, and performance has not improved.

Some settings are as below.

kernel sysctl:

net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1
net.core.rmem_max = 33554432
net.core.wmem_max = 33554432
net.ipv4.tcp_max_syn_backlog = 16384
net.ipv4.tcp_max_orphans = 40
net.core.somaxconn = 4096
net.ipv4.tcp_rmem = 4096 87380 33554432
net.ipv4.tcp_wmem = 4096 65536 33554432

bind named.conf global options:

cleaning-interval 1440;
max-cache-ttl 2419200;
max-ncache-ttl 86400;
max-cache-size 5120m;

server specs:

memory is 8GB
Memory usage does not exceed 20% (1.7GB), although the cache is limited to 5GB
as shown in the settings above. I would be happy to see memory utilization
spike to the sky.

Could you please suggest what to check?

Thanks




RE: BIND response time is relatively high

2015-01-26 Thread Darcy Kevin (FCA)
The parameter that is glaringly missing from your list is “recursive-clients”. 
Do you have that set at default value (1000) or have you bumped it up higher? 
Since you say that this happens at “peak hours”, recursive-clients is the prime 
suspect, since it governs how many *simultaneous* recursive requests can be 
handled. Note that you should see some indication in your logs if you’re 
running into a recursive-clients limit. Also, you can see the current number of 
recursive clients, in real time, via the “rndc status” display. Check whether 
you’re at or close to your limit.
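
As a rough illustration of that check, the sketch below pulls the recursive
client counters out of "rndc status" text. It assumes the line has the form
"recursive clients: current/soft limit/hard limit"; the helper name
parse_recursive_clients and the sample text are hypothetical, so compare the
pattern with the output of your own BIND version before relying on it.

```python
import re


def parse_recursive_clients(status_text):
    """Return (current, soft_limit, hard_limit) from 'rndc status' text, or None."""
    match = re.search(r"recursive clients:\s*(\d+)/(\d+)/(\d+)", status_text)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


# Hypothetical excerpt; real output contains many more lines.
sample = "tcp clients: 3/100\nrecursive clients: 950/900/1000\n"
current, soft, hard = parse_recursive_clients(sample)
if current >= soft:
    print(f"{current} recursive clients, soft limit {soft}, hard limit {hard}")
```

To use it against a live server, the text could come from running
"rndc status" through subprocess.run(..., capture_output=True, text=True).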

If that doesn’t pan out, I’d also look at this from the networking standpoint. 
The servers that are experiencing high response times, are they on congested 
links? High-latency links? Were there any anomalies occurring at the 
time of the high response times, e.g. high packet loss? Ultimately, you might 
have to mimic resolution of some of the “slow” queries using a command-line 
tool like “dig” (with recursion turned off, possibly the +trace option is 
useful here) and/or take packet captures to identify the source of the 
slowness. It’s quite possible that your “peak hours” are peak hours for your 
Internet providers as well, so the network characteristics of your connections 
may be less than acceptable at those times.
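
If dig is not at hand, the same kind of measurement can be approximated with
a few lines of Python. This is a minimal sketch using only the standard
library: build_query and time_query are hypothetical helper names, the query
is sent with recursion turned off by default, and the reply is not validated
beyond being received.

```python
import socket
import struct
import time


def build_query(name, qtype=1, recursion_desired=False, query_id=0x1234):
    """Build a minimal DNS query packet (class IN) for `name`."""
    flags = 0x0100 if recursion_desired else 0x0000  # only the RD bit is used
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
        if label
    )
    question = labels + b"\x00" + struct.pack("!HH", qtype, 1)
    return header + question


def time_query(server, name, timeout=2.0):
    """Send one UDP query to `server` and return the round-trip time in ms."""
    packet = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.perf_counter()
        sock.sendto(packet, (server, 53))
        sock.recvfrom(4096)  # raises socket.timeout if no reply arrives
        return (time.perf_counter() - start) * 1000.0
```

Calling time_query() against each authoritative server in a delegation chain
during peak hours, and again off-peak, shows where the time is being spent.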


- Kevin


Re: BIND response time is relatively high

2015-01-26 Thread Niall O'Reilly
At Mon, 26 Jan 2015 21:50:37 +,
Darcy Kevin (FCA) wrote:
> 
> 
> The parameter that is glaringly missing from your list is
> “recursive-clients”. Do you have that set at default value (1000) or
> have you bumped it up higher? Since you say that this happens at “peak
> hours”, recursive-clients is the prime suspect,

  Besides what Kevin suggests, it may be worth checking for swapping
  and/or IO wait using 'top'.

  /Niall
  


is this "normal" if not what to do about it?

2015-01-26 Thread John

My experimental zone (the family site) klam.ca has a KSK and a ZSK.
There appear to be time differences between the records reported by dig 
and the source records on file.
In the case of the ZSK the inactive date-time is a few hours different, 
but in the KSK's case it is 3 months.


Is this a problem, and if so, what do I do about it?

The following dates were found in the source record and the output from dig.

        source record   dig
KSK -A  20150126163534  20150117012252
KSK -I  20150225173534  20150225144654
ZSK -A  20150126163534  20150126173534
ZSK -I  20150527015028  20150225173534



--
John Allen
KLaM
--
If you are out to describe the truth, leave elegance to the tailor.



Re: is this "normal" if not what to do about it?

2015-01-26 Thread John

oops!! I swapped the ZSK and KSK in the table.



Re: is this "normal" if not what to do about it?

2015-01-26 Thread Mark Andrews

Activation/inactivation is when named starts/stops signing with the key.
Inception/expiration is the time period for which the signature is valid.

These are different time periods and are basically unrelated.

The inception will be < inactivation.
The expiration will be > activation.

After a key is activated it will be used to sign records that change.
The sig-validity-interval (less a small random factor) is used to
determine how long a signature is valid.
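
The two relations above can be checked mechanically. The sketch below does so
for the row labelled "ZSK" in the table posted earlier in the thread, on the
assumption that the "dig" column holds the signature inception and expiration
times; the helper names parse_ts and consistent are hypothetical.

```python
from datetime import datetime


def parse_ts(stamp):
    """Parse a YYYYMMDDHHMMSS timestamp as used in key files and signatures."""
    return datetime.strptime(stamp, "%Y%m%d%H%M%S")


def consistent(activation, inactivation, inception, expiration):
    """True if inception < inactivation and expiration > activation."""
    return (parse_ts(inception) < parse_ts(inactivation)
            and parse_ts(expiration) > parse_ts(activation))


# Key times from the "source record" column, signature times from "dig".
ok = consistent("20150126163534", "20150527015028",
                "20150126173534", "20150225173534")
validity = parse_ts("20150225173534") - parse_ts("20150126173534")
print(ok, validity.days)
```

The signature in that row is valid for a whole number of days, which is what
one would expect from a sig-validity-interval setting rather than from the
key's own activation and inactivation dates.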


Mark

-- 
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742 INTERNET: ma...@isc.org

Re: Possible memory leak on BIND 9.10.1-P1 running on FreeBSD 10.1-RELEASE-p4 - part 2

2015-01-26 Thread Mukund Sivaraman
Hi Daniel

On Mon, Jan 26, 2015 at 02:56:44PM +0100, Daniel Ryšlink wrote:
> Downgraded to BIND 9.9.6, the leak is gone, using the same named.conf,
> same HW, same environment.
> 
> It is highly likely there is really a memory leak problem in Bind
> 9.10.

Because many of these reports are on FreeBSD 10 in a resolver
configuration, we are checking to see if we are able to reproduce it
there.

Meanwhile, please can you enable statistics-channels in named.conf and
send us a dump of the XML statistics along with process sizes reported
by ps when named grows very large?
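
A minimal sketch of how such a snapshot might be collected, assuming a
statistics channel has been configured to listen on 127.0.0.1 port 8053 (an
address and port chosen here purely for illustration). The names STATS_URL,
parse_ps_sizes and snapshot are hypothetical.

```python
import subprocess
import urllib.request

STATS_URL = "http://127.0.0.1:8053/"  # assumed statistics-channels address


def parse_ps_sizes(ps_output):
    """Parse 'ps -o rss= -o vsz=' output into (rss_kb, vsz_kb)."""
    rss, vsz = ps_output.split()[:2]
    return int(rss), int(vsz)


def snapshot(pid, url=STATS_URL):
    """Return (rss_kb, vsz_kb, xml_bytes) for a running named process."""
    ps_output = subprocess.run(
        ["ps", "-o", "rss=", "-o", "vsz=", "-p", str(pid)],
        capture_output=True, text=True, check=True,
    ).stdout
    with urllib.request.urlopen(url, timeout=10) as response:
        xml_bytes = response.read()
    return (*parse_ps_sizes(ps_output), xml_bytes)
```

Running snapshot() from cron and saving each result with a timestamp gives a
series of process sizes paired with the XML statistics taken at that moment.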

Mukund



Question about bind-dlz performance limit

2015-01-26 Thread WXR
I'm using bind-dlz (BIND version 9.10) with MySQL to store zone data.
According to the official DLZ documentation, I used the compile argument
"-enable-threads=no".

I use dnstop and netstat to monitor performance, and I have found a
performance bottleneck in bind-dlz.

Once the QPS increases to 500-600, the response time of each DNS query becomes
very large, up to 600-1000 msec, and the Recv-Q of the UDP socket reaches
124928 (the default /proc/sys/net/core/rmem_default value).

# dnstop eth1 -R -Q
Queries: 519 new, 9326898 total
Replies: 508 new, 6612496 total

# watch -n 1 netstat -lnp
Proto Recv-Q Send-Q Local Address      Foreign Address  State  PID/Program name
udp   124813      0 121.199.47.200:53  0.0.0.0:*               31867/named

I think 500-600 QPS is too low. Is this the normal performance of bind-dlz?
How can I optimize it?
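
To make the Recv-Q observation concrete, the sketch below expresses the queue
length from the netstat line as a fraction of the receive buffer size. It
assumes the usual Linux netstat column order (Proto, Recv-Q, Send-Q, ...);
the helper name recv_queue_fill is hypothetical.

```python
def recv_queue_fill(netstat_line, buffer_size):
    """Return the Recv-Q value of a netstat socket line as a fraction of buffer_size."""
    fields = netstat_line.split()
    recv_q = int(fields[1])  # columns: Proto Recv-Q Send-Q Local Foreign ...
    return recv_q / buffer_size


# Values from the netstat output and rmem_default figure quoted above.
line = "udp 124813 0 121.199.47.200:53 0.0.0.0:* 31867/named"
fill = recv_queue_fill(line, 124928)
print(f"receive queue is {fill:.1%} full")
```

A queue that sits this close to the buffer size suggests named is not reading
queries as fast as they arrive, so further packets are likely to be dropped.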

Re: Question about bind-dlz performance limit

2015-01-26 Thread Evan Hunt
On Tue, Jan 27, 2015 at 02:50:33PM +0800, WXR wrote:
> I'm using bind-dlz(bind version 9.10) with mysql to store zone data.
> According to the dlz official documents I use the compile
> arguments " -enable-threads=no".

If you're on 9.10, the documentation you're using is somewhat out of date.

Rebuild with threads; do *not* use "--with-dlz-mysql".  Instead, try
the dynamically-linkable MySQL DLZ module in contrib/dlz/modules/mysql;
it has thread support.

There's no documentation to speak of, but there's a "testing" directory
with a sample configuration you can work from.  I would expect to see
better performance, though still not very good.  (DLZ at its best is
still quite slow.)

-- 
Evan Hunt -- e...@isc.org
Internet Systems Consortium, Inc.