Re: Tuning suggestions for high-core-count Linux servers

2017-06-01 Thread Plhu

  Hello Stuart,
a few simple ideas for your tests:
 - have you inspected per-thread CPU usage? Are some of the threads overloaded?
 - have you tried to get statistics from the BIND server using the
 XML or JSON interface? It may give you further insight into the errors
 (see the sketch after this list).
 - I may have missed the connection count you use for testing - can you
 post it? Also, how many entries do you have in your database? Can you
 share your named.conf (without any compromising entries)?
 - what is your network environment? How many switches/routers are there
 between your simulator and the BIND server host?
 - is BIND the only running process on the tested server?
 - what CPUs is the BIND server being run on?
 - is numad running, and when trying taskset, did you select CPUs on the
 same physical processor? What does numastat show during the test?
 - how many UDP sockets are in use during your test?
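
A minimal sketch of pulling the JSON statistics (Python just for
illustration; it assumes BIND 9.10+ with a statistics channel enabled in
named.conf, e.g. statistics-channels { inet 127.0.0.1 port 8080
allow { 127.0.0.1; }; }; - the address and port are my assumptions):

    # Poll the BIND statistics channel and print the server-level
    # name-service counters, which include the error counts.
    import json
    import urllib.request

    URL = "http://127.0.0.1:8080/json/v1/server"  # assumed stats channel

    with urllib.request.urlopen(URL) as resp:
        stats = json.load(resp)

    # 'nsstats' holds per-server counters (queries answered, failures, ...)
    for name, value in sorted(stats.get("nsstats", {}).items()):
        print(f"{name}: {value}")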

Curious to see your responses.

  Lukas

Browne, Stuart  writes:

> Cheers Matthew.
>
> 1)  Not seeing that error, seeing this one instead:
>
> 01-Jun-2017 01:46:27.952 client: warning: client 192.168.0.23#38125 
> (x41fe848-f3d1-4eec-967e-039d075ee864.perf1000): error sending response: 
> would block
>
> Only seeing a few of them per run (out of ~70 million requests).
>
> Whilst I can see where this is raised in the BIND code (lib/isc/unix/socket.c 
> in doio_send), I don't understand the underlying reason for it being set 
> (errno == EWOULDBLOCK || errno == EAGAIN).
>
> I've not bumped wmem/rmem up as much as the link suggests (only to 16MB, not
> 40MB), but saw no real difference after the tweaks. I did another run with
> stupidly-large core.{rmem,wmem}_{max,default} (64MB); this actually degraded
> performance a bit, so over-tuning isn't good either. Need to figure out a
> good balance here.
>
> I'd love to figure out what the math here should be.  'X number of 
> simultaneous connections multiplied by Y socket memory size = rmem' or some 
> such.
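>
> A first back-of-envelope attempt, with every number below a guess rather
> than a measurement (the drain gap in particular should come from the
> observed Recv-Q):
>
>     # Rough sizing sketch for net.core.rmem_max; all inputs are guesses.
>     qps = 250_000          # peak queries/sec hitting one socket
>     pkt_cost = 100 + 768   # avg DNS payload + approx. per-packet overhead
>     drain_ms = 20          # worst-case gap before named drains the socket
>
>     rmem = int(qps * pkt_cost * drain_ms / 1000)
>     print(f"suggested rmem ~ {rmem} bytes ({rmem / 2**20:.1f} MiB)")
>     # ~4.1 MiB with these inputs; raise drain_ms if Recv-Q keeps growing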
>
> 2) I am still seeing some UDP receive errors and receive buffer errors; about
> 1.3% of received packets.
>
> From a 'netstat' point of view, I see:
>
> Active Internet connections (servers and established)
> Proto Recv-Q Send-Q Local Address   Foreign Address State
> udp   382976  17664 192.168.1.21:53 0.0.0.0:*
>
> The numbers in the receive queue stay in the 200-300k range whilst the
> send-queue floats around the 20-40k range; wmem has already been bumped.
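>
> For watching the error counters themselves, a small sketch reading
> /proc/net/snmp (the same counters 'netstat -su' reports; the *bufErrors
> columns need kernel >= 2.6.27):
>
>     # Compute the UDP receive error ratio from /proc/net/snmp.
>     with open("/proc/net/snmp") as f:
>         rows = [line.split() for line in f if line.startswith("Udp:")]
>
>     udp = dict(zip(rows[0][1:], map(int, rows[1][1:])))  # header -> values
>     total = max(udp["InDatagrams"] + udp["InErrors"], 1)
>     print(f"InErrors={udp['InErrors']} RcvbufErrors={udp['RcvbufErrors']}")
>     print(f"error ratio: {udp['InErrors'] / total:.2%}")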
>
> 3) Huh, didn't know about this one. Bumped up the backlog; a small increase
> in throughput for my tests. Still need to figure out how to read
> softnet_stat (a decoding sketch below). More google-fu in my future.
>
> After a reboot and the wmem/rmem/backlog increases, there are no longer any
> non-zero values in the 2nd column.
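>
> My current reading of the format, based on the kernel's softnet_seq_show()
> (so treat the column meanings as an assumption):
>
>     # Decode /proc/net/softnet_stat: all fields hex, one row per CPU.
>     # col 1 = packets processed, col 2 = dropped (backlog full),
>     # col 3 = time_squeeze (softirq budget ran out before queue drained).
>     with open("/proc/net/softnet_stat") as f:
>         for cpu, line in enumerate(f):
>             fields = [int(x, 16) for x in line.split()]
>             print(f"cpu{cpu}: processed={fields[0]} "
>                   f"dropped={fields[1]} squeezed={fields[2]}")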
>
> 4) Yes, max_dgram_qlen is already set to 512.
>
> 5) Oo! new tool! :)
>
> --
> ...
> 11 drops at location 0x815df171
> 854 drops at location 0x815e1c64
> 12 drops at location 0x815df171
> 822 drops at location 0x815e1c64
> ...
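>
> To turn those raw addresses into kernel symbols, newer dropwatch builds
> can do it themselves ('dropwatch -l kas', if yours supports it); otherwise
> a manual lookup against /proc/kallsyms along these lines should work
> (needs root, and the printed addresses must match kallsyms' width):
>
>     # Map an address to the nearest kernel symbol via /proc/kallsyms.
>     import bisect
>
>     syms = sorted(
>         (int(a, 16), name)
>         for a, _t, name, *_ in (l.split() for l in open("/proc/kallsyms"))
>     )
>     addrs = [a for a, _ in syms]
>
>     def resolve(addr):
>         i = bisect.bisect_right(addrs, addr) - 1
>         if i < 0:
>             return "below first symbol (address may be truncated)"
>         base, name = syms[i]
>         return f"{name}+0x{addr - base:x}"
>
>     print(resolve(0x815e1c64))  # address from the dropwatch output above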


Re: DNS performance Help when query log is off -- which default parameters will impact the DNS performance

2018-02-22 Thread Plhu

  You may also be interested in BIND statistics for analysing the BIND
server's behaviour under load. This example shows which metrics are
available in general:

https://github.com/lukas999/bind2pmda

If you want to use the PMDA itself, skip the GitHub source, as the PMDA
is now part of Performance Co-Pilot (pcp.io).

Overall, it would be good if you did some basic checks:
 - per-thread utilization of the CPU (see the sketch after this list)
 - how do the errors start to manifest when you exceed the maximum
 throughput you achieved (timeouts? errors?)
 - what about the socket count? Is it close to exhaustion?
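
For the per-thread check, something along these lines should do (a sketch
only: it assumes a single named process, and reads utime/stime - fields 14
and 15 of /proc/<pid>/task/<tid>/stat, in clock ticks; pidstat -t gives
the same data ready-made):

    # Snapshot per-thread CPU ticks for named; run twice and diff.
    import glob
    import subprocess

    pid = subprocess.check_output(["pidof", "named"]).split()[0].decode()

    for path in sorted(glob.glob(f"/proc/{pid}/task/*/stat")):
        rest = open(path).read().rsplit(")", 1)[1].split()
        utime, stime = int(rest[11]), int(rest[12])  # fields 14 and 15
        print(f"tid {path.split('/')[4]}: utime={utime} stime={stime}")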

Then, the conditions of your test may be of interest:
 - what kind of queries do you use for your test (A, TXT, AAAA, UPDATE,
 ...)?
 - what load generator do you use, and how does it behave (anywhere in the
 range from dig in a loop to asynchronous handling - see the sketch below)?
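
The difference matters: a synchronous generator measures round-trip
latency, not server throughput. The crudest end of the range looks like
this (a sketch using the dnspython package; server address and query name
are made up):

    # "dig in a loop" equivalent: one outstanding query at a time.
    import time
    import dns.message
    import dns.query

    server, qname, n = "192.0.2.1", "example.com.", 10_000

    query = dns.message.make_query(qname, "A")
    start = time.monotonic()
    for _ in range(n):
        dns.query.udp(query, server, timeout=1)
    print(f"{n / (time.monotonic() - start):.0f} qps (synchronous ceiling)")

For real throughput numbers you want many queries in flight at once, e.g.
dnsperf or an asynchronous client.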

  Best regards

  Lukas

Paul Kosinski  writes:

> Google search for "man named" turns up:
>
>   https://ftp.isc.org/isc/bind9/cur/9.9/doc/arm/man.named.html
>
> which says (among more details):
>
>   named [-4] [-6] [-c config-file] [-d debug-level] [-E engine-name] [-f]
> [-g] [-M option] [-m flag] [-n #cpus] [-p port] [-s] [-S #max-socks]
> [-t directory] [-U #listeners] [-u user] [-v] [-V] [-x cache-file]
>
> For more explanation, look at:
>
>   
> https://kb.isc.org/article/AA-01249/0/UDP-Listeners-choosing-the-right-value-for-U-when-starting-named.html
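>
> In short, -U is a command-line option to named, not a named.conf
> statement, so it goes wherever named is started; for example (the value 6
> is only an illustration - the KB article explains how to choose it):
>
>     named -n 4 -U 6 -c /etc/named.conf
>
> On RHEL-style systems that usually means adding it to OPTIONS= in
> /etc/sysconfig/named.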
>
>
>
> On Thu, 22 Feb 2018 01:50:22 +
> "PENG, JUNAN"  wrote:
>
>> Hi, Paul
>>
>> UDP listeners per interface
>>
>> Do you know how to modify this parameter -- UDP listeners per
>> interface
>>
>> version: BIND 9.10.5-S1  (Unknown)
>> boot time: Tue, 13 Feb 2018 06:12:53 GMT
>> last configured: Tue, 13 Feb 2018 06:12:53 GMT
>> CPUs found: 4
>> worker threads: 4
>> UDP listeners per interface: 3
>> number of zones: 102
>> debug level: 0
>> xfers running: 0
>> xfers deferred: 0
>> soa queries in progress: 0
>> query logging is OFF
>> recursive clients: 0/900/1000
>> tcp clients: 0/15000
>> server is up and running
>>
>> BR
>> Michael