Hello All,

We use dnsmasq as a replacement for the ISC DHCP server, thus far with great
success. However, we have run into occasional performance problems at scale.

These seem to manifest as a general slowdown of the request -> reply process: 
a reply can sometimes arrive more than 90 seconds after the request.

The slowdown only matters when there is a penalty for a missed request, e.g. 
when we boot an entire rack of servers. In their default configuration, many 
servers do not retry the boot sequence, forcing us to detect hung machines and 
issue a remote reboot. We often have to do this several times when building 
more than a few hundred servers in a single batch, which is a frequent case.

Here are the symptoms:

1. We never have any issues, even with mass installation, as long as the 
configuration contains fewer than a few thousand static entries and a few 
hundred subnets.

2. At some point, DHCP slows down. This sequence was captured with 7383 static 
host entries defined:

---------------------------------------------------------------------------
  TIME: 19:43:47.480189
    IP: > (00:01:e8:92:fd:41) >  (00:21:9b:a2:e4:47)
    OP: 1 (BOOTPREQUEST)
 HTYPE: 1 (Ethernet)
  HLEN: 6
  HOPS: 1
   XID: 2472c541

<SNIP>

---------------------------------------------------------------------------
  TIME: 19:45:35.963209
    IP: > (00:21:9b:a2:e4:47) >  (00:00:5e:00:01:8d)
    OP: 2 (BOOTPREPLY)
 HTYPE: 1 (Ethernet)
  HLEN: 6
  HOPS: 1
   XID: 2472c541

<SNIP>

3. If I reduce the number of static entries to, for example, 2408, response 
time returns to sub-second.
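
For reference, the delay in the capture under symptom 2 can be worked out from 
the two TIME fields. This is only a small sketch: the function name is made up 
for illustration, and it assumes both timestamps fall on the same day.

```python
from datetime import datetime

def elapsed_seconds(request_time, reply_time):
    """Seconds between two HH:MM:SS.ffffff capture timestamps (same day assumed)."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(reply_time, fmt) - datetime.strptime(request_time, fmt)
    return delta.total_seconds()

# TIME fields of the BOOTPREQUEST and BOOTPREPLY for XID 2472c541
print(elapsed_seconds("19:43:47.480189", "19:45:35.963209"))  # prints 108.48302
```

That is roughly 108 seconds between request and reply for this exchange.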

General configuration notes:

The request above came through a relay, and we have the following config 
options in play:

no-ping
no-hosts
no-resolv
cache-size=0
dhcp-lease-max=20000
dhcp-authoritative
conf-dir=/etc/dnsmasq.d
domain=sekret.zynga.com
port=0

Fast:

# ls -la /etc/dnsmasq.conf
-rw-r--r-- 1 root root 408222 Jun 14 20:20 /etc/dnsmasq.conf
# grep dhcp-range /etc/dnsmasq.conf | wc
     80      80    6340
# grep dhcp-host /etc/dnsmasq.conf | wc
   2408    2410  209798

Slow:

# ls -la /etc/dnsmasq.conf
-rw-r--r-- 1 root root 1205132 Jun 14 20:31 /etc/dnsmasq.conf
# grep dhcp-range /etc/dnsmasq.conf | wc
     80      80    6340
# grep dhcp-host /etc/dnsmasq.conf | wc
   7383    7387  650118

Are there any obvious inflection points that would cause the server to drop in 
performance by a few orders of magnitude with lots of hosts defined? Are there 
any recommendations for tuning, beyond introducing more dnsmasq servers?
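
One way to narrow down where the slowdown starts would be to generate 
synthetic configs with an increasing number of dhcp-host lines and time a 
request against each. Below is a rough sketch of such a generator. The function 
name, the 10.0.0.10 starting address and the 02:... MAC prefix are placeholders 
chosen for illustration, not our real data, and the addresses would need to be 
adjusted to match the dhcp-range lines actually in use.

```python
import ipaddress

def synthetic_dhcp_hosts(count, base_ip="10.0.0.10"):
    """Return `count` dhcp-host lines with unique, made-up MACs and IPs."""
    start = ipaddress.IPv4Address(base_ip)
    lines = []
    for i in range(count):
        # 02:xx:... is a locally administered MAC prefix (placeholder values only)
        raw = f"{i:010x}"
        mac = "02:" + ":".join(raw[j:j + 2] for j in range(0, 10, 2))
        # dhcp-host=<MAC>,<IP> is the basic static-entry form
        lines.append(f"dhcp-host={mac},{start + i}")
    return lines

# Show the first entries of a set the same size as the slow case
for line in synthetic_dhcp_hosts(7383)[:2]:
    print(line)
```

Writing the output for several sizes between 2408 and 7383 into separate 
config files and repeating the same request against each should show whether 
the response time degrades gradually or falls off at a specific entry count.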

-Mike
