Hello All,

We use dnsmasq as a very effective replacement for the ISC software, thus far with great success. However, we have run into occasional performance problems at scale.
These seem to manifest as a general slowdown of the request -> reply process, which can sometimes exceed 90 seconds from request to reply. The slowdown only causes problems when there is a penalty for missing a request, e.g. when we boot an entire rack of servers. In the default configuration, many servers will not retry the boot sequence, forcing us to detect hung machines and issue a remote reboot. We often have to do this several times in the frequent case of building more than a few hundred servers in a single batch.

Here are the symptoms:

1. We never have any issue, even with mass installation, as long as the configuration contains fewer than a few thousand static entries and a few hundred subnets.

2. At some point, DHCP will slow down. This sequence was taken with 7383 static host entries defined:

---------------------------------------------------------------------------
TIME: 19:43:47.480189
IP: > (00:01:e8:92:fd:41) > (00:21:9b:a2:e4:47)
OP: 1 (BOOTPREQUEST)
HTYPE: 1 (Ethernet)
HLEN: 6
HOPS: 1
XID: 2472c541
<SNIP>
---------------------------------------------------------------------------
TIME: 19:45:35.963209
IP: > (00:21:9b:a2:e4:47) > (00:00:5e:00:01:8d)
OP: 2 (BOOTPREPLY)
HTYPE: 1 (Ethernet)
HLEN: 6
HOPS: 1
XID: 2472c541
<SNIP>

3. If I reduce the number of static entries to, for example, 2408, response time returns to sub-second.
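
For anyone who wants to measure the same delay on their own capture, here is a rough sketch in Python that pairs each request with its reply by XID and reports the elapsed time. It assumes records in the format shown above (separator lines of dashes, one "KEY: value" field per line); the function names and the SAMPLE constant are made up for illustration and are not part of dnsmasq or any capture tool.

```python
from datetime import datetime, timedelta

# Trimmed copy of the two records quoted above (the <SNIP> parts are omitted).
SAMPLE = """\
---------------------------------------------------------------------------
TIME: 19:43:47.480189
IP: > (00:01:e8:92:fd:41) > (00:21:9b:a2:e4:47)
OP: 1 (BOOTPREQUEST)
XID: 2472c541
---------------------------------------------------------------------------
TIME: 19:45:35.963209
IP: > (00:21:9b:a2:e4:47) > (00:00:5e:00:01:8d)
OP: 2 (BOOTPREPLY)
XID: 2472c541
"""

WANTED = ("TIME", "OP", "XID")


def parse_records(text):
    """Yield (time, op, xid) strings for each complete record in the text."""
    rec = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("---"):
            # A separator line closes the previous record, if there was one.
            if all(k in rec for k in WANTED):
                yield rec["TIME"], rec["OP"], rec["XID"]
            rec = {}
            continue
        key, sep, value = line.partition(":")
        if sep and key in WANTED:
            rec[key] = value.strip()
    # The last record has no trailing separator.
    if all(k in rec for k in WANTED):
        yield rec["TIME"], rec["OP"], rec["XID"]


def reply_latencies(text):
    """Map each XID to seconds between its first request and first reply."""
    first_request = {}
    latencies = {}
    for time_str, op, xid in parse_records(text):
        when = datetime.strptime(time_str, "%H:%M:%S.%f")
        opcode = op.split()[0]
        if opcode == "1":  # BOOTPREQUEST
            first_request.setdefault(xid, when)
        elif opcode == "2" and xid in first_request and xid not in latencies:
            delta = when - first_request[xid]
            if delta < timedelta(0):
                # The capture has no date, so assume the reply crossed midnight.
                delta += timedelta(days=1)
            latencies[xid] = delta.total_seconds()
    return latencies


if __name__ == "__main__":
    for xid, seconds in reply_latencies(SAMPLE).items():
        print(f"{xid}: {seconds:.3f} s")
```

On the two records above this gives roughly 108 seconds for XID 2472c541. Only the first request per XID is kept, so client retransmissions do not shorten the measured delay.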

General configuration notes:

This request is handled by a relay, and we have the following config options in play:

no-ping
no-hosts
no-resolv
cache-size=0
dhcp-lease-max=20000
dhcp-authoritative
conf-dir=/etc/dnsmasq.d
domain=sekret.zynga.com
port=0

Fast:

# ls -la /etc/dnsmasq.conf
-rw-r--r-- 1 root root 408222 Jun 14 20:20 /etc/dnsmasq.conf
# grep dhcp-range /etc/dnsmasq.conf | wc
80 80 6340
# grep dhcp-host /etc/dnsmasq.conf | wc
2408 2410 209798

Slow:

# ls -la /etc/dnsmasq.conf
-rw-r--r-- 1 root root 1205132 Jun 14 20:31 /etc/dnsmasq.conf
# grep dhcp-range /etc/dnsmasq.conf | wc
80 80 6340
# grep dhcp-host /etc/dnsmasq.conf | wc
7383 7387 650118

Are there any obvious inflection points that would cause the server to drop in performance by a few orders of magnitude with lots of hosts defined? Are there any recommendations for tuning, beyond introducing more dnsmasq servers?

-Mike