> -----Original Message-----
> From: bind-users [mailto:bind-users-boun...@lists.isc.org] On Behalf Of
> Alex
> Sent: Thursday, 27 September 2018 2:52 AM
> To: bind-users@lists.isc.org
> Subject: BIND and UDP tuning
> 
> Hi,
> 
> I reported a few weeks ago that I was experiencing a really high
> number of "SERVFAIL" messages in my bind-9.11.4-P1 system running on
> fedora28, and I haven't yet found a solution. This is all now running
> on a 165/35 cable system.
> 
> I found a program named dropwatch which is showing a significant
> number of dropped UDP packets, particularly when there are bursts of
> email traffic:
> 
> 12 drops at skb_queue_purge+13 (0xffffffff9f79a0c3)
> 1 drops at __udp4_lib_rcv+1e6 (0xffffffff9f83bdf6)
> 4 drops at __udp4_lib_rcv+1e6 (0xffffffff9f83bdf6)
> 5 drops at nf_hook_slow+a7 (0xffffffff9f7faff7)
> 3 drops at sk_stream_kill_queues+48 (0xffffffff9f7a1158)
> 3 drops at __udp4_lib_rcv+1e6 (0xffffffff9f83bdf6)
> ...
> 
> # netstat -us
> ...
> Udp:
>     23449482 packets received
>     1724269 packets to unknown port received
>     8248 packet receive errors
>     31394909 packets sent
>     8243 receive buffer errors
>     0 send buffer errors
>     InCsumErrors: 5
>     IgnoredMulti: 43247
> 
> The SERVFAIL messages don't necessarily correspond to the UDP packet
> errors shown by netstat, but the dropwatch output is continuous. The
> netstat packet receive errors also don't seem to correspond to
> "SERVFAIL" or "Name service" errors:
> 
> 26-Sep-2018 12:42:49.743 query-errors: info: client @0x7fb3c41634d0
> 127.0.0.1#44104 (46.36.47.104.wl.mailspike.net): query failed
> (SERVFAIL) for 46.36.47.104.wl.mailspike.net/IN/A at
> ../../../bin/named/query.c:8580
> 
> Sep 26 12:47:11 mail03 postfix/dnsblog[22821]: warning: dnsblog_query:
> lookup error for DNS query 196.91.107.80.bl.spameatingmonkey.net: Host
> or domain name not found. Name service error for
> name=196.91.107.80.bl.spameatingmonkey.net type=A: Host not found, try
> again
> 
> I've been following this thread from some time ago, but nothing I've
> done has made a difference. I really don't know what the buffer sizes
> should be.
> http://bind-users-forum.2342410.n4.nabble.com/Tuning-suggestions-for-high-core-count-Linux-servers-td3899.html
> 
> Are there specific bind tunables you might recommend? edns-udp-size,
> perhaps?
> 
> Any ideas on other tunables such as net.core.*mem_default etc?

*chuckles to self*

I was just referring back to that thread myself to try to remember what I did.

I ended up tuning the following items:

  - name: SYSCTL system tuning, basics
    sysctl:
      name: "{{ item.name }}"
      value: "{{ item.value }}"
      sysctl_set: yes
      state: present
    with_items:
      - { name: 'vm.swappiness', value: 0 }
      - { name: 'net.core.netdev_max_backlog', value: 32768 }
      - { name: 'net.core.netdev_budget', value: 2700 }
      - { name: 'net.ipv4.tcp_sack', value: 0 }
      - { name: 'net.core.somaxconn', value: 2048 }
      - { name: 'net.core.rmem_default', value: 16777216 }
      - { name: 'net.core.rmem_max', value: 16777216 }
      - { name: 'net.core.wmem_default', value: 16777216 }
      - { name: 'net.core.wmem_max', value: 16777216 }

(Yeah, I was using ansible for that testing!)
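
To confirm the values actually took effect after the play runs, something like the following could read them back from /proc/sys. This is only a rough sketch: the helper names and the `expected` dict are made up for illustration, and it only lists a few of the settings above.

```python
from pathlib import Path


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name to its file under /proc/sys.

    Assumes no component of the name contains a dot itself, which
    holds for the net.core.* and vm.* settings used here.
    """
    return f"{root}/{name.replace('.', '/')}"


def read_sysctl(name, root="/proc/sys"):
    """Return the current value of a sysctl as a stripped string."""
    return Path(sysctl_path(name, root)).read_text().strip()


def mismatches(expected, root="/proc/sys"):
    """Return {name: (current, wanted)} for settings that differ.

    A setting that cannot be read is reported with current=None.
    """
    bad = {}
    for name, want in expected.items():
        try:
            got = read_sysctl(name, root)
        except OSError:
            got = None
        if got != str(want):
            bad[name] = (got, str(want))
    return bad


if __name__ == "__main__":
    # Hypothetical subset of the values from the task above.
    expected = {
        "net.core.netdev_max_backlog": 32768,
        "net.core.rmem_default": 16777216,
        "net.core.rmem_max": 16777216,
    }
    for name, (got, want) in mismatches(expected).items():
        print(f"{name}: current={got} wanted={want}")
```

Keep in mind that rmem_default only applies to sockets that do not set SO_RCVBUF themselves, while rmem_max caps what an application may request.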

Checking /proc/net/softnet_stat is what drove some of those settings, so you 
may want to dig into that. I never did get the receive errors that netstat 
reports to go away though, so keep that in mind.
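
For reading softnet_stat, a small parser along these lines may help. It is a sketch only: the function name is made up, and it looks at just the first three hex columns (packets processed, packets dropped because the backlog queue was full, and time_squeeze, i.e. the softirq ran out of budget with work still pending). Later columns differ between kernel versions, so they are ignored here.

```python
def parse_softnet_stat(text):
    """Parse the contents of /proc/net/softnet_stat.

    Each non-empty line is one CPU's counters, written in hexadecimal.
    Only the first three columns are decoded.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        processed, dropped, squeezed = (int(f, 16) for f in fields[:3])
        rows.append({
            "row": len(rows),          # position in the file, one per CPU
            "processed": processed,
            "dropped": dropped,        # relates to netdev_max_backlog
            "time_squeeze": squeezed,  # relates to netdev_budget
        })
    return rows


if __name__ == "__main__":
    with open("/proc/net/softnet_stat") as fh:
        for r in parse_softnet_stat(fh.read()):
            print(r)
```

The counters are cumulative since boot, so it is the change between two samples taken during a traffic burst that matters. A growing dropped column points at net.core.netdev_max_backlog, a growing time_squeeze column at net.core.netdev_budget.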

If you are running high query throughput and have many CPU cores, pinning 
cores was a significant performance improvement.

You haven't said what sort of query throughput you are seeing, however. Be 
aware that if this is running in a virtualized environment, you may want to 
look at the host machine rather than the guest, as network performance there 
can have a significant impact.

Whilst only mentioned in passing on that thread, I also poked around with the 
TOE, pause, adaptive coalesce and ring size settings (look at ethtool -K, 
ethtool -A, ethtool -C and ethtool -G), but sadly I have lost the specific 
commands.
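
These are not the lost tuning commands, but as a starting point the lowercase forms of those same ethtool options only display the current settings and change nothing. A minimal sketch that builds them for an interface (the function name and "eth0" are placeholders, substitute your own NIC):

```python
import shlex


def ethtool_show_commands(iface):
    """Build read-only ethtool queries for one interface.

    -k shows offload features, -a pause parameters,
    -c interrupt coalescing, -g ring buffer sizes.
    """
    return [["ethtool", flag, iface] for flag in ("-k", "-a", "-c", "-g")]


if __name__ == "__main__":
    # "eth0" is an assumed interface name.
    for cmd in ethtool_show_commands("eth0"):
        print(shlex.join(cmd))
```

Comparing that output before and after any change made with the uppercase options gives a record of what was actually altered.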

Stuart Browne
Neustar, Inc. / Sr Systems Admin
Level 8, 10 Queens Road, Melbourne, Australia VIC 3004
Office: +61.3.9866.3710
stuart.browne@team.neustar / home.neustar
