Alex,

Have you uploaded this pcap with the SERVFAILs? I didn't have time to look at your first upload but can review this one.
John

-----Original Message-----
From: bind-users [mailto:bind-users-boun...@lists.isc.org] On Behalf Of Alex
Sent: Thursday, September 06, 2018 1:49 PM
To: c...@byington.org; bind-users@lists.isc.org
Subject: Re: Frequent timeout

Hi,

On Mon, Sep 3, 2018 at 12:45 PM Carl Byington <c...@byington.org> wrote:
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA512
>
> On Sun, 2018-09-02 at 21:54 -0400, Alex wrote:
> > Do you have any other ideas on how I can isolate this problem?
>
> Run tcpdump on the external ethernet connection.
>
> tcpdump -s0 -vv -i %s -nn -w /tmp/outputfile udp dst port domain

I've captured some packets that I believe include the packets relating
to the SERVFAIL errors I've been receiving. Now I have to figure out
how to go through them.

In the meantime, I've configured /etc/resolv.conf to send queries to a
remote system of ours, and the errors have (mostly) stopped.

I also notice some traces take an abnormal amount of time. Ping times
to google.com are less than 20ms, but this trace shows reaching the
root servers takes 104ms:

# dig +trace +nodnssec google.com

; <<>> DiG 9.11.4-P1-RedHat-9.11.4-5.P1.fc28 <<>> +trace +nodnssec google.com
;; global options: +cmd
.                       3451    IN      NS      g.root-servers.net.
.                       3451    IN      NS      k.root-servers.net.
.                       3451    IN      NS      j.root-servers.net.
.                       3451    IN      NS      c.root-servers.net.
.                       3451    IN      NS      i.root-servers.net.
.                       3451    IN      NS      e.root-servers.net.
.                       3451    IN      NS      m.root-servers.net.
.                       3451    IN      NS      l.root-servers.net.
.                       3451    IN      NS      a.root-servers.net.
.                       3451    IN      NS      h.root-servers.net.
.                       3451    IN      NS      b.root-servers.net.
.                       3451    IN      NS      d.root-servers.net.
.                       3451    IN      NS      f.root-servers.net.
;; Received 839 bytes from 127.0.0.1#53(127.0.0.1) in 0 ms

com.                    172800  IN      NS      h.gtld-servers.net.
com.                    172800  IN      NS      g.gtld-servers.net.
com.                    172800  IN      NS      b.gtld-servers.net.
com.                    172800  IN      NS      j.gtld-servers.net.
com.                    172800  IN      NS      f.gtld-servers.net.
com.                    172800  IN      NS      m.gtld-servers.net.
com.                    172800  IN      NS      c.gtld-servers.net.
com.                    172800  IN      NS      d.gtld-servers.net.
com.                    172800  IN      NS      k.gtld-servers.net.
com.                    172800  IN      NS      i.gtld-servers.net.
com.                    172800  IN      NS      l.gtld-servers.net.
com.                    172800  IN      NS      a.gtld-servers.net.
com.                    172800  IN      NS      e.gtld-servers.net.
;; Received 835 bytes from 202.12.27.33#53(m.root-servers.net) in 104 ms

google.com.             172800  IN      NS      ns2.google.com.
google.com.             172800  IN      NS      ns1.google.com.
google.com.             172800  IN      NS      ns3.google.com.
google.com.             172800  IN      NS      ns4.google.com.
;; Received 287 bytes from 192.33.14.30#53(b.gtld-servers.net) in 44 ms

;; expected opt record in response
google.com.             300     IN      A       172.217.10.14
;; Received 44 bytes from 216.239.36.10#53(ns3.google.com) in 29 ms

Running the same trace again showed 129ms.

I also located this warning:

06-Sep-2018 12:03:33.304 client: warning: client @0x7f502c1d3d50 127.0.0.1#60968 (cmail20.com.multi.surbl.org): recursive-clients soft limit exceeded (901/900/1000), aborting oldest query

I've increased recursive-clients to 2500 but the SERVFAIL errors continue.

There are also a ton of lame-server entries, many of which are related
to one RBL or another, as part of my postscreen config:

06-Sep-2018 13:16:50.686 lame-servers: info: connection refused resolving '48.167.85.209.zz.countries.nerd.dk/A/IN': 195.182.36.121#53
06-Sep-2018 13:16:50.706 lame-servers: info: connection refused resolving '48.167.85.209.bb.barracudacentral.org/A/IN': 64.235.154.72#53
06-Sep-2018 13:16:51.308 lame-servers: info: connection refused resolving '48.167.85.209.bl.blocklist.de/A/IN': 185.21.103.31#53
06-Sep-2018 13:16:54.798 lame-servers: info: connection refused resolving 'e51dd24f684d212a7da1119b23603b0f.generic.ixhash.net/A/IN': 178.254.39.16#53
06-Sep-2018 13:16:54.799 lame-servers: info: connection refused resolving 'f4d997d8949e6dbd30f6a418ad364589.generic.ixhash.net/A/IN': 178.254.39.16#53
06-Sep-2018 13:16:55.762 lame-servers: info: connection refused resolving '2.164.177.209.bb.barracudacentral.org/A/IN': 64.235.145.15#53
06-Sep-2018 13:16:55.845 lame-servers: info: connection refused resolving '2.164.177.209.bb.barracudacentral.org/A/IN': 64.235.154.72#53

What would be a cause of such a significant delay in reaching the root servers?

Thanks,
Alex
_______________________________________________
Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe from this list

bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users
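For going through the lame-servers entries quoted in this thread, here is a minimal Python sketch that tallies them by RBL zone and refusing server. It assumes the log lines have exactly the format shown in the message. The names `LAME_RE`, `zone_suffix`, and `summarize` are made up for illustration, and treating the last two labels of the query name as "the zone" is only a rough approximation, not how BIND itself identifies zones.

```python
import re
from collections import Counter

# Matches "lame-servers" log lines in the format quoted in this thread.
# The pattern is an assumption based on those samples only.
LAME_RE = re.compile(
    r"lame-servers: info: (?P<reason>.+?) resolving "
    r"'(?P<qname>[^/']+)/(?P<qtype>[^/']+)/(?P<qclass>[^']+)': "
    r"(?P<server>[0-9A-Fa-f.:]+)#(?P<port>\d+)"
)


def zone_suffix(qname, labels=2):
    """Return the last `labels` labels of qname (a rough stand-in for the zone)."""
    parts = qname.rstrip(".").split(".")
    return ".".join(parts[-labels:])


def summarize(lines):
    """Count lame-server entries per (zone suffix, server address, reason)."""
    counts = Counter()
    for line in lines:
        m = LAME_RE.search(line)
        if m:
            key = (zone_suffix(m["qname"]), m["server"], m["reason"])
            counts[key] += 1
    return counts


# The seven entries from the message, used here as sample input.
SAMPLE = """\
06-Sep-2018 13:16:50.686 lame-servers: info: connection refused resolving '48.167.85.209.zz.countries.nerd.dk/A/IN': 195.182.36.121#53
06-Sep-2018 13:16:50.706 lame-servers: info: connection refused resolving '48.167.85.209.bb.barracudacentral.org/A/IN': 64.235.154.72#53
06-Sep-2018 13:16:51.308 lame-servers: info: connection refused resolving '48.167.85.209.bl.blocklist.de/A/IN': 185.21.103.31#53
06-Sep-2018 13:16:54.798 lame-servers: info: connection refused resolving 'e51dd24f684d212a7da1119b23603b0f.generic.ixhash.net/A/IN': 178.254.39.16#53
06-Sep-2018 13:16:54.799 lame-servers: info: connection refused resolving 'f4d997d8949e6dbd30f6a418ad364589.generic.ixhash.net/A/IN': 178.254.39.16#53
06-Sep-2018 13:16:55.762 lame-servers: info: connection refused resolving '2.164.177.209.bb.barracudacentral.org/A/IN': 64.235.145.15#53
06-Sep-2018 13:16:55.845 lame-servers: info: connection refused resolving '2.164.177.209.bb.barracudacentral.org/A/IN': 64.235.154.72#53
"""

# Print the most frequent (zone, server, reason) combinations first.
for (zone, server, reason), n in summarize(SAMPLE.splitlines()).most_common():
    print(f"{n:4d}  {zone:24s} {server:16s} {reason}")
```

Fed a full log instead of the sample, a tally like this may help show whether the refusals cluster on a few RBL servers or are spread widely, which is a different question from the root-server latency seen in the trace.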