I will walk back my previous comments and just say that bandwidth may be in 
play, because any time you soak a circuit it is not good.

Take a look at this query sequence:

dns.qry.type == 28 && dns.qry.name == "concurred.co"

Packet 42356 shows an AAAA query for concurred.co.
Packets 42357/8 show 68.195.193.45 relaying the query to 62.138.132.21.
Packets 43015/16 show 62.138.132.21 replying with its query response to 
68.195.193.45.

And that's it.  Nothing is seen being sent back to 127.0.0.1, at least on the 
wire.  By way of comparison, packet 161 shows 127.0.0.1 answering itself, so I 
would consider that missing response a clue.
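If you want to replay that lookup by hand outside of Wireshark, here is a 
minimal stdlib-only Python sketch that builds the same kind of AAAA (type 28) 
query seen in packet 42356.  The name is the one from this capture; the 
transaction ID is arbitrary.

```python
# Sketch: hand-build a DNS AAAA query (qtype 28) per RFC 1035 so the
# lookup from packet 42356 can be replayed against the local resolver.
import struct

def build_aaaa_query(name: str, txid: int = 0x1234) -> bytes:
    """Build a minimal DNS query packet for an AAAA (type 28) record."""
    header = struct.pack(">HHHHHH",
                         txid,    # transaction ID (arbitrary here)
                         0x0100,  # flags: standard query, recursion desired
                         1,       # QDCOUNT: one question
                         0, 0, 0) # no answer/authority/additional records
    # Encode the name as length-prefixed labels, ending with a zero byte.
    qname = b"".join(bytes([len(label)]) + label.encode()
                     for label in name.split(".")) + b"\x00"
    question = qname + struct.pack(">HH", 28, 1)  # QTYPE=AAAA, QCLASS=IN
    return header + question

query = build_aaaa_query("concurred.co")
# Send with e.g.:
#   socket.socket(socket.AF_INET, socket.SOCK_DGRAM).sendto(query, ("127.0.0.1", 53))
```

Watching that replay in a capture on both the loopback and the routable 
adapter should show whether the forwarding leg happens at all.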

Moving on:

Packet 48874 shows 127.0.0.1 asking for an AAAA record again.
This time we don’t see any external communication.
Packet 87174 shows 127.0.0.1 replying with server failure.

It took nearly 25 seconds to decide upon a SERVFAIL, and that is another clue.
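To measure that delay without a capture, a quick stdlib-only sketch like this 
times a raw query against the local resolver and pulls the RCODE out of the 
reply (SERVFAIL is rcode 2, which lives in the low four bits of the flags 
word).  The 30-second timeout is just a guess to comfortably cover the ~25 
seconds seen above.

```python
# Sketch: time a raw DNS query over UDP and extract the response RCODE.
# A SERVFAIL (rcode 2) that arrives only after many seconds suggests the
# resolver gave up waiting on upstream rather than failing locally.
import socket
import struct
import time

SERVFAIL = 2

def rcode_of(response: bytes) -> int:
    """Extract the RCODE from the flags word of a raw DNS response."""
    flags = struct.unpack(">H", response[2:4])[0]
    return flags & 0x000F

def timed_query(query: bytes, server: str = "127.0.0.1",
                timeout: float = 30.0):
    """Send a raw DNS query over UDP; return (rcode, elapsed_seconds)."""
    start = time.monotonic()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(query, (server, 53))
        data, _ = s.recvfrom(4096)
    return rcode_of(data), time.monotonic() - start

# e.g.: rc, took = timed_query(query)
# rc == SERVFAIL with a large `took` matches the pattern seen in the pcap.
```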

That said, there are heaps of queries where DNS worked as expected.  I really 
had to dig for the above examples because it seems like the vast majority of 
the server failure messages either do not get a reply on localhost or we don't 
see the routable adapter on the server attempting to reach out to get the 
answer.  concurred.co is unique in that we see the attempt to reach out but no 
answer delivered back to localhost.

If the traffic that 127.0.0.1 is putting on the wire does not go out, I am 
thinking firewall, but you may be dealing with bandwidth exhaustion exclusively 
and it is presenting itself in this manner.  Or you may have a server 
configuration issue or a server that is underpowered.

Sometimes pcaps are black and white and give you a "here is your problem" 
answer; other times, like this one, they do not give us anything conclusive to 
work with.  Since this server is sputtering around, I would first set about 
stabilizing traffic from 127.0.0.1 going out.  You need to see outbound traffic 
hit 127.0.0.1 and then hit your external adapter without missing.  Boom, boom, 
boom on down the line.
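One quick sanity check on that outbound path: a UDP connect() sends no packets 
but asks the kernel which local address it would use to reach a given upstream. 
Here is a small stdlib-only sketch; the upstream address is the forwarder seen 
in the capture.  If the answer comes back as 127.0.0.1 or an unexpected 
adapter, routing is part of the problem.

```python
# Sketch: ask the kernel which source address it would pick for an
# upstream resolver.  UDP connect() only sets the peer; no packet is
# sent, so this is safe to run on a struggling server.
import socket

def source_addr_for(upstream: str, port: int = 53) -> str:
    """Return the local source address the kernel picks for `upstream`."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect((upstream, port))
        return s.getsockname()[0]

# e.g.: print(source_addr_for("62.138.132.21"))
# This should be the routable adapter's address, not 127.0.0.1.
```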

Hopefully others will have better, more insightful suggestions.

Good hunting!

John

-----Original Message-----
From: Alex [mailto:mysqlstud...@gmail.com] 
Sent: Tuesday, September 11, 2018 1:57 PM
To: John W. Blue; bind-users@lists.isc.org
Subject: Re: Frequent timeout

Hi,

On Tue, Sep 11, 2018 at 2:47 PM John W. Blue <john.b...@rrcic.com> wrote:
>
> If you use wireshark to slice n dice the pcap .. "dns.flags.rcode == 2" shows 
> all of your SERVFAIL happens on localhost.
>
> If you switch to "dns.qry.name == storage.pardot.com" every single query is 
> localhost.
>
> Unless you have another NIC that you are sending traffic over this does not 
> look like a bandwidth issue at this particular point in time.

Thanks so much. I think I also may have confused things by suggesting it was 
related to bandwidth or utilization. I see it also happen now more regularly 
too.

Can you ascertain why it is reporting these SERVFAILs?

The queries are on localhost because /etc/resolv.conf lists localhost as the 
nameserver. Is that why we can't diagnose this? This most recent packet trace 
was started with "-i any". Why would the ones on localhost be the ones which 
are failing? I'm assuming postfix and/or some other process is querying bind on 
localhost to cause these errors?
_______________________________________________
Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe 
from this list
