Hello,

We have a cluster of BIND 9 resolvers behind load balancers (for historical 
reasons, mainly that we can't force people to list multiple resolver IP 
addresses in their static configurations, and everything still has to work).

The load balancers run health checks to determine whether the hosts are 
responding to queries, and based on the results of those checks the individual 
hosts are rotated in and out of operation.

We noticed that some of these health checks are failing (seemingly at random) 
and hosts are flapping in and out of the SLB pool, but we cannot actually 
figure out why those queries are failing.

43/1656 queries resulted in DNS mesg recv: no answ section
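
To take the load balancer out of the picture, a probe along the following 
lines could be run directly against a single resolver. This is only a rough 
sketch, not what the load balancer actually does: the function names 
(build_query, parse_header, probe) are made up for illustration, it uses UDP 
only, and the server address and query name are placeholders that would need 
to match the real health check.

```python
import socket
import struct


def build_query(qname: str, qid: int, qtype: int = 1) -> bytes:
    """Build a minimal DNS query: RD flag set, one question, class IN."""
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in qname.rstrip(".").split(".")
    )
    return header + labels + b"\x00" + struct.pack("!HH", qtype, 1)


def parse_header(message: bytes) -> dict:
    """Pull out the header fields a health check is likely to care about."""
    qid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", message[:12])
    return {
        "id": qid,
        "rcode": flags & 0x000F,          # 0 = NOERROR, 2 = SERVFAIL, ...
        "truncated": bool(flags & 0x0200),
        "ancount": ancount,               # 0 means no answer section
    }


def probe(server: str, qname: str, count: int = 100, timeout: float = 2.0) -> dict:
    """Send `count` UDP queries to `server` and tally the outcomes."""
    tally = {"ok": 0, "no_answer": 0, "error_rcode": 0, "timeout": 0}
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for i in range(count):
            qid = (i + 1) & 0xFFFF
            sock.sendto(build_query(qname, qid), (server, 53))
            try:
                # Skip late replies to earlier queries that already timed out.
                while True:
                    data, _ = sock.recvfrom(4096)
                    if len(data) >= 12 and parse_header(data)["id"] == qid:
                        break
            except socket.timeout:
                tally["timeout"] += 1
                continue
            header = parse_header(data)
            if header["rcode"] != 0:
                tally["error_rcode"] += 1
            elif header["ancount"] == 0:
                tally["no_answer"] += 1
            else:
                tally["ok"] += 1
    return tally
```

Calling something like probe("192.0.2.53", "example.com", count=1000) (both 
arguments are placeholders) returns the tally as a dict. Separating timeouts, 
error rcodes, and NOERROR responses with an empty answer section should at 
least show which kind of failure the "no answ section" message corresponds to.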

Our environment is EL7 running BIND 9.11.4-P2-RedHat-9.11.4-26.P2.el7_9.9

Checking the standard logging channels, the only real error we see from named 
is this:

"named[5821]: dispatch 0x7f70e400fad0: shutting down due to TCP receive error: 
(seemingly random IP address) connection reset"

However, the source IPs that the health checks come from don't appear anywhere 
in the logs.

We read through this document 
https://kb.isc.org/docs/monitoring-recommendations-for-bind-9 which gave us 
some good ideas on things to look at, but sadly nothing stands out as a real 
cause.

If anyone has any thoughts on this I would be really grateful.

Thanks,
-Drew

-- 
Visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe from 
this list

ISC funds the development of this software with paid support subscriptions. 
Contact us at https://www.isc.org/contact/ for more information.


bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users
