Hello,

It may or may not be relevant, but it sounds similar to a problem we had to solve a few months ago. Try the following query analysis: monitor the number of recursive queries at a given moment, and when it exceeds a certain threshold, send "rndc recursing" to BIND and have a look at the queries.

Basically, we found out there is an ongoing attack originating from China with the following structure: a number of bogus domains are registered, like "345qp.com.cn", etc., then the target nameservers are listed as authoritative for them, and vast botnets of infected home routers/modems are told to send bogus queries for those domains. Your resolvers will start having the problems you describe when the admin of the attacked authoritative servers realizes what's going on and stops responding to queries for these domains. That means your resolvers have to wait for the timeout of each and every one of these bogus queries, which in the meantime ties up memory and processing time. It adds up rather quickly, potentially overwhelming your hardware (basically, it's a huge abnormal peak contrasting with normal operation).
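
A minimal sketch of the monitoring step described above, in Python. It assumes that "rndc status" prints a line of the form "recursive clients: current/soft-limit/hard-limit" and that rndc is on the PATH and allowed to talk to the server. The threshold value and the function names are hypothetical, not taken from our setup.

```python
import re
import subprocess

# Hypothetical threshold; pick one that fits your own baseline.
THRESHOLD = 1500


def recursive_clients(status_text):
    """Return the current recursive client count from `rndc status` text, or None."""
    # Assumed line format: "recursive clients: 1720/1900/2000"
    match = re.search(r"recursive clients:\s*(\d+)/", status_text)
    return int(match.group(1)) if match else None


def check_once(threshold=THRESHOLD):
    """Ask BIND for its status and dump the recursing clients if over threshold."""
    status = subprocess.run(
        ["rndc", "status"], capture_output=True, text=True, check=True
    ).stdout
    count = recursive_clients(status)
    if count is not None and count > threshold:
        # Tells named to write its list of in-progress recursive queries to a dump file.
        subprocess.run(["rndc", "recursing"], check=True)
        return True
    return False
```

Running check_once() from cron or a small loop every few seconds would be enough to catch the peaks; the dump can then be inspected by hand or by a script.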

The solution we chose is to identify these bogus queries (they vastly outnumber legitimate queries) and to sort of "blacklist" the given bogus domain for a period of time, in the sense that we no longer do a recursive query for the client but immediately and authoritatively answer NXDOMAIN for the query. While it is a deviation from the correct behavior, it conserves the resources of the resolver, because an immediate authoritative answer takes a fraction of the time, memory and CPU to resolve. False positives are of course possible, but with some degree of monitoring and whitelisting of domains prone to false positives (like google.com, yahoo.com, etc.), they can be rather rare.
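
The selection step can be sketched roughly as follows, again in Python. This is only an illustration under stated assumptions: the query names are assumed to have been extracted already from the "rndc recursing" dump, the small suffix set is a hypothetical stand-in for a real public suffix list, and the count threshold is arbitrary. How the NXDOMAIN answer is then served is not covered here.

```python
from collections import Counter

# Domains that must never be blocked, as mentioned above.
WHITELIST = {"google.com", "yahoo.com"}

# Hypothetical stand-in for a public suffix list: suffixes under which
# the registered domain has three labels (e.g. 345qp.com.cn).
TWO_LABEL_SUFFIXES = {"com.cn"}


def registered_domain(qname):
    """Reduce a query name to the domain it was registered under."""
    parts = qname.rstrip(".").lower().split(".")
    if ".".join(parts[-2:]) in TWO_LABEL_SUFFIXES:
        return ".".join(parts[-3:])
    return ".".join(parts[-2:])


def blacklist_candidates(qnames, min_count, whitelist=WHITELIST):
    """Return non-whitelisted domains seen at least min_count times, sorted."""
    counts = Counter(registered_domain(name) for name in qnames)
    return sorted(
        domain
        for domain, seen in counts.items()
        if seen >= min_count and domain not in whitelist
    )
```

For example, with five pending queries under 345qp.com.cn, nine under google.com and one for example.org, a min_count of 5 yields only 345qp.com.cn: google.com is whitelisted and example.org is below the threshold.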

Hope this helps; don't hesitate to ask me for details. I think it may be relevant to your situation.

--
Best Regards,
Daniel Ryšlink
System Administrator

Dial Telecom a. s.
Křižíkova 36a/237
186 00 Praha 3, Česká Republika
Tel.:+420.226204627
daniel.rysl...@dialtelecom.cz
-----------------------------------------------
www.dialtelecom.cz
Dial Telecom, a.s.
Simply connect
-----------------------------------------------

On 11/24/2014 12:37 PM, Niall O'Reilly wrote:
At Sun, 23 Nov 2014 21:00:15 -0800 (PST),
blrmaani wrote:
Our nameservers take up to 10K QPS (mostly NOERROR type most of the time).

Twice or thrice a week, I have seen up to 10% of the queries return
SERVFAIL, and we have started exceeding the default value of 2000 for
the recursive-clients setting in BIND 9.9.x.

Is there a recommended value for the recursive-clients option, assuming
a huge number of SERVFAIL queries once every 2-3 days?

I'm not convinced I should increase it to some arbitrary huge number
like 20,000 or 200,000.

I am looking for an answer like: if your peak SERVFAIL queries are
2000/second, then your recursive-clients value should be N.
   I wouldn't expect that such an answer could make sense.

   Exhaustion of the active recursive-clients list and the generation
   of responses marked SERVFAIL are most likely different symptoms of
   the same problem.  I think you'll need to identify this problem and
   then determine what action to take.

   Your resolver seems to be dealing with queries which are
   unanswerable and which are arriving in a quantity sufficient to fill
   the recursive-clients list.  This may be due to rogue clients,
   misconfigured authoritative servers, network problems, or some
   combination of these.  Your logs will help identify which.

   I hope this helps.

   Niall O'Reilly
_______________________________________________
Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe 
from this list

bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users
