> From: Jim Reid <j...@rfc1035.com>
>
> My logs tended to have a few hundred entries at a time for the same
> (spoofed?) IP address. So as soon as I blackholed the last IP address
> in the log file, entries for another would be appended. At 4am and
> there's a caffeine deficit, this looks like a new client has
> immediately popped up to replace the one that's just been nuked. In
> fact, the "new" IP address was already there and its queries were lost
> amongst the noise of the other 100+ addresses that were firing crap at
> the name server.
That raises two issues.

The first is a problem with the logging in the response rate limiting code I've been working on. One needs to be able to find out which responses have been rate limited and why. To answer that question, my current logging code simply logs to a new BIND9 category whenever it drops a response (or would have dropped it when in test mode). The problem is that even on my small DNS servers that generates too much noise. My plan is to change from instantaneous to retrospective logging that says something equivalent to "10.2.3.4 recently asked 27 times for A records for example.com and the last 13 responses were dropped."

The second issue concerns log noise and the popular enthusiasm for Bloom filters. I've heard more than one suggestion for using Bloom filters for DNS response rate limiting. Bloom filters are a great idea for some things, but I think they are a problem instead of a solution here. The problem is suggested by the word "probabilistic" in "Part of a series on Probabilistic data structures" on https://en.wikipedia.org/wiki/Bloom_filter

It's like the difference between accounting and statistics. You don't (and for privacy reasons must not) care exactly how many nearby households have incomes above or below twice the median for your neighborhood. A statistical statement like "99.9% of your neighbors earned $31,000 +/- $10,000" is fine. On the other, accounting hand, you'd be unhappy if your bank told you that 0.1% of your bank statements would be fiction, and you'd have to guess which.

Bloom filters have false positives. If you know enough about your data, you can make the false positive probability as low as you like, but you cannot make that probability zero without giving up the reasons why you chose a Bloom filter in the first place. Never mind the difficulties in knowing enough about your DNS query stream, and note that it is always a probability distribution as opposed to a rate.
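The retrospective logging described above could be sketched roughly like this. This is only an illustration in Python, not BIND9's actual code; the class and field names are made up. The idea is to accumulate per-(client, qtype, qname) counters in a hash table and emit one summary line per tuple at flush time, instead of one log line per dropped response:

```python
from collections import defaultdict

class RetroLogger:
    """Hypothetical sketch of retrospective rate-limit logging:
    count queries and drops per (client, qtype, qname) tuple and
    summarize on flush, rather than logging every drop as it happens."""

    def __init__(self):
        # (client_ip, qtype, qname) -> [queries_seen, responses_dropped]
        self.buckets = defaultdict(lambda: [0, 0])

    def record(self, client_ip, qtype, qname, dropped):
        bucket = self.buckets[(client_ip, qtype, qname)]
        bucket[0] += 1
        if dropped:
            bucket[1] += 1

    def flush(self):
        """Return one summary line per bucket, then reset the counters."""
        lines = [f"{ip} recently asked {asked} times for {qtype} records "
                 f"for {qname} and the last {dropped} responses were dropped"
                 for (ip, qtype, qname), (asked, dropped)
                 in self.buckets.items()]
        self.buckets.clear()
        return lines

log = RetroLogger()
for _ in range(14):                 # first 14 queries answered normally
    log.record("10.2.3.4", "A", "example.com", dropped=False)
for _ in range(13):                 # next 13 responses rate limited
    log.record("10.2.3.4", "A", "example.com", dropped=True)
lines = log.flush()
print(lines[0])
```

One flush produces the single line "10.2.3.4 recently asked 27 times for A records for example.com and the last 13 responses were dropped" instead of 13 separate drop messages, which is the noise reduction being aimed for.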
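The false-positive problem is easy to demonstrate. Below is a deliberately undersized toy Bloom filter (the sizes and names are mine, chosen only for the demonstration): after inserting 100 names into a 128-bit filter, a large fraction of names that were never inserted are nonetheless reported as present:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter, written only to show false positives;
    not a production data structure."""

    def __init__(self, m_bits, k_hashes):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item):
        # k derived hash positions; real implementations care a lot
        # about how independent these hashes are on real data.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))

# An undersized filter (128 bits, 3 hashes) loaded with 100 names.
bf = BloomFilter(m_bits=128, k_hashes=3)
inserted = [f"client-{n}.example.com" for n in range(100)]
for name in inserted:
    bf.add(name)

# No false negatives: every inserted name is reported present ...
assert all(name in bf for name in inserted)

# ... but many never-inserted names are reported present too.
never_seen = [f"other-{n}.example.net" for n in range(1000)]
false_positives = sum(1 for name in never_seen if name in bf)
print(f"{false_positives} of 1000 never-inserted names report present")
```

A plain hash table (a Python set or dict here) keyed on the same tuples answers the membership and count questions exactly, which is what lets retrospective logging say "you sent more than X requests" with a straight face.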
Computing that distribution depends on hard-to-answer questions such as how independent your hash functions really are on your real data.

The connection with logging is that you need to be able to answer the question "Why did your DNS server drop my requests?" With any sort of probabilistic filter, including Bloom filters, you won't be able to say "You sent more than X requests" without turning on query logging and slogging through GBytes of log lines.

I think doing the retrospective logging I plan would make any Bloom filter scheme equivalent to a straightforward hash table. Log messages saying "IP addresses that the filter says are the same recently asked 27 times for A records for example.com and the last 13 responses were dropped" would not satisfy people wanting to know why their customers' browsers are stalling when trying to get to their web sites.

Vernon Schryver    v...@rhyolite.com