(Answering on the SA dev list, but Cc: to SA users since that list was also involved. I’d appreciate follow-ups on the SA dev list; Reply-To: is set.)

> I can suggest that we run a statistical experiment by turning all non-.255
> responses into .255 responses and then compare the rate of queries.

Things to keep in mind about the following data:

* The query sources and the query content are disassociated as the first step in gathering the data, to ensure privacy. So we do not really know *who* is querying *what*.
* As a consequence, we can observe „who is querying what“ only by looking at the data of a particular mirror for the list.dnswl.org zone, between the moment the data is gathered and the moment the log aggregation kicks in; not later, and not in aggregated form.
* Since we can only observe DNS traffic, and given the caching (especially with the relatively long TTLs used in this zone), this is only a proxy variable for actual mail traffic. Due to caching we overestimate the low-usage patterns and underestimate the high-usage patterns (assuming that high-usage sources profit more from caching).
* We throw away some log data to limit resource use, so the data in our database generally slightly underrepresents the actual numbers.

Some statistics on overall usage (all numbers rounded to avoid the impression of false precision):

* 332’000 sources queried the list.dnswl.org zone in the past 30 days.
* Of those, 13’100 sources issued more than 30 * 100’000 queries (i.e. „consistent overusers“, not just those who have a spike once in a while).
* 273 * 10^9 queries overall over the past 30 days.
* Of these, ca. 75% of the queries (200 * 10^9) were issued by the 13’100 „consistent overusers“.

A lot of overusers use more than one source IP (and some, like Google, use *a lot* of IPs, both IPv4 and IPv6). A lot of the IPs completely lack PTR records, or use them inconsistently. However, we can roughly group the overuse:

* Large resolvers, both public and hoster-provided, namely Google, OpenDNS, Proxad, Cloudflare, OVH and similar.
* Individual organisations where it looks unlikely that the data is used for filtering purposes (outbound servers from Sendgrid with millions of queries per day?!)
* Commercial vendors of e-mail (filter) services

We can guesstimate that the 13’100 sources correspond to about 1’000 to 3’000 overusing organisations in the second and third group. I’d call them „conscious overusers“, since they should have an understanding of what they are doing (however, given the lack of reaction to any of the block results, that „should have“ is a bold statement).

I expect a good number of „unconscious overusers“ behind the large resolvers (e.g. a typical SpamAssassin admin with a misguided DNS setup), but there are likely also „conscious overusers“ trying to blend into that group. The number of organisations in that group can hardly be estimated with meaningful accuracy.

We have ca. 1’900 IPs (or IP ranges) with some form of block (we call this the „mirror ACL“):

aclaction     count
refuse            5
returnhi        430
parentblock    1417

If we only look at those which have „hits“ within the past 30 days:

aclaction     count
returnhi        229
parentblock     180

„refuse“ is the _BLOCKED result; „returnhi“ is the 127.0.10.3 result; „parentblock“ hides the NS for list.dnswl.org (which would typically result in a SERVFAIL or NXDOMAIN for the NS records). There are also some exceptions which are not shown here (they are rare, and seem not to be actively used any more).

Since we only store positive results (i.e. queries that did result in some form of response from our DNS mirrors) and not the responses themselves, we cannot tell the percentage of responses falling under refuse / returnhi / parentblock (and a successful parentblock would not even make it into the logs).

All returnhi / parentblock entries have now been reverted to refuse. It will take several hours for this to be fully propagated (export / sync delay, and especially TTLs).
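
For readers who want to check which of these results their own resolver receives, here is a rough sketch (not part of our tooling) of how a single answer could be labelled on the client side. The function name and labels are made up for illustration. The 127.0.10.3 value is the one mentioned above; that the _BLOCKED result is 127.0.0.255 is an assumption based on the „.255“ in the quoted mail.

```python
import ipaddress

# Assumption: the _BLOCKED ("refuse") result is 127.0.0.255,
# i.e. the ".255" response referred to in the quoted mail.
BLOCKED = ipaddress.IPv4Address("127.0.0.255")
# From the text above: "returnhi" answers with 127.0.10.3.
RETURNHI = ipaddress.IPv4Address("127.0.10.3")
# Assumption: regular list answers live in 127.0.0.0/16.
LIST_RANGE = ipaddress.IPv4Network("127.0.0.0/16")


def classify_answer(answer):
    """Label one A-record answer; pass None for SERVFAIL/NXDOMAIN."""
    if answer is None:
        # Either the queried IP is simply not listed, or the NS lookup
        # failed (which is what "parentblock" typically causes).
        return "no-answer"
    addr = ipaddress.IPv4Address(answer)
    if addr == BLOCKED:
        return "refuse"
    if addr == RETURNHI:
        # A single answer cannot tell "returnhi" apart from a regular
        # listing that happens to carry the same value.
        return "returnhi-or-listed"
    if addr in LIST_RANGE:
        return "listed"
    return "unexpected"
```

If every query from a resolver comes back as 127.0.10.3, whatever IP is asked about, that points to „returnhi“ rather than to genuine listings.
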

We also attempted to identify some sources in the „large resolvers“ and „individual abusers“ categories and to add them to the „refuse“ aclaction, in order to provide a more consistent experience.

We will let it run for about a week with all aclactions set to „refuse“, and then review the data. Since there is quite some natural fluctuation in the logs (throughout the day, over the week, and seasonally), it may take more than one week to get meaningful data.

-- Matthias, for the dnswl.org project