(Answering on the SA Dev list, but Cc: to SA users since this list was also 
involved. I’d appreciate follow-ups on the SA dev list - Reply-To: set.)

> I can suggest that we run a statistical experiment by turning all non-.255 
> responses into .255 responses and then compare the rate of queries.

Things to keep in mind about the following data:

* The query sources and the query content are disassociated as the first step 
in gathering the data to ensure privacy. So we do not really know *who* is 
querying *what*.
* As a consequence, we can observe „who is querying what“ only on a particular 
mirror for the list.dnswl.org <http://list.dnswl.org/> zone, and only between 
the moment the data is gathered and the moment log aggregation kicks in; not 
later, and not in aggregated form.
* Since we can only observe DNS traffic, and given the caching (especially with 
the relatively long TTLs used in this zone), this is only a proxy variable for 
actual mail traffic. Due to caching we overestimate the low usage and 
underestimate the high usage patterns (assuming that they profit more from 
caching).
* We throw away some log data to limit resource use, so the data we have in our 
database generally slightly underrepresents the actual numbers.


Some statistics on overall usage (all numbers rounded to avoid the impression 
of overly exact numbers):

* 332’000 sources querying list.dnswl.org <http://list.dnswl.org/> zone in the 
past 30 days
* Of those, 13’100 sources have issued more than 30 * 100’000 queries (i.e. 
„consistent overusers“, and not just those who have a spike once in a while)
* 273 * 10^9 queries over the past 30 days overall
* Of these, ca 75% of the queries (200 * 10^9) have been issued by the 13’100 
„consistent overusers“
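
As a rough cross-check of the figures above, here is a small sketch that 
recomputes the threshold and the shares from the rounded numbers in this mail. 
The variable names are only illustrative, and the per-source daily average is 
a derived figure, not something measured separately.

```python
# Rounded figures taken from the list above (30-day window).
DAYS = 30
DAILY_LIMIT = 100_000            # per-source queries per day used as threshold
total_sources = 332_000
overuser_sources = 13_100
total_queries = 273 * 10**9
overuser_queries = 200 * 10**9

# A "consistent overuser" exceeds this many queries over the whole window.
threshold = DAYS * DAILY_LIMIT

# Share of all queries issued by the consistent overusers.
share_of_queries = overuser_queries / total_queries

# Share of all sources that are consistent overusers.
share_of_sources = overuser_sources / total_sources

# Derived average: queries per overusing source per day.
avg_per_overuser_per_day = overuser_queries / overuser_sources / DAYS

print(f"threshold: {threshold} queries in {DAYS} days")
print(f"share of queries: {share_of_queries:.1%}")
print(f"share of sources: {share_of_sources:.1%}")
print(f"average per overuser per day: {avg_per_overuser_per_day:,.0f}")
```

In other words, roughly 4% of the sources account for roughly three quarters 
of the queries.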

A lot of overusers are using more than one source IP (and some like Google use 
*a lot* of IPs, both IPv4 and IPv6). A lot of the IPs completely lack PTR 
records, or are using them inconsistently. However we can roughly group the 
overuse:

* Large resolvers, both public and hoster-provided, namely Google, OpenDNS, 
Proxad, Cloudflare, OVH and similar.
* Individual organisations where it looks unlikely that the data is used for 
filtering purposes (outbound servers from Sendgrid with millions of queries per 
day?!) 
* Commercial vendors of e-mail (filter) services

We can guesstimate that the 13’100 sources correspond to about 1’000 to 3’000 
overusing organisations in the second and third group. I’d call them „conscious 
overusers“, since they should have an understanding of what they are doing 
(however, given the lack of reaction to any of the block results, the „should 
have“ in the previous sentence is a bold statement).

I expect a good number of „unconscious overusers“ behind the large resolvers 
(e.g. a typical SpamAssassin admin with a misconfigured DNS setup), but there 
are likely also „conscious overusers“ trying to blend into that group. The 
number of organisations can hardly be estimated with meaningful accuracy.

We have ca 1’900 IPs or IP ranges with some form of block (we call this the 
„mirror ACL“):

aclaction      count
refuse             5
returnhi         430
parentblock     1417

If we only look at those which have „hits“ within the past 30 days:

aclaction      count
returnhi         229
parentblock      180

„refuse“ is the _BLOCKED result; „returnhi“ returns 127.0.10.3; „parentblock“ 
hides the NS for list.dnswl.org <http://list.dnswl.org/> (which would 
typically result in a SERVFAIL or NXDOMAIN for the NS records). There are also 
some exceptions which are not shown here (they are rare, and seem not to be 
actively used any more).
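
To illustrate what a querying client would see for these actions, here is a 
minimal sketch. It assumes that the _BLOCKED answer is an A record whose last 
octet is 255 (the „.255 responses“ from the quoted text) and that „returnhi“ 
answers with 127.0.10.3 as stated above. The function name and the labels are 
made up for this example, and a „parentblock“ is not covered since it yields 
no answer from the zone at all.

```python
import ipaddress

# Answer value for "returnhi" as given in this mail.
RETURNHI_ANSWER = ipaddress.IPv4Address("127.0.10.3")


def classify_answer(answer: str) -> str:
    """Classify an A-record answer from the list zone (illustrative labels)."""
    addr = ipaddress.IPv4Address(answer)
    octets = addr.packed  # four bytes, indexable as integers

    if octets[0] != 127:
        # Not a loopback-style DNS list answer at all.
        return "unexpected"
    if octets[3] == 255:
        # Assumption: last octet 255 marks the _BLOCKED ("refuse") result.
        return "blocked"
    if addr == RETURNHI_ANSWER:
        # Same value as "returnhi"; the answer alone presumably does not
        # reveal whether it came from the ACL or from regular list data.
        return "matches-returnhi-value"
    return "other-list-answer"


for sample in ("127.0.0.255", "127.0.10.3", "127.0.5.1"):
    print(sample, "->", classify_answer(sample))
```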

Since we only log queries that received some form of response from our DNS 
mirrors, and not the responses themselves, we cannot tell the percentage of 
responses in refused / returnhi / parentblock (and a successful parentblock 
would not even make it into the logs).

All returnhi / parentblock have now been reverted to refuse. It will take 
several hours for this to be fully propagated (export / sync delay, and 
especially TTLs). We also attempted to identify some sources in the 
categories large resolvers / individual abusers and added them to the 
„refuse“ acl action in order to have a more consistent experience.

We will let it run for about a week with all aclactions on „refuse“, and review 
the data. Since there is quite some natural fluctuation in the logs (throughout 
the day, over the week, and seasonally), it may take more than one week to get 
meaningful data.
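
One way to keep the weekly fluctuation out of such a review is to compare each 
day only with the same weekday of a baseline week. The following is just a 
sketch of that idea, not a description of how we will actually evaluate the 
logs; the function name and the sample counts are hypothetical.

```python
def weekday_relative_change(baseline, experiment):
    """Relative change per weekday between two weeks of daily query counts.

    Both arguments are sequences of 7 daily totals, starting on the same
    weekday. A value of -0.10 means 10% fewer queries than the baseline.
    """
    if len(baseline) != 7 or len(experiment) != 7:
        raise ValueError("expected 7 daily counts per week")
    return [(e - b) / b for b, e in zip(baseline, experiment)]


# Hypothetical daily totals (not real dnswl.org data), in queries per day.
baseline_week = [9.0e9, 9.5e9, 9.4e9, 9.6e9, 9.2e9, 7.0e9, 6.8e9]
experiment_week = [8.1e9, 8.6e9, 8.5e9, 8.8e9, 8.3e9, 6.5e9, 6.3e9]

changes = weekday_relative_change(baseline_week, experiment_week)
for day, change in enumerate(changes):
    print(f"day {day}: {change:+.1%}")
```

Seasonal effects are not handled by this; they would need a longer baseline.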

—Matthias, for the dnswl.org <http://dnswl.org/> project



