On 10/14/2015 12:00 PM, Bill Cole wrote:
Describe, in detail, the new SA technology which fights abuse of new TLDs.

Prior to v3.4.1, the code that detected and parsed hostnames to identify URIs in message bodies relied on an array of domains hardcoded in Mail/SpamAssassin/Util/RegistrarBoundaries.pm. As a result, many URIs in the new TLDs were never recognized as URIs and so were never filtered. v3.4.1 replaces this with the new Mail/SpamAssassin/RegistryBoundaries.pm and the file 20_aux_tlds.cf in the canonical rule set, which contains a comprehensive, maintained list of TLDs and other registry-managed domains.
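To illustrate what "registry boundaries" means in practice, here is a minimal Python sketch. It is not the actual Perl code in RegistryBoundaries.pm. It assumes a hypothetical, heavily abbreviated TLD set and a set of multi-label registry suffixes (like co.uk) standing in for the data in 20_aux_tlds.cf, and uses them to trim a hostname down to the domain a registrant actually controls. A hostname whose TLD is not in the list is rejected, which is exactly what happened to the new TLDs before the list was kept current.

```python
from typing import Optional

# Hypothetical, heavily abbreviated stand-ins for the lists kept in
# 20_aux_tlds.cf; the real data is far longer and maintained over time.
TLDS = {"com", "net", "org", "uk", "jp", "bacon", "click", "industries"}
TWO_LEVEL = {"co.uk", "ac.uk", "co.jp"}   # registry-managed second levels
THREE_LEVEL: set = set()                  # deeper registry suffixes, if any


def registered_domain(hostname: str) -> Optional[str]:
    """Return the registrant-controlled domain, or None if not a valid host."""
    labels = hostname.lower().rstrip(".").split(".")
    # Unknown TLD: treat as "not a domain" (the pre-3.4.1 failure mode).
    if len(labels) < 2 or labels[-1] not in TLDS:
        return None
    # A bare registry suffix such as "co.uk" is not itself a registered domain.
    if ".".join(labels) in TWO_LEVEL | THREE_LEVEL:
        return None
    # Check the longest registry-managed suffix first.
    for n, suffixes in ((3, THREE_LEVEL), (2, TWO_LEVEL)):
        if len(labels) > n and ".".join(labels[-n:]) in suffixes:
            return ".".join(labels[-(n + 1):])
    # Otherwise the registered domain is the last two labels.
    return ".".join(labels[-2:])
```

For example, `registered_domain("www.example.co.uk")` yields `example.co.uk`, while `registered_domain("sub.example.bacon")` yields `example.bacon` only because `bacon` is in the (hypothetical) TLD set.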
A note on why the list is needed at all:

Most URLs are obvious, of the form "http://sub.domain.tld/blahblahblah", and easy to detect. However, mail clients will also accept things like "sub.domain.tld/blahblahblah" without the protocol. We want to detect as many URLs as possible while matching ideally zero non-URLs, because each match can turn into multiple DNS lookups. The list of TLDs gives us a way to eliminate obvious non-URLs, but it was designed when the worst we had to deal with was 100-ish ccTLDs that rarely changed. Nowadays it's easy for spammers to buy up garbage domains like example.bacon / example.click / example.industries, which makes an up-to-date list of TLDs much more important.
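The filtering idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not SpamAssassin's actual URI parser: the regex and the `KNOWN_TLDS` set are hypothetical simplifications. Any dotted token, optionally followed by a path, is a candidate schemeless URL, and only candidates whose final label is a known TLD are kept, so strings like version numbers or file names never reach the DNS lookup stage.

```python
import re
from typing import List

# Hypothetical, abbreviated TLD set; a real deployment needs the full,
# current list, or hosts in newly delegated TLDs will be silently missed.
KNOWN_TLDS = {"com", "net", "org", "uk", "bacon", "click", "industries"}

# Candidate schemeless URL: one or more "label." groups, a final label
# (captured separately as the would-be TLD), then an optional path.
CANDIDATE = re.compile(r"\b((?:[a-z0-9-]+\.)+([a-z0-9-]+))(/\S*)?", re.IGNORECASE)


def schemeless_uri_hosts(text: str) -> List[str]:
    """Return hostnames of candidates whose last label is a known TLD."""
    hosts = []
    for m in CANDIDATE.finditer(text):
        host, tld = m.group(1), m.group(2)
        # Discard obvious non-URLs such as "3.4.1" or "file.txt".
        if tld.lower() in KNOWN_TLDS:
            hosts.append(host.lower())
    return hosts
```

Run on "See example.bacon/deal or sub.domain.click, not version 3.4.1 or file.txt", this keeps `example.bacon` and `sub.domain.click` and drops the other two tokens. The trade-off described above is visible here: if `bacon` were missing from the set, that spam host would be skipped entirely.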
