On 21 Nov 2016, at 3:18, Matus UHLAR - fantomas wrote:

On 20.11.16 19:46, Alex wrote:
Am I reading this rule wrong, or is the presence of a .info domain
enough to warrant a 2.8 score?

* 2.1 URI_NO_WWW_INFO_CGI URI: CGI in .info TLD other than third-level "www"

<https://clientservices.ogletreedeakins.info/rs/vm.ashx?ct=24F76A1AD5E20AEDC1D180ACD125901ADFBE7BB3D38714D4CF371647BF8D90DDD78032>*

uri URI_NO_WWW_INFO_CGI
/^(?:https?:\/\/)?[^\/]+(?<!\/www)\.[^.]{7,}\.info\/(?=\S{15,})\S*\?/i

This particular email was scored at 5.30, and wouldn't have hit if it
didn't also hit SORBS, but such a score seemed quite high for just the
presence of a type of TLD.

It's not based only on the .info TLD:

1. TLD .info
2. no 'www'
3. third level domain
4. at least 6 characters 2nd-level domain

That's a 7 not a 6 :)

The RE says a bit more, and is maybe clearer using words:

http[s]://<hostname: not 'www'>.<domainname: 7 or more non-dots>.info/<15 or more non-whitespace characters including a literal ?>

Note that the trailing '\?' in the RE means a literal '?' indicating that the URI has a CGI-style query string. That makes this a very specific URI pattern. There's nothing "wrong" with such a URI except for the fact that objectively the frequency of that uncommon pattern is much higher in spam than non-spam.
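
To make the reading above concrete, here is a minimal sketch that runs the quoted RE against a few URIs using Python's re module, which accepts the same lookbehind and lookahead syntax for this pattern. The first URI is the one from the report with the mail's quoted-printable escapes decoded; the other three are made up for illustration, and the helper name hits() is hypothetical. This only exercises the pattern itself, not how SpamAssassin extracts or normalizes URIs before matching.

```python
import re

# The pattern exactly as quoted in the thread, as a Python raw string.
URI_NO_WWW_INFO_CGI = re.compile(
    r"^(?:https?:\/\/)?[^\/]+(?<!\/www)\.[^.]{7,}\.info\/(?=\S{15,})\S*\?",
    re.IGNORECASE,
)


def hits(uri: str) -> bool:
    """Return True if the URI matches the quoted pattern (hypothetical helper)."""
    return URI_NO_WWW_INFO_CGI.search(uri) is not None


# URI from the report, quoted-printable decoded.
reported = (
    "https://clientservices.ogletreedeakins.info/rs/vm.ashx"
    "?ct=24F76A1AD5E20AEDC1D180ACD125901ADFBE7BB3D38714D4CF371647BF8D90DDD78032"
)

# Made-up variants, each breaking one condition of the pattern.
www_host = "https://www.ogletreedeakins.info/rs/vm.ashx?ct=24F76A1AD5E20A"
short_domain = "https://mail.short.info/cgi-bin/track.cgi?id=1234567"
no_query = "https://clientservices.ogletreedeakins.info/rs/some/long/path/index.html"

for label, uri in [
    ("reported", reported),
    ("www host", www_host),
    ("short 2nd-level domain", short_domain),
    ("no query string", no_query),
]:
    print(f"{label}: {hits(uri)}")
```

Only the reported URI satisfies every condition at once; changing the hostname to "www", shortening the second-level domain below 7 characters, or dropping the '?' each prevents a match.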

I *suspect* that the pattern could be tightened a bit to reduce false positives without missing the spam that hits this rule, but I don't have any data to support that.
