On 21 Nov 2016, at 3:18, Matus UHLAR - fantomas wrote:
On 20.11.16 19:46, Alex wrote:
Am I reading this rule wrong, or is the presence of a .info domain
enough to warrant a 2.8 score?
* 2.1 URI_NO_WWW_INFO_CGI URI: CGI in .info TLD other than
third-level "www"
<https://clientservices.ogletreedeakins.info/rs/vm.ashx?ct=24F76A1AD5E20AEDC1D180ACD125901ADFBE7BB3D38714D4CF371647BF8D90DDD78032>*
uri URI_NO_WWW_INFO_CGI /^(?:https?:\/\/)?[^\/]+(?<!\/www)\.[^.]{7,}\.info\/(?=\S{15,})\S*\?/i
This particular email was scored at 5.30, and wouldn't have hit if it
didn't also hit SORBS, but such a score seemed quite high for just the
presence of a type of TLD.
it's not based only on .info tld:
1. TLD .info
2. no 'www'
3. third level domain
4. at least 6 characters 2nd-level domain
That's a 7 not a 6 :)
The RE says a bit more, and is maybe clearer using words:
http[s]://<hostname: not 'www'>.<domainname: 7 or more non-dots>.info/
<15 or more non-whitespace characters including a literal ?>
Note that the trailing '\?' in the RE means a literal '?' indicating
that the URI has a CGI-style query string. That makes this a very
specific URI pattern. There's nothing "wrong" with such a URI except for
the fact that objectively the frequency of that uncommon pattern is much
higher in spam than non-spam.
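For anyone who wants to poke at the pattern outside SpamAssassin, here is a
minimal sketch in Python. It assumes Python's re module treats this RE the
same way Perl does for these inputs. The example-site and sample hostnames
are made up for illustration, and the reported URI is used with its
quoted-printable encoding decoded, which is my assumption about the original.

```python
import re

# Pattern copied from the rule quoted in this thread. Python's re module
# accepts this syntax unchanged (fixed-width lookbehind, lookahead, \S).
PATTERN = re.compile(
    r"^(?:https?:\/\/)?[^\/]+(?<!\/www)\.[^.]{7,}\.info\/(?=\S{15,})\S*\?",
    re.IGNORECASE,
)


def hits(uri: str) -> bool:
    """Return True if the pattern matches the given URI string."""
    return PATTERN.search(uri) is not None


# The URI from the report, with the quoted-printable "=3D" and soft line
# break decoded (an assumption about the original message).
REPORTED = (
    "https://clientservices.ogletreedeakins.info/rs/vm.ashx"
    "?ct=24F76A1AD5E20AEDC1D180ACD125901ADFBE7BB3D38714D4CF371647BF8D90DDD78032"
)

# Hypothetical URIs; each False case breaks exactly one condition.
CASES = [
    (REPORTED, True),
    ("http://mail.example-site.info/cgi/redirect.php?id=12345", True),
    ("http://www.example-site.info/cgi/redirect.php?id=12345", False),  # host is 'www'
    ("http://mail.sample.info/cgi/redirect.php?id=12345", False),  # 6-char domain
    ("http://example-site.info/cgi/redirect.php?id=12345", False),  # no third level
    ("http://mail.example-site.info/cgi/redirect.php", False),  # no '?'
    ("http://mail.example-site.info/a?b=1", False),  # under 15 chars after '/'
]

for uri, expected in CASES:
    print(f"{hits(uri)!s:5}  {uri}")
```

Only the first two URIs match. Changing any one of the conditions (the
hostname, the domain length, the third level, the path length, or the '?')
is enough to stop the match.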
I *suspect* that the pattern could be tightened a bit to reduce false
positives without missing the spam that hits this rule, but I don't have
any data to support that.