On 07/01/2016 10:13 AM, Groach wrote:
On 01/07/2016 09:56, Axb wrote:
I then informed him that SA already has a URL_SHORTENER checking rule,
found in 72_ACTIVE.CF. I was currently using this as a META rule thus:
meta MY_URI_URLSHORT __URL_SHORTENER # defined in 72_active.cf
ATM it seems there is no such rule - pls verify the name after running
sa-update
As quoted, it is " __URL_SHORTENER "
The entry reads as follows:
uri __URL_SHORTENER
/^http:\/\/(?:bit\.ly|tinyurl\.com|ow\.ly|is\.gd|tumblr\.com|formspring\.me|ff\.im|youtu\.be|tl\.gd|plurk\.com|migre\.me|j\.mp|cli\.gs|goo\.gl|yfrog\.com|lnk\.ms|su\.pr|fb\.me|alturl\.com|wp\.me|ping\.fm|chatter\.com|post\.ly|twurl\.nl|tiny\.cc|4sq\.com|ustre\.am|short\.to|u\.nu|flic\.kr|budurl\.com|digg\.com|twitvid\.com|gowal\.la|om\.ly|justin\.tv|icio\.us|p\.gs|loopt\.us|tcrn\.ch|xrl\.us|wpo\.st|bkite\.com)\/[^\/]{3}\/?/
ok - found it... and must say this rule is pretty sloppy and should
probably be deprecated. I hope whoever compiled this list takes a look
into this.
It includes domains which are clearly not URI shorteners, or never used
in spam, etc.
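
For anyone who wants to see what the quoted pattern actually hits, here is a rough Python sketch. It assumes the Perl regex carries over to Python's re module unchanged (same anchoring, same path test) and it does not reproduce how SA extracts or normalises URIs. The sample URLs and the hits() helper are made up for illustration.

```python
import re

# Domains copied from the __URL_SHORTENER rule quoted above.
DOMAINS = """
bit.ly tinyurl.com ow.ly is.gd tumblr.com formspring.me ff.im youtu.be
tl.gd plurk.com migre.me j.mp cli.gs goo.gl yfrog.com lnk.ms su.pr fb.me
alturl.com wp.me ping.fm chatter.com post.ly twurl.nl tiny.cc 4sq.com
ustre.am short.to u.nu flic.kr budurl.com digg.com twitvid.com gowal.la
om.ly justin.tv icio.us p.gs loopt.us tcrn.ch xrl.us wpo.st bkite.com
""".split()

# Python translation of the quoted pattern: http only, bare domain,
# then at least three non-slash characters in the first path segment.
URL_SHORTENER = re.compile(
    r"^http://(?:" + "|".join(re.escape(d) for d in DOMAINS) + r")/[^/]{3}/?"
)


def hits(url):
    """Return True if the translated pattern matches the URL (hypothetical helper)."""
    return URL_SHORTENER.search(url) is not None


# Made-up sample URLs, not taken from real mail.
SAMPLES = [
    "http://bit.ly/abc",
    "https://bit.ly/abc",
    "http://bit.ly/ab",
    "http://digg.com/news/story",
    "http://www.tumblr.com/abc",
]

if __name__ == "__main__":
    for url in SAMPLES:
        print(("hit " if hits(url) else "miss"), url)
```

Under these assumptions the https link, the two-character path and the www. host are all missed, while an ordinary digg.com page is a hit.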
Imo, this rule can probably be deprecated in favour of network lookups
and is used in other META rules such as MONEY_FRAUD_5 (you see it is
preceded with "__")
URL shorteners aren't bad per se, so it makes little sense to waste
cycles processing a long list which may or may not be abused. Many of
these sites won't be around in 6 months, some have zero abuse, and some
may even be NXDOMAIN.
You can see from 72_ACTIVE that the idea of using a url shortener isn't
bad by itself and that SA rules do use it in conjunction with other
'more likely' positive matching (such as MONEY_FRAUD_5)
Such rules are best maintained/provided by interested third parties,
which may or may not commit to keeping them up to date.
SA devs don't really have the time to chase sites/domains, and loading
the default rule set with extra bloat doesn't sound very wise.
Why not make this YOUR project?
Ok, well, I will leave it as HIS project ;-) (the guy who has already
applied his research to provide this surbl lookup). He has also stated
that many of these sites come and go (as you imply).
His project is to maintain a domain list, similar to Spamhaus DBL's
section "127.0.1.103 abused spammed redirector domain"
To maintain an SA rule with that data seems like a redundant effort, but
if someone needs this it would be wiser to tackle it at source to avoid
stale data.
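
As a rough illustration of the network-lookup approach mentioned above, the sketch below shows how a DNS query against a DBL-style zone might be formed and how the 127.0.1.103 return code could be interpreted. The zone name dbl.spamhaus.org and all function names are assumptions for illustration; the only value taken from this thread is the return code. Real lookups are subject to the list operator's usage terms and may be refused via public resolvers, so lookup() is defined but not called here.

```python
import socket

DBL_ZONE = "dbl.spamhaus.org"      # assumed zone name for the DBL
ABUSED_REDIRECTOR = "127.0.1.103"  # return code quoted in this thread


def dbl_query_name(domain):
    """Build the DNS name to query for a given domain (hypothetical helper)."""
    return "{}.{}".format(domain.strip(".").lower(), DBL_ZONE)


def is_abused_redirector(answers):
    """True if any returned A record is the abused-redirector code."""
    return ABUSED_REDIRECTOR in answers


def lookup(domain):
    """Resolve the query name and return its A records; empty list if unlisted.

    Performs a live DNS query, so it is not exercised in this sketch.
    """
    try:
        _, _, addresses = socket.gethostbyname_ex(dbl_query_name(domain))
    except OSError:  # covers socket.gaierror / socket.herror (e.g. NXDOMAIN)
        return []
    return addresses


if __name__ == "__main__":
    # Only the pure helpers are shown; no network traffic is generated.
    print(dbl_query_name("Example.COM."))
    print(is_abused_redirector(["127.0.1.103"]))
    print(is_abused_redirector([]))
```

The point of the sketch is that the data stays with whoever maintains the list: a domain that disappears or stops being abused simply stops returning the code, with nothing to update in a local rule file.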