Re: New plugin: DecodeShortURLs

Warren Togami Jr. Wed, 05 Jan 2011 04:41:36 -0800

On Sat, Jan 1, 2011 at 7:19 AM, Steve Freegard <st...@stevefreegard.com> wrote:

>  On 01/01/11 11:51, Warren Togami Jr. wrote:
>
>  I'll help you start the process with a Bugzilla ticket.  I also hope you
> could get it into some sort of public source control mechanism soon so we
> can see the changes that go into it before inclusion in upstream.  I feel
> uncomfortable using something that is only available from a URL without
> being able to see its change history.
>
> Know how to use git?  github.com is pretty good for something small like
> this.
>
>
>
> Sure. No problem.
>

Have you set up a git repository yet?  I'd like to collaborate on
development of this plugin.

> 2) How widespread is URL shortening abuse now?  I can figure this out very
> easily by adding a non-network URI rule to the nightly masscheck.  Could you
> please send me privately your updated list of shorteners so that I may write
> such a rule?
>
>
> Based on the reports I get - quite prevalent at times and when these are
> used it's effectively a free-pass through the URIBL plug-in which often
> results in a false-negative.
>
> As soon as I've sorted out the list - I'll send it to you.
>

According to yesterday's masschecks, roughly 1% of spam and 1% of ham
contains a URL shortener.  Of the spam in the corpus that contains a URL
shortener, ~49% scores 5 points or fewer.  A score this low probably means
it is successfully avoiding positive URIBL hits.  If you include the
borderline scores all the way up to 7, then you're looking at 64% of URL
shortening spam.  Higher scores are almost always a sign that the URL
shortener domain itself is listed in a URIBL, probably because the
shortener didn't police itself and was abused too much.  But URL shortener
spam is definitely weighted heavily toward the lower end of SpamAssassin
scoring, meaning this is a worthwhile approach to develop.

The only trouble here is that HTTP's TCP handshake and teardown are
significantly slower than the DNSBL and URIBL lookups already used in
SpamAssassin.  My average scan time is less than one second.  A plugin that
catches the 1% of spam that uses URL shorteners is only worthwhile if it
doesn't slow down your mail scanning considerably.  Doing the HTTP query
asynchronously would help, but I fear that this could easily add several
seconds per mail.
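
To make the latency concern concrete, here is a rough sketch of how the
lookups could be kept inside a fixed time budget.  It is written in Python
purely for illustration and is not the plugin's code (a SpamAssassin plugin
would be Perl).  The names fetch_redirect_target and resolve_all, and the
timeout and budget values, are all assumptions of mine.  Each short URL gets
a single HEAD request, the requests run concurrently, and anything that has
not answered when the budget runs out is simply dropped.

```python
import http.client
from concurrent.futures import ThreadPoolExecutor, wait
from urllib.parse import urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def fetch_redirect_target(url, timeout=2.0):
    """Send one HEAD request; return the Location header or None.

    Hypothetical helper: it does not follow the redirect, it only reads
    where the shortener points.
    """
    parts = urlsplit(url)
    if not parts.netloc:
        return None
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        if response.status in REDIRECT_CODES:
            return response.getheader("Location")
        return None
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, malformed reply: treat as no answer.
        return None
    finally:
        conn.close()


def resolve_all(urls, fetch=fetch_redirect_target, budget=3.0, max_workers=8):
    """Resolve short URLs concurrently, giving up after `budget` seconds.

    Returns {short_url: target} for lookups that finished in time and
    produced a redirect target.  Requires Python 3.9+ (cancel_futures).
    """
    pool = ThreadPoolExecutor(max_workers=max_workers)
    futures = {pool.submit(fetch, url): url for url in urls}
    done, _not_done = wait(futures, timeout=budget)
    # Do not block on stragglers; lookups that never started are cancelled.
    pool.shutdown(wait=False, cancel_futures=True)

    results = {}
    for future in done:
        if future.exception() is not None:
            continue
        target = future.result()
        if target is not None:
            results[futures[future]] = target
    return results
```

With something like this the worst case added per mail is bounded by the
budget rather than by the slowest shortener, at the cost of missing any
redirect that does not answer in time.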

Warren
