On Sat, Jan 1, 2011 at 7:19 AM, Steve Freegard <st...@stevefreegard.com> wrote:

> On 01/01/11 11:51, Warren Togami Jr. wrote:
> > I'll help you start the process with a Bugzilla ticket. I also hope you
> > could get it into some sort of public source control mechanism soon so we
> > can see the changes that go into it before inclusion in upstream. I feel
> > uncomfortable using something that is only available from a URL without
> > being able to see its change history.
> >
> > Know how to use git? github.com is pretty good for something small like
> > this.
>
> Sure. No problem.

Set up a git repository? I'd like to collaborate on development of this plugin.

> > 2) How widespread is URL shortening abuse now? I can figure this out very
> > easily by adding a non-network URI rule to the nightly masscheck. Could you
> > please send me privately your updated list of shorteners so that I may write
> > such a rule?
>
> Based on the reports I get - quite prevalent at times and when these are
> used it's effectively a free-pass through the URIBL plug-in which often
> results in a false-negative.
>
> As soon as I've sorted out the list - I'll send it to you.

According to yesterday's masschecks, roughly 1% of spam and 1% of ham contain a URL shortener. Of the spam in the corpus that contains a URL shortener, ~49% scores 5 points or fewer. A score this low probably means they are successful in avoiding positive URIBL hits. If you include the borderline scores all the way up to 7, then you're looking at 64% of URL shortening spam.

Higher scores are almost always a sign that the URL shortener domain itself is listed in a URIBL, probably because they didn't police themselves and they were abused too much. But the spam bias of URL shorteners is definitely weighted heavily toward the lower end of spamassassin scoring, meaning this is a worthwhile approach to develop.

The only trouble here is that HTTP's TCP handshake and teardown are significantly slower than the DNSBL and URIBL lookups already used in spamassassin.
My average scan time is less than one second. A plugin that catches the 1% of URL shortening spam is only worthwhile if it doesn't slow down your mail scanning considerably. Doing the HTTP query asynchronously would help, but I fear that this could easily add several seconds per mail.

Warren
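
P.S. To make the latency concern concrete, here is a rough sketch (in Python, not the plugin's own code) of one way to cap the cost: fire the shortener lookups concurrently and stop waiting once a fixed time budget is spent, treating unfinished lookups as unresolved. The names `head_location` and `resolve_within_budget`, the 1-second budget, and the worker count are all made up for illustration; the plugin itself may work quite differently.

```python
import concurrent.futures as cf
import http.client
from urllib.parse import urlsplit


def head_location(url, timeout=2.0):
    """Send one HEAD request and return the redirect target, or None.

    Hypothetical helper: it does not follow the redirect, it only reads
    the Location header of the first response.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        if resp.status in (301, 302, 303, 307, 308):
            return resp.getheader("Location")
        return None
    finally:
        conn.close()


def resolve_within_budget(urls, resolver=head_location, budget=1.0, workers=8):
    """Resolve URLs concurrently, waiting at most `budget` seconds in total.

    Returns a dict mapping each URL to its redirect target, or to None if
    the lookup failed or did not finish within the budget.
    """
    pool = cf.ThreadPoolExecutor(max_workers=workers)
    futures = {pool.submit(resolver, u): u for u in urls}
    done, not_done = cf.wait(futures, timeout=budget)

    results = {}
    for fut in done:
        try:
            results[futures[fut]] = fut.result()
        except Exception:
            # Network errors count as "could not resolve".
            results[futures[fut]] = None
    for fut in not_done:
        fut.cancel()  # only stops lookups that have not started yet
        results[futures[fut]] = None

    # Do not block the scan on stragglers; their threads finish on their own.
    pool.shutdown(wait=False)
    return results
```

With something like this the worst case added per mail is roughly the budget rather than the sum of all lookups, at the price of missing shorteners that answer slowly. Whether even a one-second budget is acceptable is exactly the open question.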