Mark wrote:
A few weeks ago I thought of an interesting new "statistical" way of fighting spam. Having collected a few weeks' worth of maillog data (about 500 MB), I wrote a small Perl script that matches IP addresses by the following simple rule (in words):
"Look at SMTP connections, and consider all IP addresses spam that, in one session, deliver to 4 or more local recipients simultaneously."
Naturally, I ran a post-processing pass to filter out white-listed senders (mailing lists and such). The idea is that, apart from mailing lists, there is hardly ever a decent reason for a remote host to deliver to 4 or more recipients simultaneously, and the result was, indeed, astounding. :) Nearly 98% was pure spam.
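For anyone who wants to try the rule above without the original Perl script, here is a minimal Python sketch. It assumes the maillog has already been parsed into (session_id, ip, recipient) tuples; that record shape, the function name `flag_bulk_senders`, and the whitelist argument are illustrative assumptions, since real log formats differ per MTA.

```python
from collections import defaultdict

def flag_bulk_senders(deliveries, whitelist=frozenset(), threshold=4):
    """Return IPs that delivered to `threshold` or more distinct local
    recipients within a single SMTP session, minus whitelisted IPs.

    `deliveries` is an iterable of (session_id, ip, recipient) tuples.
    This record shape is an assumption; parsing depends on the MTA.
    """
    recipients = defaultdict(set)  # (ip, session_id) -> distinct recipients
    for session_id, ip, rcpt in deliveries:
        recipients[(ip, session_id)].add(rcpt.lower())

    flagged = {ip for (ip, _), rcpts in recipients.items()
               if len(rcpts) >= threshold}
    # Post-process pass: drop known-good senders such as mailing lists.
    return flagged - set(whitelist)
```

The threshold of 4 comes from the rule as stated; everything else is a guess at how such a script might be organised.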
So, I was wondering, regardless of this particular idea, have the SA developers ever seriously thought about doing maillog analysis? That may make things a bit platform/MTA dependent, but you really can learn a great deal about spammers from "history" too, and not merely from per-message analysis.
I've pondered doing something similar, and figured I'd auto-generate a pair of RBLs from the data. My thought was that the input would be:
-- SA scores (keep track of the relay and the resulting SA score for each email)
-- Logfiles
We'd look for:
-- Relays that suddenly start throwing "lots" of mail at the server.
-- Relays that appear to be doing dictionary attacks.
-- Relays that send "very spammy" email
-- Relays that ignore SMTP errors, and keep banging away (malware)
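As one illustration of the signals listed above, a dictionary attack might be spotted by counting rejected RCPT attempts per relay. This is only a rough sketch: the (relay_ip, accepted) event shape, the function name, and both thresholds are assumptions picked for the example, not tuned values.

```python
from collections import Counter

def suspected_dictionary_attackers(rcpt_events, min_rejects=20,
                                   min_reject_ratio=0.8):
    """Return relays whose RCPT attempts are mostly rejected.

    `rcpt_events` is an iterable of (relay_ip, accepted) pairs, where
    `accepted` is False when the recipient was refused (e.g. unknown user).
    Both thresholds are illustrative assumptions.
    """
    attempts = Counter()
    rejects = Counter()
    for relay_ip, accepted in rcpt_events:
        attempts[relay_ip] += 1
        if not accepted:
            rejects[relay_ip] += 1

    return {ip for ip, n in rejects.items()
            if n >= min_rejects and n / attempts[ip] >= min_reject_ratio}
```

The same counters could probably also catch relays that ignore SMTP errors and keep retrying, since those pile up rejections in the same way.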
I'd like to log enough data so that a quiescent system would be happy with seeing thousands of messages per day from the SF mailing list servers, but a host I've never seen before suddenly wanting to send 30 messages a second would raise alarm bells. Similarly, if the average SA score from a given AOL mail relay is 20 and the average from a state university is -0.5, I'd like it to not get upset with a 22 from AOL, but get awfully unhappy with an 8 from the university.
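The per-relay score comparison could be sketched roughly like this, keeping a running mean of SA scores for each relay and flagging a message only when it lands well above that relay's own average. The class name, the margin of 5 points, and the minimum sample count are assumptions made for the example.

```python
class RelayBaseline:
    """Track a running mean SA score per relay and flag outliers.

    `margin` and `min_samples` are illustrative assumptions.
    """

    def __init__(self, margin=5.0, min_samples=10):
        self.margin = margin
        self.min_samples = min_samples
        self.stats = {}  # relay -> (message count, mean score)

    def record(self, relay, score):
        """Fold one observed SA score into the relay's running mean."""
        count, mean = self.stats.get(relay, (0, 0.0))
        count += 1
        mean += (score - mean) / count
        self.stats[relay] = (count, mean)

    def is_anomalous(self, relay, score):
        """True if `score` exceeds the relay's mean by more than `margin`.

        Relays with too little history are never flagged here; a
        never-seen host would need a separate volume check.
        """
        count, mean = self.stats.get(relay, (0, 0.0))
        return count >= self.min_samples and score - mean > self.margin
```

With those numbers, a relay averaging 20 would not trip on a 22, while a relay averaging -0.5 would trip on an 8. A real version would likely want a variance estimate and some decay of old data as well.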
Output would be two RBLs. For the first, I'd return a temporary SMTP failure and time out the entry eventually (quickly enough that legit mail gets through in an acceptable timeframe, but long enough to slow a spammer significantly; I'd also watch for spammers that bang away stupidly). The second RBL would return permanent failures.
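A rough sketch of the two-list idea, modelled as an in-memory structure rather than a real DNS-based RBL. The class name, the 30-minute default timeout, and the choice of 451 and 554 as the temporary and permanent reply codes are assumptions for illustration; in practice the MTA would decide the reply based on which list matched.

```python
import time

class TwoTierBlocklist:
    """Temporary entries expire after `temp_ttl` seconds; permanent ones do not."""

    def __init__(self, temp_ttl=1800.0, clock=time.monotonic):
        self.temp_ttl = temp_ttl
        self.clock = clock      # injectable so expiry can be tested
        self.temp = {}          # ip -> expiry time
        self.permanent = set()

    def add_temporary(self, ip):
        self.temp[ip] = self.clock() + self.temp_ttl

    def add_permanent(self, ip):
        self.permanent.add(ip)
        self.temp.pop(ip, None)

    def smtp_reply(self, ip):
        """Return 554 (permanent), 451 (temporary), or None (not listed)."""
        if ip in self.permanent:
            return 554
        expiry = self.temp.get(ip)
        if expiry is not None:
            if self.clock() < expiry:
                return 451
            del self.temp[ip]   # entry timed out; let mail through again
        return None
```

A sender that keeps retrying during the temporary window could be promoted with `add_permanent`, which is one way to handle the hosts that bang away regardless of errors.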
I got around to creating an SF project for it... problem is, all my design docs are handwritten on yellow legal pads.
--Rich
_________________________________________________________
Rich Puhek
ETN Systems Inc.
2125 1st Ave East
Hibbing MN 55746
tel: 218.262.1130
email: [EMAIL PROTECTED]
_________________________________________________________