Mark wrote:
A few weeks ago I thought of an interesting new "statistical" way of
fighting spam. Having collected a few weeks' worth of maillog data (about 500
MB), I wrote a small Perl script, matching IP addresses by the following
simple rule (in words):

"Look at SMTP connections, and consider all IP addresses spam that, in one
session, deliver to 4 or more local recipients simultaneously."

Naturally, I ran a post-process pass, so as to filter out white-list data
(mailing lists and such). Based on the idea that, apart from mailing lists,
there is hardly ever a decent reason for a remote host to deliver to more
than 4 recipients simultaneously, the result was, indeed, astounding. :)
Nearly 98% was pure spam.
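
A minimal sketch of the rule described above, written in Python rather than the original Perl (which was not posted). Maillog formats differ between MTAs, so this assumes the log has already been reduced to hypothetical (session_id, client_ip, recipient) tuples. The function name, the threshold parameter, and the whitelist set are illustrative assumptions, not part of the original script.

```python
from collections import defaultdict


def flag_multi_recipient_ips(records, threshold=4, whitelist=frozenset()):
    """Return client IPs that delivered to `threshold` or more distinct
    local recipients within a single SMTP session.

    `records` is an iterable of (session_id, client_ip, recipient) tuples,
    a hypothetical pre-parsed form of the maillog. IPs in `whitelist`
    (mailing lists and such) are never flagged.
    """
    # Collect the distinct recipients seen in each session.
    recipients = defaultdict(set)  # (session_id, client_ip) -> {recipient}
    for session_id, client_ip, recipient in records:
        recipients[(session_id, client_ip)].add(recipient.lower())

    # Flag the IP if any one of its sessions reached the threshold.
    flagged = set()
    for (_, client_ip), rcpts in recipients.items():
        if len(rcpts) >= threshold and client_ip not in whitelist:
            flagged.add(client_ip)
    return flagged
```

The whitelist is applied as a filter on the result here, which mirrors the post-process pass mentioned above.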

So, I was wondering, regardless of this particular idea, have the SA
developers ever seriously thought about doing maillog analysis? That may
make things a bit platform/MTA dependent, but you really can learn a great
deal about spammers based on "history" too, and not merely a per-message
analysis.


I've pondered doing something similar, and figured I'd auto-generate a pair of RBLs from the data. My thought was that the input would be:


-- SA scores (keep track of the relay and the resulting SA score for each email)
-- Logfiles


We'd look for:
 -- Relays that suddenly start throwing "lots" of mail at the server.
 -- Relays that appear to be doing dictionary attacks.
 -- Relays that send "very spammy" email.
 -- Relays that ignore SMTP errors, and keep banging away (malware)

I'd like to log enough data so that a quiescent system would be happy seeing thousands of messages per day from the SF mailing list servers, while a host I've never seen before suddenly wanting to send 30 messages a second raises alarm bells. Similarly, if the average SA score from a given AOL mail relay is 20 and the average from a state university is -0.5, I'd like it not to get upset with a 22 from AOL, but to get awfully unhappy with an 8 from the university.
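
One way the per-relay score comparison could be sketched is with a running mean and variance per relay (Welford's method), flagging a score only when it sits well above that relay's own history. This covers just the SA-score half of the idea, not message rates. The class name, the `min_samples`, `k`, and `min_std` parameters, and their default values are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict


class RelayBaseline:
    """Track a running mean/variance of SA scores for each relay."""

    def __init__(self, min_samples=20, k=3.0, min_std=1.0):
        self.min_samples = max(min_samples, 2)  # need >= 2 for a variance
        self.k = k                # how many deviations count as unusual
        self.min_std = min_std    # floor so very steady relays don't over-trigger
        self._stats = defaultdict(lambda: [0, 0.0, 0.0])  # n, mean, M2

    def observe(self, relay, score):
        """Fold one score into the relay's history (Welford update)."""
        s = self._stats[relay]
        s[0] += 1
        delta = score - s[1]
        s[1] += delta / s[0]
        s[2] += delta * (score - s[1])

    def is_unusual(self, relay, score):
        """True if `score` is far above what this relay normally sends."""
        n, mean, m2 = self._stats.get(relay, (0, 0.0, 0.0))
        if n < self.min_samples:
            return False  # not enough history to judge
        std = max(math.sqrt(m2 / (n - 1)), self.min_std)
        return score > mean + self.k * std
```

With a relay averaging 20 and one averaging -0.5, a 22 from the first stays under its threshold while an 8 from the second exceeds it, provided each has enough history and the spread of past scores is modest.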

Output would be two RBLs. For the first, I'd return a temporary SMTP failure and time out the entry eventually (quickly enough that legit mail gets through in an acceptable timeframe, but long enough to slow a spammer significantly; it should also watch for spammers that bang away stupidly). The second RBL would return permanent failures.
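
The two-list behaviour could look roughly like the sketch below. It models only the lookup decision in memory, not the DNS side of a real RBL. The class and method names are hypothetical, and the 30-minute default expiry is an arbitrary placeholder, since no timeout value was decided.

```python
import time


class TwoTierBlocklist:
    """Temporary entries expire after `temp_ttl` seconds; permanent ones don't."""

    def __init__(self, temp_ttl=1800.0, clock=time.monotonic):
        self.temp_ttl = temp_ttl
        self._clock = clock   # injectable so expiry can be tested
        self._temp = {}       # ip -> expiry time
        self._perm = set()

    def add_temporary(self, ip):
        """List `ip` for a temporary failure, (re)starting its timer."""
        self._temp[ip] = self._clock() + self.temp_ttl

    def add_permanent(self, ip):
        """Promote `ip` to the permanent list."""
        self._temp.pop(ip, None)
        self._perm.add(ip)

    def check(self, ip):
        """Return "perm", "temp", or None if the IP is not listed."""
        if ip in self._perm:
            return "perm"
        expiry = self._temp.get(ip)
        if expiry is None:
            return None
        if self._clock() >= expiry:
            del self._temp[ip]  # entry has timed out
            return None
        return "temp"
```

An MTA hook would then map "temp" to a 4xx reply and "perm" to a 5xx reply. A relay that keeps hammering during its temporary listing is a natural candidate for `add_permanent`.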

I got around to creating an SF project for it... problem is, all my design docs are handwritten on yellow legal pads.

--Rich


_________________________________________________________


Rich Puhek
ETN Systems Inc.
2125 1st Ave East
Hibbing MN 55746

tel:   218.262.1130
email: [EMAIL PROTECTED]
_________________________________________________________


