> | - The "action" routine would run through the hashes and compute the average
> | spam levels for each IP, ...
> |...
> | I guess I need to sort out what a good criteria would be for action. Would
> | average spam level be an adequate way to determine a "bad" IP? ...
> 
> Don't use 'average' on datasets that are not uniform or gaussian in their
> nature, but can easily be skewed (e.g. a single whitelisted score of -100
> will bump the average way out). Much better measure is the median value
> (the middle element in the sorted list, you don't need to actually sort it
> to get it).

Hmmm... I've done a little more work this morning, and I've got it to the
point now where all of the data is in a two-dimensional hash. The two
keys are the IP, and the date/time in seconds. This way I can age the data
out and only consider the most recent X emails.
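
For illustration, here is a rough Python sketch of that layout (my actual
code differs; the names record and recent_scores, the one-week max_age and
the limit of 50 are all made up for the example). The outer key is the IP,
the inner key is the arrival time in seconds, and old entries are dropped
before the newest X scores are returned:

```python
import time
from collections import defaultdict

# scores[ip][timestamp] = spam score; mirrors the two-key hash described above.
# Note: two messages from one IP in the same second would overwrite each other.
scores = defaultdict(dict)

def record(ip, score, when=None):
    """Store one message's spam score under its sender IP and arrival time."""
    when = time.time() if when is None else when
    scores[ip][when] = score

def recent_scores(ip, max_age=7 * 24 * 3600, limit=50, now=None):
    """Drop entries older than max_age, then return the newest `limit` scores."""
    now = time.time() if now is None else now
    # max_age and limit are hypothetical defaults, not tuned values.
    fresh = {t: s for t, s in scores[ip].items() if now - t <= max_age}
    scores[ip] = fresh  # age the stale entries out of the table
    newest_first = sorted(fresh, reverse=True)[:limit]
    return [fresh[t] for t in newest_first]
```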

I realize that the sending SMTP server is not always responsible for the
spam, but looking through this data, there are a TON of spams coming from a
smallish set of SMTP servers. It would really be good for my blood pressure
to be able to reject these at the gateway with:

544 Sod off, you wanker! You won't stop spamming us!

...rather than accepting the mail and filtering it out silently with
SpamAssassin.

Median sounds like a better idea than average, for sure. Perhaps to be
conservative both will have to be over a certain threshold.
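
Something like the following is what I have in mind, sketched in Python.
The threshold of 5.0 and the minimum of 10 samples are placeholders I
picked for the example, not tested values, and looks_bad is a made-up name:

```python
from statistics import mean, median

def looks_bad(score_list, threshold=5.0, min_samples=10):
    """Flag an IP only when BOTH the mean and the median exceed the threshold.

    threshold and min_samples are hypothetical values for illustration.
    """
    if len(score_list) < min_samples:
        return False  # too little data to judge this IP
    return mean(score_list) > threshold and median(score_list) > threshold
```

With nine scores of 8.0 and one whitelisted -100, the median is still 8.0
but the mean drops to -2.8, so the conservative rule would not flag that IP.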

I'm curious how you would find a median value without actually sorting the
list...
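
One technique I am aware of is a selection algorithm such as quickselect,
which partitions around a pivot and only keeps the side that contains the
wanted position. I don't know whether that is what was meant above, so
treat this Python sketch (select and median_by_selection are names I made
up) as a guess rather than the suggested method:

```python
import random

def select(values, k):
    """Return the k-th smallest element (0-based) without fully sorting."""
    items = list(values)
    while True:
        pivot = random.choice(items)
        lower = [v for v in items if v < pivot]
        equal = [v for v in items if v == pivot]
        if k < len(lower):
            items = lower                      # answer is left of the pivot
        elif k < len(lower) + len(equal):
            return pivot                       # pivot sits at position k
        else:
            k -= len(lower) + len(equal)       # answer is right of the pivot
            items = [v for v in items if v > pivot]

def median_by_selection(values):
    """Median via selection; averages the two middle items for even counts."""
    n = len(values)
    if n == 0:
        raise ValueError("median of an empty list")
    if n % 2:
        return select(values, n // 2)
    return (select(values, n // 2 - 1) + select(values, n // 2)) / 2
```

This runs in linear time on average, versus n log n for a full sort, though
for a few dozen scores per IP the difference is unlikely to matter.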

thanks!

johnS


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
