> | - The "action" routine would run through the hashes and compute the
> |   average spam levels for each IP, ...
> |...
> | I guess I need to sort out what a good criteria would be for action.
> | Would average spam level be an adequate way to determine a "bad" IP? ...
>
> Don't use 'average' on datasets that are not uniform or gaussian in
> their nature, but can easily be skewed (e.g. a single whitelisted score
> of -100 will bump the average way out). Much better measure is the
> median value (the middle element in the sorted list, you don't need to
> actually sort it to get it).

Hmmm... I've done a little more work this morning, and I've now got all of the data into a two-dimensional hash. The two keys are the IP and the date/time in seconds. This way I can age the data out and only consider the most recent X emails.

I realize that the sending SMTP server is not always responsible for the spam, but looking through this data, there are a TON of spams coming from a smallish set of SMTP servers. It would really be good for my blood pressure to be able to reject these at the gateway with:

    544 Sod off, you wanker! You won't stop spamming us!

...rather than accepting the mail and filtering it out silently with SpamAssassin.

Median sounds like a better idea than average, for sure. Perhaps, to be conservative, both will have to be over a certain threshold.

I'm curious how you would find a median value without actually sorting the list...

thanks!
johnS

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
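
On the median question: one common approach (not necessarily what the quoted reply had in mind) is a selection algorithm such as quickselect, which finds the k-th smallest element in expected linear time without sorting the whole list. Below is a minimal Python sketch, assuming the data sits in a nested dict shaped like the hash described above (IP, then timestamp in seconds, then spam score). The names `quickselect`, `median`, `recent_scores` and `is_bad_ip`, the sample IPs and scores, and the threshold of 5.0 are all hypothetical, made up for illustration.

```python
import heapq
import random
from statistics import mean


def quickselect(values, k):
    """Return the k-th smallest value (0-based) without sorting the list."""
    items = list(values)
    while True:
        pivot = random.choice(items)
        lows = [x for x in items if x < pivot]
        pivots = [x for x in items if x == pivot]
        highs = [x for x in items if x > pivot]
        if k < len(lows):
            items = lows                      # answer is among the smaller values
        elif k < len(lows) + len(pivots):
            return pivot                      # answer is the pivot itself
        else:
            k -= len(lows) + len(pivots)      # answer is among the larger values
            items = highs


def median(values):
    """Median via quickselect; averages the two middle values for even counts."""
    n = len(values)
    if n == 0:
        raise ValueError("median of empty list")
    if n % 2 == 1:
        return quickselect(values, n // 2)
    return (quickselect(values, n // 2 - 1) + quickselect(values, n // 2)) / 2


def recent_scores(history, ip, limit):
    """Scores of the `limit` most recent mails from `ip`, newest first."""
    by_time = history.get(ip, {})
    newest = heapq.nlargest(limit, by_time.items())  # compares timestamps first
    return [score for _timestamp, score in newest]


def is_bad_ip(scores, threshold):
    """Conservative rule: both the mean and the median must exceed the threshold."""
    if not scores:
        return False
    return mean(scores) > threshold and median(scores) > threshold


# Hypothetical sample data: history[ip][timestamp_in_seconds] = spam score
history = {
    "192.0.2.1": {1000: 12.0, 1060: 15.0, 1120: 9.0, 1180: -100.0, 1240: 14.0},
    "198.51.100.7": {1000: 8.0, 1300: 11.0, 1600: 9.5, 1900: 13.0},
}

THRESHOLD = 5.0  # assumed cut-off, chosen only for this example

for ip in history:
    scores = recent_scores(history, ip, limit=5)
    print(ip, mean(scores), median(scores), is_bad_ip(scores, THRESHOLD))
```

In the first sample IP, the single -100 score drags the mean down to -10 while the median stays at 12, which is the skew the quoted reply warns about. Requiring both to be over the threshold means that IP is not flagged, so the rule errs on the side of accepting mail.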