Re: Time for my monthly beating again...

Kevin Peuhkurinen 16 Feb 2005 14:50:39 -0000

Hey Joe. My 2.64 install is running so well, I almost don't want to upgrade to 3.0.2, and I really don't need to spend too much time on it to keep it that way. Perhaps you just need to devote a couple of days to do some tweaking and thereafter it should run well on its own. Finding out what works for others and taking the time to implement it would probably be a better use of your time than to attempt the experiment you are suggesting.

My current setup is: SA runs as a relay, checking email then passing it on to our Exchange server. I have the following extra rulesets taken from rulesemporium.com:

70_sare_adult.cf
70_sare_html0.cf
70_sare_bayes_poison_nxm.cf
70_sare_spoof.cf
70_sare_genlsubj0.cf
70_sare_html1.cf
72_sare_bml_post25x.cf
70_sare_header0.cf
70_sare_random.cf

I am also using DCC, Razor2, and SpamcopURI. My Bayes database is global with autolearning and I am not using AWL. I've tweaked the scores on some of the tests and disabled a few tests.

I have it set up such that anything that hits 3.5 or higher is considered spam. Anything that scores 8 or higher, which is the vast majority of spam (about 2000 emails per day), is kept on the SA server, and a script automatically deletes any emails over 2 weeks old. Anything from 3.5 to 7 is sent to a special mailbox on my Exchange server. This is about 100-200 emails per day. I spend about five minutes each morning glancing through these emails looking for false positives, of which I see at most one per week. These are forwarded to the correct recipient and copies of them are placed into a special "false negative" folder. I also have a special "false positive" folder into which my users can drag and drop spam that gets through to them. On a typical day I see between five and ten emails put in there, which out of about 450 users is pretty darn good. I spend another five minutes each afternoon looking at these emails and making sure that they are in fact spam. A script on the SA server runs automatically each night and feeds the false negatives and positives through sa-learn. If I start to see a bunch of similar spams getting through, I'll spend an hour or two writing, testing, and deploying a rule to catch them. This happens about once a month.
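For anyone who wants to copy the tiering, here is a rough Python sketch of the routing decision described above. The function name and the three destination labels are made up for illustration and are not part of SpamAssassin. It also assumes that scores falling between 7 and 8 go to the review mailbox, which the description above does not spell out.

```python
# Rough sketch of the two-threshold routing described above.
# The names below are hypothetical; only the 3.5 and 8 cut-offs
# come from the post.

SPAM_THRESHOLD = 3.5   # at or above this, the message counts as spam
HOLD_THRESHOLD = 8.0   # at or above this, the message stays on the SA server


def route(score: float) -> str:
    """Return a destination label for a message with the given SA score."""
    if score >= HOLD_THRESHOLD:
        # High-confidence spam: held on the SA box, purged after 2 weeks.
        return "hold_on_sa_server"
    if score >= SPAM_THRESHOLD:
        # Borderline spam: goes to the mailbox that gets a daily glance.
        # Assumption: the 7-8 gap is treated as part of this band.
        return "review_mailbox"
    # Below the spam threshold: relayed on to the Exchange server.
    return "deliver"


if __name__ == "__main__":
    for s in (1.2, 3.5, 7.4, 8.0, 15.0):
        print(s, route(s))
```

The point of the middle band is that only a small slice of mail ever needs a human look, which is what keeps the daily effort down to a few minutes.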

So, all in all I generally spend 10-15 minutes per day looking after SA while achieving very satisfactory false positive and negative rates. The only reason I'm bothering to upgrade is that I'm starting to see some SA timeouts, as my production server is a 350MHz clunker. I got myself a brand new 3GHz server to take over and figure that I ought to do the upgrade to 3.0.2 at the same time.

Joe Flowers wrote:

Later today I'll be implementing a "drifting" spam/ham dividing line (one "line" for the entire system - not individually set per email account) to see how effective it is or how effective it appears to be.

I'm curious to know if the dividing line will drift into a wall on some self-imposed boundary edge or if it will converge to a point for us or if it will slowly drift around in circles.

I'm "determining" the dividing line by taking the average of all of the SA hits of all of the messages and changing the dividing line, on the fly, for each subsequent message.
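One way to read the scheme above, as a minimal Python sketch: keep a running mean of every score seen so far and use it as the threshold for the next message. The class name is hypothetical, and the starting value of 5.0 is an assumption (it matches SpamAssassin's default required score); the actual implementation may differ.

```python
# Minimal sketch of a system-wide "drifting" spam/ham dividing line.
# Assumptions: the line is the plain arithmetic mean of all scores seen
# so far, and it starts at 5.0 before any message has been scored.


class DriftingThreshold:
    def __init__(self, initial: float = 5.0) -> None:
        self.threshold = initial  # dividing line applied to the next message
        self.count = 0            # number of messages scored so far
        self.total = 0.0          # sum of all scores seen so far

    def classify(self, score: float) -> bool:
        """Classify against the current line, then move the line."""
        is_spam = score >= self.threshold
        # Fold this message's score into the running mean.
        self.count += 1
        self.total += score
        self.threshold = self.total / self.count
        return is_spam


if __name__ == "__main__":
    line = DriftingThreshold()
    for s in (10.0, 2.0, 6.0, 0.5, 12.0):
        print(s, line.classify(s), round(line.threshold, 2))
```

Because the line is a mean over all mail, it sits somewhere between the ham and spam score clusters and moves with the spam-to-ham ratio as well as with the scores themselves, so a change in mail mix alone will shift it.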

Anyone want to tell me or speculate on how this experiment will end or what it will tell me, whether I'm listening or not?

For us, SA *seems* to score SPAM messages with lower and lower hit scores as time goes by, and the users get more and more glassy-eyed over its ("my" if you prefer) effectiveness as time goes by too.

I've spent a lot of time with the bayesian stuff and sa-learn, but still it seems to drift downward.

And, I have to agree that SA is very good but requires a lot of attention by someone who knows what they are doing - which, of course, may or may not be me.

Nonetheless, I have this problem before me and am attempting a possible solution.
Joe
