Joe Flowers wrote on Tue, 12 Jul 2005 11:55:36 -0400:

> > That's bad, really bad
> > detection ...
>
> No. It's good, really good detection.

Sorry, I don't want to be rude by repeating myself, but if your average
spam score is something like 6-something, the *detection* *is* bad. Maybe
not the end result, but the pure spam detection. And that's also the
reason why you had to find a method that lowers the threshold without
giving you too many false positives. If your spam scored high enough, you
simply wouldn't need to do that. That's, by the way, exactly what you
said yourself:

> > But anything you can do that widens the
> > typical score distribution between ham and spam is a good thing.
>
> Amen!!!!

> For lack of a better term in mind, I used "normalized". If the score of
> a message is more than 30 points (or 25, I'm not going to waste time
> looking back at the code) away from the nearest average, then I set the
> score for the message back to 30 points away from the nearest average.

OK, I see you want to avoid peaks; that makes sense.

> It sounds like you have put in a lot of time to become an expert in the
> traditional wisdom of SA and to tune it accordingly.

Not more than others here. Not really too much time.

> And, I assume you
> spend a lot of time keeping it tuned and dealing with SA upgrades.

Not at all. Some time ago I carefully crafted a combination of my own
rules plus SARE rules, trained it on a lot of spam and ham at first, and
now I just let it run. SARE updates are done automatically by rulesdujour.
I haven't paid much attention to it for probably a year now, apart from
upgrading to SA 3.1.* recently and maybe choosing a different SARE
ruleset here and there.

> I'm
> glad you have that time.... But, my situation is different and I agree
> with some of the critics of SA - that it requires or almost requires an
> expert to tune it properly and to keep it tuned properly

You do need some time to understand how it all works together, but after
that you really don't need to put much care into it anymore.
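
For illustration, the capping quoted above (a score more than 30 points
away from the nearest average is set back to 30 points away from it) can
be sketched in a few lines of Python. This is only a sketch, not the
actual code: the function name, the parameter names and the two example
averages (-2.0 for ham, 6.0 for spam) are assumptions made up for the
example.

```python
def clamp_to_nearest_average(score, ham_avg, spam_avg, max_dist=30.0):
    """Pull a score back so it lies at most max_dist from the nearer average.

    ham_avg and spam_avg are the average scores of known ham and spam;
    max_dist is the 30-point limit mentioned in the quoted text.
    """
    # Pick whichever average the score is closer to.
    if abs(score - ham_avg) <= abs(score - spam_avg):
        nearest = ham_avg
    else:
        nearest = spam_avg
    # Clamp the score into [nearest - max_dist, nearest + max_dist].
    low, high = nearest - max_dist, nearest + max_dist
    return max(low, min(high, score))


# Hypothetical averages, chosen only for the example.
HAM_AVG = -2.0
SPAM_AVG = 6.0

for raw in (80.0, -50.0, 10.0):
    print(raw, clamp_to_nearest_average(raw, HAM_AVG, SPAM_AVG))
```

With these made-up averages, a score of 80 is pulled back to 36 and a
score of -50 to -32, while a score of 10 is left alone. Capping like this
only trims the extremes; it does not move the two averages any further
apart.
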
Of course, you should stay up to date with releases and the rulesets you
use, but that's not daily business at all.

> And again, you are wrong. It is a very good setup (the proof is in the
> pudding)

As I said earlier, if you look closer at the pudding, I'm sure that your
false positive rate is much higher than ours. Do you have a proven figure
for your FP rate? To make it clear: I don't want to say that you get bad
results from your setup. But I'm quite convinced that your FP rate could
be much better if you tried to widen the gap between ham and spam by
applying more rules that are able to classify spam, and maybe by
fine-tuning a few rule scores (e.g. if your Bayes_99 is reliable you
should boost it to 3 or 4; it's overly low in the 3.0 setups).

Your peaks are only 8 score points away from each other. Ours are more
than 20 points apart, and the valley between them is really low. That
means even if I slide the threshold by one absolute score point, from 5
to 6 or down to 4, I won't get a much different detection rate, because
there are so few messages scoring in that range. I'm sure that many on
this list would have similar results if they did that. But I suppose you
can't do that because your gap is simply too small. What happens if you
move your threshold one score point down or up?

Kai

-- 
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com
IE-Center: http://ie5.de & http://msie.winware.org
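
A rough way to answer the threshold question above is to count how many
messages change their verdict when the threshold moves by one point. The
Python sketch below assumes a message counts as spam when its score is at
or above the threshold; the function name and both score lists are
invented for illustration and are not measurements from either setup.

```python
def flipped_fraction(scores, old_threshold, new_threshold):
    """Fraction of messages whose spam/ham verdict changes between thresholds.

    A message is treated as spam when score >= threshold, so exactly the
    scores in [lower threshold, higher threshold) change their verdict.
    """
    if not scores:
        return 0.0
    lo, hi = sorted((old_threshold, new_threshold))
    flipped = sum(1 for s in scores if lo <= s < hi)
    return flipped / len(scores)


# Invented score samples: one with ham and spam close together,
# one with a wide, nearly empty valley around the threshold.
narrow_gap = [-1.0, 0.5, 2.0, 3.5, 4.2, 4.8, 5.1, 5.6, 6.3, 7.0]
wide_gap = [-8.0, -6.5, -5.0, -4.0, -3.0, 14.0, 16.0, 18.5, 20.0, 22.0]

for name, scores in (("narrow", narrow_gap), ("wide", wide_gap)):
    up = flipped_fraction(scores, 5.0, 6.0)
    down = flipped_fraction(scores, 5.0, 4.0)
    print(name, "5->6:", up, "5->4:", down)
```

Run against real per-message scores from the logs, a large fraction in
either direction would suggest that the threshold sits in a crowded part
of the distribution, while a fraction near zero would suggest it sits in
the valley between the two peaks.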