On Fri, 31 Jul 2009 07:53:00 -0500 "Dennis B. Hopp" <dh...@coreps.com> wrote:
> I cleared my maia statistics a couple of days ago.  Since then
> BAYES_00 has triggered 4510 times, BAYES_99 2366 times and BAYES_50
> 1568 (all the other BAYES_XX are less than 1000 times).  In those
> same couple of days we have processed about 45,000 messages (this is
> the number of messages that actually reached spamassassin and wasn't
> outright rejected).

4510+2366+1568+1000 is a lot less than 45,000.

> So my initial percentages were way off (I was
> going by maia mailguards sa rule statistics).  So roughly 10% of mail
> is hitting BAYES_00 and 5% is hitting BAYES_99.  It seems to me that
> BAYES_99 should probably be triggered more often than BAYES_00.

The ratio of BAYES_99 to BAYES_00 should mostly reflect the overall
spam to ham ratio; it's not a figure of merit. Your percentages aren't
consistent with your numbers: over 70% of the Bayes results are at
BAYES_99 or BAYES_00, which isn't all that bad. The main issue here is
that your numbers don't add up: only about 1 in 5 of your 45,000
messages processed by spamassassin are accounted for in the BAYES
statistics.

> If there is a better way to get sa statistics I'd be happy to know.
>
> I know that the bayes success rate comes down to training, but like
> every other administrator I can't possibly check every message for
> accuracy and I was hoping to make the auto learn a little better.  I
> thought maybe I just didn't have enough rules (both negative and
> positive scoring) to trigger the auto learn often enough.

With the number of extra rules and plugins you have, you should have
no trouble in autolearning all the spam you need; you might even want
to increase the threshold from 8 to avoid mislearning.
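
For anyone who wants to check the arithmetic, here is a rough sketch in
Python using only the counts quoted in this thread. The "other" entry is
an assumption: it is the single 1000 allowance used in the sum above for
all the remaining BAYES_XX rules, not a measured value, so the results
are ballpark figures only.

```python
# Counts quoted in the thread. "other" is an assumed lump allowance of 1000
# for all remaining BAYES_XX rules (each was reported as "less than 1000").
bayes_hits = {"BAYES_00": 4510, "BAYES_99": 2366, "BAYES_50": 1568, "other": 1000}
total_messages = 45_000  # "about 45,000" messages that reached SpamAssassin

# Total Bayes rule hits that the statistics account for.
total_bayes = sum(bayes_hits.values())

# Fraction of processed messages that show up in the Bayes statistics.
coverage = total_bayes / total_messages

# Fraction of Bayes hits that landed in the two extreme buckets.
extreme_share = (bayes_hits["BAYES_00"] + bayes_hits["BAYES_99"]) / total_bayes

# BAYES_99 to BAYES_00 ratio, which should roughly track spam to ham.
spam_to_ham = bayes_hits["BAYES_99"] / bayes_hits["BAYES_00"]

print(f"Bayes hits counted: {total_bayes}")
print(f"Share of all messages with a Bayes hit: {coverage:.1%}")
print(f"Share of Bayes hits at BAYES_00 or BAYES_99: {extreme_share:.1%}")
print(f"BAYES_99 : BAYES_00 ratio: {spam_to_ham:.2f}")
```

If the real counts for the other BAYES_XX rules are available, replacing
the "other" allowance with them gives a better coverage figure.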