On Fri, Dec 20, 2002 at 08:21:35PM -0500, Tom Allison wrote:
> I've heard differently from the bogofilter mailing list.  I
> personally am starting to conclude that bayesian filtering is very
> difficult to get working correctly.  It is too easy to skew the
> statistics and therefore pass the filters.
Perhaps, but some implementations (SpamAssassin's, specifically) work pretty damn well for me, even when only generically trained:

OVERALL%     SPAM%  NONSPAM%    S/O  RANK  SCORE  NAME
   35910     13741     22169  0.383  0.00   0.00  (all messages)
 100.000   38.2651   61.7349  0.383  0.00   0.00  (all messages as %)
  35.297    0.0291   57.1564  0.001  1.00  -2.00  BAYES_01
  21.041   54.9669    0.0135  1.000  1.00   2.00  BAYES_90
   9.788    0.0000   15.8555  0.000  0.94  -0.50  BAYES_10
   5.266   13.7545    0.0045  1.000  0.94   0.50  BAYES_80
   5.341   13.9437    0.0090  0.999  0.94   4.00  BAYES_99
   6.402    0.0146   10.3613  0.001  0.93  -0.10  BAYES_20
   2.999    7.8233    0.0090  0.999  0.93   0.10  BAYES_70
   0.744    0.0000    1.2044  0.000  0.92  -4.00  BAYES_00

The number after the BAYES_ name is the "percentage likelihood of spam".  So this version correctly catches ~84.5% of my nonspam (BAYES_00 through BAYES_20) and ~90.4% of my spam (BAYES_70 through BAYES_99).  It would be better if I trained against any false positives, of course.  The results for a system-wide setup would likely be less accurate, but still pretty good.

-- 
Randomly Generated Tagline:
"Yea, it's gone."   - Prof. Farr
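For anyone wanting to check the ~84.5% / ~90.4% figures, here is a minimal sketch that just adds up the relevant columns of the mass-check table.  It assumes each message lands in at most one BAYES_ bucket (so the per-rule percentages can be summed), and that BAYES_00..BAYES_20 count as "caught" nonspam while BAYES_70..BAYES_99 count as "caught" spam.  The names NONSPAM_PCT, SPAM_PCT and coverage() are made up for illustration; the numbers are copied straight from the table.

```python
# Percent of each message class hitting each BAYES_ bucket, copied from the
# mass-check table.  Hypothetical names; assumes the buckets are disjoint,
# so summing per-rule percentages gives the share of messages covered.
NONSPAM_PCT = {  # NONSPAM% column, low-probability (negative score) buckets
    "BAYES_00": 1.2044,
    "BAYES_01": 57.1564,
    "BAYES_10": 15.8555,
    "BAYES_20": 10.3613,
}
SPAM_PCT = {  # SPAM% column, high-probability (positive score) buckets
    "BAYES_70": 7.8233,
    "BAYES_80": 13.7545,
    "BAYES_90": 54.9669,
    "BAYES_99": 13.9437,
}


def coverage(rates):
    """Total percentage of a class that fell into the given buckets."""
    return sum(rates.values())


ham = coverage(NONSPAM_PCT)
spam = coverage(SPAM_PCT)
print(f"nonspam scored on the low side:  {ham:.2f}%")
print(f"spam scored on the high side:    {spam:.2f}%")
```

This prints 84.58% and 90.49%, so the figures in the post look truncated rather than rounded.  The columns don't reach 100% because the middle buckets (BAYES_30 through BAYES_60) aren't listed in the table.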