On Fri, Dec 20, 2002 at 08:21:35PM -0500, Tom Allison wrote:
> I've heard differently from the bogofilter mailing list.  I 
> personally am starting to conclude that bayesian filtering is very 
> difficult to get working correctly.  It is too easy to skew the 
> statistics and therefore pass the filters.

Perhaps, but some implementations (SpamAssassin's specifically) work
pretty damn well for me even when only generically trained:

OVERALL%   SPAM% NONSPAM%     S/O    RANK   SCORE  NAME
  35910    13741    22169    0.383   0.00    0.00  (all messages)
100.000  38.2651  61.7349    0.383   0.00    0.00  (all messages as %)
 35.297   0.0291  57.1564    0.001   1.00   -2.00  BAYES_01
 21.041  54.9669   0.0135    1.000   1.00    2.00  BAYES_90
  9.788   0.0000  15.8555    0.000   0.94   -0.50  BAYES_10
  5.266  13.7545   0.0045    1.000   0.94    0.50  BAYES_80
  5.341  13.9437   0.0090    0.999   0.94    4.00  BAYES_99
  6.402   0.0146  10.3613    0.001   0.93   -0.10  BAYES_20
  2.999   7.8233   0.0090    0.999   0.93    0.10  BAYES_70
  0.744   0.0000   1.2044    0.000   0.92   -4.00  BAYES_00

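The number after the BAYES_ name is the "percentage likelihood of spam".
Summing the NONSPAM% column over the low rules (BAYES_00 through
BAYES_20) and the SPAM% column over the high rules (BAYES_70 through
BAYES_99), this version correctly classifies ~84.5% of my nonspam and
~90.4% of my spam.  It would be better if I trained against any false
positives, of course.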
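
For anyone who wants to check the arithmetic, here is a rough sketch of
how those two percentages fall out of the table above.  The row values
are copied from the table; the function name, the dictionary keys, and
the cutoff of 50 between "nonspam-leaning" and "spam-leaning" rules are
my own assumptions for illustration, not anything SpamAssassin defines.

```python
# Rows copied from the table above: (rule name, SPAM%, NONSPAM%).
ROWS = [
    ("BAYES_01", 0.0291, 57.1564),
    ("BAYES_90", 54.9669, 0.0135),
    ("BAYES_10", 0.0000, 15.8555),
    ("BAYES_80", 13.7545, 0.0045),
    ("BAYES_99", 13.9437, 0.0090),
    ("BAYES_20", 0.0146, 10.3613),
    ("BAYES_70", 7.8233, 0.0090),
    ("BAYES_00", 0.0000, 1.2044),
]


def bucket(name):
    """Return the number after 'BAYES_' as an integer (e.g. 'BAYES_90' -> 90)."""
    return int(name.split("_")[1])


def summarize(rows, cutoff=50):
    """Sum each column on either side of the cutoff.

    The cutoff is an assumption: rules below it are treated as
    nonspam-leaning, rules at or above it as spam-leaning.
    """
    high = [r for r in rows if bucket(r[0]) >= cutoff]
    low = [r for r in rows if bucket(r[0]) < cutoff]
    return {
        "spam_caught": sum(spam for _, spam, _ in high),
        "nonspam_passed": sum(ham for _, _, ham in low),
        "spam_in_low_rules": sum(spam for _, spam, _ in low),
        "nonspam_in_high_rules": sum(ham for _, _, ham in high),
    }


for key, value in summarize(ROWS).items():
    print(f"{key:>22}: {value:8.4f}%")
```

The sums come to roughly 90.49% of spam in the high rules and 84.58% of
nonspam in the low rules, which is where the rounded figures above come
from.  The misfiles are tiny by comparison: about 0.04% of spam lands in
the low rules and about 0.04% of nonspam in the high rules.  The rest of
the mail hit none of the eight rules listed.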

Results for a system-wide setup would likely be less accurate but still
pretty good.

-- 
Randomly Generated Tagline:
"Yea, it's gone."               - Prof. Farr
