I was checking the relative usefulness of the per-user Bayes databases
for my users and came up with the following confusing information.

When I look at the overall stats, Bayes does pretty well:
RANK    RULE NAME   COUNT %OFRULES %OFMAIL %OFSPAM  %OFHAM
------------------------------------------------------------
   6    BAYES_99    26754     4.19   44.49   67.00    3.06

But when I run it for only our domain (which is where all the manual
training happens), it hits less ham, but also less spam:
RANK    RULE NAME   COUNT %OFRULES %OFMAIL %OFSPAM  %OFHAM
------------------------------------------------------------
   8    BAYES_99     4649     3.29   33.41   54.64    0.20

Just my personal email address (which is trained aggressively) gets
very few ham hits (partly because I lowered my threshold to 4.0), but
it also catches less spam than overall:
RANK    RULE NAME   COUNT %OFRULES %OFMAIL %OFSPAM  %OFHAM
------------------------------------------------------------
   5    BAYES_99     1643     3.08   27.05   65.72    0.08
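
(For reference, the 4.0 threshold is just the per-user required_score
setting; roughly this in my ~/.spamassassin/user_prefs, or the
equivalent userpref if you keep preferences in SQL:)

    # lower the spam threshold from the default 5.0 for this user only
    required_score 4.0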

And then when I modify sa-stats to exclude our domain, I find that our
customers (who are trained exclusively with autolearn) seem to do
better than we do:
RANK    RULE NAME   COUNT %OFRULES %OFMAIL %OFSPAM  %OFHAM
------------------------------------------------------------
   6    BAYES_99    22105     4.44   47.83   70.35    4.11
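
(In case anyone wants to reproduce the split: rather than carrying a
patched sa-stats around, the same effect can be had by pre-filtering
the spamd log before feeding it in.  A rough sketch, assuming the
spamd result lines carry something like a user=<address> field; adjust
to whatever your logs really contain, and the domain below is a
placeholder:)

    #!/usr/bin/env python
    # Drop (default) or keep (--keep) spamd result lines for one domain,
    # so sa-stats only ever sees the population you care about.  Adjust
    # the regex to whatever your log format really looks like.
    import re
    import sys

    OUR_DOMAIN = "example.com"          # placeholder for our real domain
    KEEP_DOMAIN = "--keep" in sys.argv  # --keep gives the domain-only run

    pat = re.compile(r"user=[^,\s]*@" + re.escape(OUR_DOMAIN), re.I)

    for line in sys.stdin:
        if bool(pat.search(line)) == KEEP_DOMAIN:
            sys.stdout.write(line)

Then sa-stats gets pointed at the filtered file the same way it would
the full log.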

Based on these results, it almost seems like the more training Bayes
gets, the worse it does!
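
(For scale, and assuming %OFMAIL is simply COUNT as a percentage of
messages scanned, the four samples are very different sizes, which is
worth keeping in mind when comparing them:)

    # implied message totals per sample, assuming %OFMAIL == COUNT / total * 100
    samples = {
        "overall":    (26754, 44.49),
        "our domain":  (4649, 33.41),
        "me only":     (1643, 27.05),
        "customers":  (22105, 47.83),
    }
    for name, (count, pct_of_mail) in samples.items():
        print("%-10s ~%d messages" % (name, round(count * 100.0 / pct_of_mail)))
    # prints roughly 60135, 13915, 6074 and 46216 respectively; our
    # domain plus customers adds up to about the overall total, so the
    # assumed column definition at least looks self-consistent.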

Are these anomalies just an artifact of sa-stats relying on SA to
judge ham and spam properly?  Can these numbers be trusted at all if
my users don't reliably report false negatives and positives?

-- 
Bowie
