Bowie Bailey wrote:
> I was checking the relative usefulness of the per-user Bayes databases
> for my users and came up with the following confusing information.
> 
> When I look at the overall stats, Bayes does pretty well:
> RANK    RULE NAME   COUNT %OFRULES %OFMAIL %OFSPAM  %OFHAM
> ------------------------------------------------------------
>    6    BAYES_99    26754     4.19   44.49   67.00    3.06
> 
> But when I do it for only our domain (which is where all the manual
> training happens), it hits less ham, but less spam as well:
> RANK    RULE NAME   COUNT %OFRULES %OFMAIL %OFSPAM  %OFHAM
> ------------------------------------------------------------
>    8    BAYES_99     4649     3.29   33.41   54.64    0.20
> 
> Just my personal email address (which is trained aggressively) gets
> very few ham hits (partly because I lowered my threshold to 4.0), but
> less spam than overall:
> RANK    RULE NAME   COUNT %OFRULES %OFMAIL %OFSPAM  %OFHAM
> ------------------------------------------------------------
>    5    BAYES_99     1643     3.08   27.05   65.72    0.08
> 
> And then when I modify sa-stats to exclude our domain, I find that our
> customers (who are trained exclusively with autolearn) seem to do
> better than us:
> RANK    RULE NAME   COUNT %OFRULES %OFMAIL %OFSPAM  %OFHAM
> ------------------------------------------------------------
>    6    BAYES_99    22105     4.44   47.83   70.35    4.11
> 
> Based on these results, it almost seems like the more training Bayes
> gets, the worse it does!
> 
> Are these anomalies just an artifact of sa-stats relying on SA to
> judge ham and spam properly?  Can these numbers be trusted at all if
> my users don't reliably report false negatives and positives?

And as an additional data point, I found this for one of our internal
users who has never done any manual training:
RANK    RULE NAME     COUNT %OFRULES %OFMAIL %OFSPAM  %OFHAM
------------------------------------------------------------
   1    BAYES_99        373     6.76   78.20   95.64    0.00
   1    BAYES_00         73    20.51   15.30    0.00   83.91
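
As a rough cross-check on the BAYES_99 rows above, the sketch below
back-calculates the message volumes behind each table. It assumes
(not confirmed here) that %OFMAIL is COUNT divided by all messages in
the sample, and that %OFSPAM and %OFHAM are the hit rates among
messages that SA itself scored as spam and ham. The function and
variable names are made up for illustration and are not part of
sa-stats.

```python
# Figures copied from the BAYES_99 rows in the tables above:
# (COUNT, %OFMAIL, %OFSPAM, %OFHAM)
ROWS = {
    "overall":   (26754, 44.49, 67.00, 3.06),
    "domain":    (4649, 33.41, 54.64, 0.20),
    "personal":  (1643, 27.05, 65.72, 0.08),
    "customers": (22105, 47.83, 70.35, 4.11),
}


def estimate_volumes(count, pct_mail, pct_spam, pct_ham):
    """Estimate (total, spam, ham) message counts from one table row.

    Assumes COUNT = spam_rate * spam + ham_rate * ham and
    spam + ham = total, where spam and ham are SA's own verdicts.
    Requires pct_spam != pct_ham.
    """
    total = count / (pct_mail / 100.0)
    spam_rate = pct_spam / 100.0
    ham_rate = pct_ham / 100.0
    spam = (count - ham_rate * total) / (spam_rate - ham_rate)
    ham = total - spam
    return total, spam, ham


for name, row in ROWS.items():
    total, spam, ham = estimate_volumes(*row)
    print(f"{name:10s} total={total:8.0f} spam={spam:8.0f} "
          f"ham={ham:8.0f} spam_share={spam / total:6.1%}")
```

If those assumptions hold, the share of mail that SA calls spam comes
out different for each group, so the %OFMAIL column is not directly
comparable between tables. The spam and ham totals are still SA's own
verdicts, so the sketch says nothing about unreported false negatives
or false positives.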

-- 
Bowie
