On Tuesday, 9 May 2006 23:14 Bowie Bailey wrote:
> When I look at the overall stats, bayes does pretty good:
>
> RANK  RULE NAME   COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
> ------------------------------------------------------------
>    6  BAYES_99    26754      4.19    44.49    67.00    3.06

A 3% ham hit rate for BAYES_99 is horrible, not good. It is the false
positives (FPs) that should alarm you.

> But when I do it for only our domain (which is where all the manual
> training happens), it hits less ham, but less spam as well:
>
> RANK  RULE NAME   COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
> ------------------------------------------------------------
>    8  BAYES_99     4649      3.29    33.41    54.64    0.20

At least that is a much better FP rate, by a factor of about 15!

> Just my personal email address (which is trained aggressively) gets
> very few ham hits (partly because I lowered my threshold to 4.0), but
> less spam than overall:
>
> RANK  RULE NAME   COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
> ------------------------------------------------------------
>    5  BAYES_99     1643      3.08    27.05    65.72    0.08

Again, the FPs are reduced...

> And then when I modify sa-stats to exclude our domain, I find that
> our customers (who are trained exclusively with autolearn) seem to do
> better than us:
>
> RANK  RULE NAME   COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
> ------------------------------------------------------------
>    6  BAYES_99    22105      4.44    47.83    70.35    4.11

No, 4% FPs is nothing you should be happy with.

> Based on these results, it almost seems like the more training Bayes
> gets, the worse it does!

But remember that sa-stats can never tell whether a message really was
ham or spam. It only tells you what SA *believed* was ham or spam.

> Are these anomalies just an artifact of sa-stats relying on SA to
> judge ham and spam properly? Can these numbers be trusted at all if
> my users don't reliably report false negatives and positives?

As I said on the other thread: be very careful what you feed to bayes.
Try to find those 4% of FPs and check whether they really are FPs. Maybe
your SA made the mistakes because you don't have enough rules to detect
all spam, in which case some of that "ham" would actually be spam that
scored too low.

Best regards,
zmi
-- 
// Michael Monnerie, Ing.BSc    -----      http://it-management.at
// Tel: 0660/4156531                          .network.your.ideas.
// PGP Key: "lynx -source http://zmi.at/zmi3.asc | gpg --import"
// Fingerprint: 44A3 C1EC B71E C71A B4C2  9AA6 C818 847C 55CB A4EE
// Keyserver: www.keyserver.net                 Key-ID: 0x55CBA4EE
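
For anyone who wants to check the "factor of 15" arithmetic, here is a
minimal sketch in Python. It only divides the %OFHAM values quoted in
this message. The dictionary labels and the function name
improvement_factor are made up for illustration and are not part of
sa-stats. Keep in mind that these percentages reflect what SA believed
was ham, so the ratios describe the apparent FP rate, not a verified one.

```python
# %OFHAM for BAYES_99, copied from the sa-stats output quoted in this message.
# The labels are illustrative only; they are not sa-stats field names.
ham_hit_pct = {
    "overall": 3.06,
    "our domain (manual training)": 0.20,
    "personal address": 0.08,
    "customers (autolearn only)": 4.11,
}


def improvement_factor(baseline_pct, other_pct):
    """Return how many times smaller other_pct is than baseline_pct."""
    return baseline_pct / other_pct


baseline = ham_hit_pct["overall"]
for label, pct in ham_hit_pct.items():
    factor = improvement_factor(baseline, pct)
    # A factor above 1 means fewer apparent ham hits than the overall figure.
    print(f"{label:30s} {pct:5.2f}%   overall / this = {factor:6.2f}")
```

A factor below 1, as for the autolearn-only customers, means that group
shows more apparent ham hits than the overall figure, not fewer.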