Hello, I've been running SA with Bayes enabled for only the past few days. Bayes has been auto-learned on two rather large corpora, which yielded about 1100 auto-learned messages (per the Bayes journal file). I've noticed that the number of false negatives (i.e., spam misclassified as ham) has dropped to almost zero, but I'm seeing maybe half a dozen false positives (ham misclassified as spam) per day. I'm having to whitelist friends and newsletters that previously went through just fine.
Generally, I'm using SA in local mode, and falling back to network mode only when the local pass does not flag the message as spam. Given my ham-to-spam ratio (roughly 1 to 5) that's been okay, but it probably leads to a surprising result where ham is over-aggressively misclassified as spam. I'm using 2.60 CVS (6/30) at the moment, but I think the same problem would come up on version 2.55.

The problem is that the messages misclassified as spam have only, or nearly only, BAYES_99 asserted. The various BAYES rules are scored as follows:

score BAYES_00 0 0 -5.300 -5.200
score BAYES_01 0 0 -5.400 -5.400
score BAYES_10 0 0 -5.300 -4.701
score BAYES_20 0 0 -4.701 -2.601
score BAYES_30 0 0 -1.070 -0.927
score BAYES_40 0 0 -0.001 -0.001
score BAYES_44 0 0 -0.001 -0.001
score BAYES_50 0 0 0.001 0.001
score BAYES_56 0 0 0.001 0.001
score BAYES_60 0 0 1.997 1.101
score BAYES_70 0 0 2.593 2.310
score BAYES_80 0 0 5.300 2.862
score BAYES_90 0 0 4.027 3.002
score BAYES_99 0 0 5.200 3.008

Using BAYES_99 as an example, it is scored 5.2 with Bayes enabled in local (non-network) mode, and only 3.008 when networking is enabled. Trouble is, that 5.2 exceeds the default cutoff of 5. So only if a large auto-whitelist value, or some other negative score, kicked in would the message escape being misclassified as spam. The 3.008 network value might be nearer the mark: a very high weighting, but one that would require some other tests to kick in before the message is classified as spam.

What I'm working up to here: for those of you using Bayes, did you also move your threshold value up (to, say, 7 or above), or do you simply tolerate more false positives? (I'd have to say that the four or five false positives I'm now seeing per day, which I didn't see before, is too high a number for my tastes.)
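To make the arithmetic concrete, here is a minimal Python sketch (not part of SpamAssassin) that reads the score lines quoted above and lists which BAYES rules would reach a given threshold entirely on their own. It assumes the third and fourth numeric columns are the Bayes-enabled local and network scores, as described above, and that a message counts as spam when its total is greater than or equal to the threshold. The helper names `parse_scores` and `rules_tripping_alone` are made up for illustration.

```python
# Score lines copied from the post. Assumed column order:
# (local, network, local + Bayes, network + Bayes).
SCORES = """\
score BAYES_00 0 0 -5.300 -5.200
score BAYES_01 0 0 -5.400 -5.400
score BAYES_10 0 0 -5.300 -4.701
score BAYES_20 0 0 -4.701 -2.601
score BAYES_30 0 0 -1.070 -0.927
score BAYES_40 0 0 -0.001 -0.001
score BAYES_44 0 0 -0.001 -0.001
score BAYES_50 0 0 0.001 0.001
score BAYES_56 0 0 0.001 0.001
score BAYES_60 0 0 1.997 1.101
score BAYES_70 0 0 2.593 2.310
score BAYES_80 0 0 5.300 2.862
score BAYES_90 0 0 4.027 3.002
score BAYES_99 0 0 5.200 3.008
"""

LOCAL_BAYES = 2  # index of the Bayes-enabled, local-mode score
NET_BAYES = 3    # index of the Bayes-enabled, network-mode score


def parse_scores(text):
    """Map each rule name to its tuple of four scores."""
    table = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 6 and parts[0] == "score":
            table[parts[1]] = tuple(float(p) for p in parts[2:])
    return table


def rules_tripping_alone(table, column, threshold):
    """Rules whose score in `column` reaches `threshold` with no other hits."""
    return sorted(name for name, scores in table.items()
                  if scores[column] >= threshold)


table = parse_scores(SCORES)
for threshold in (5.0, 7.0):
    for label, column in (("local", LOCAL_BAYES), ("network", NET_BAYES)):
        hits = rules_tripping_alone(table, column, threshold)
        print(f"threshold {threshold}, {label} mode: {hits}")
```

With these numbers, only the local-mode column at the default threshold of 5 has rules (BAYES_80 and BAYES_99) that mark a message as spam by themselves; in network mode, or with the threshold raised to 7, at least one other test has to fire as well.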
_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk