> -----Original Message-----
> From: Robert Menschel
> Sent: Tuesday, August 05, 2003 8:29 PM
[...]
>
> Of those 1100 messages, how many were spam, and how many were ham? I
> don't think I've seen more than a half dozen FPs in any *month*, much
> less a day.
>
> GF> Generally, I'm using SA in local mode, and backing out to network
> GF> mode only when local says no ham was found.
>
> So you're running SA against your rule set and Bayes without DNSBL
> checks, and then if these do not scream SPAM (high score) or HAM
> (negative score), you then check DNSBL to see if they will give a spam
> score?
>
> GF> Given my ham to spam ratio (roughly 1 to 5) that's been okay, but it
> GF> probably leads to a surprising result where spam is over-aggressively
> GF> mis-classified. I'm using 2.60 cvs (6/30) at the moment, but I think
> GF> the same problem would come up on version 2.55.
>
> Very possibly not -- 2.60 doesn't yet have statistically determined
> rules; the rule set is more advanced than 2.55, and to my knowledge
> hasn't yet been run against the giant SA corpus available to the
> developers. After that process the rule score defaults are adjusted to
> minimize FPs. Again to my knowledge, that FP minimization step hasn't yet
> taken place for 2.60
>
> GF> The problem is that I'm seeing these misclassified spams as having
> GF> only, or nearly only, BAYES_99 asserted. ...
>
> I don't remember ever seeing BAYES_99 on anything that wasn't spam,
> and I've only seen BAYES_90 on non-spam once in three months. That leads
> me to question the accuracy of your original corpus. How was it built
> and classified? What are the chances that persons A and B classified
> emails as spam, and Bayes learned it as spam, while persons C and D claim
> these are not spam?
>
> GF> Using BAYES_99 as an example, it will be scored 5.2 with Bayes
> GF> enabled, while running in local (non-network) mode, and only 3.008
> GF> when networking is enabled. Trouble is, that 5.2 exceeds the default
> GF> cut off of 5. ...
>
> GF> What I'm working up to here: For those of you using Bayes, did you
> GF> also move your threshold value up (to say, 7 or above), or do you
> GF> simply tolerate more false positives? (I'd have to say that the
> GF> four/five false positives I'm now seeing per day, and didn't see
> GF> before is too high a number for my tastes).
>
> I rely heavily on Bayes. I run with a required hits of 9.0, and I run
> with BAYES_99 set at 9.0, and with BAYES_90 set at 7.5 (83% of
> threshold). I think I got one FP in all of July, and it had a low Bayes
> score.
>
> So in summary, no, I don't think your Bayes *scores* are the problem. I
> think the main problem is that Bayes learned ham as spam. I would suggest
> checking through your spam corpus and relearning any misclassified emails
> as ham.
>
Hello Bob,

I ran sa-learn last night on two large spam and ham mboxes that I'd been
collecting. To reduce interactions between auto-whitelisting and Bayes, I
removed my auto-whitelists, so I'm now depending strongly on SA's regular
scoring and Bayes scoring. I saw no false positives out of 200 or so spam
messages, so as you say, the issue must have been that auto-learn wasn't
feeding Bayes a balanced diet of ham and spam, and Bayes thus had trouble
telling ham from spam.

I also just call SA with full network checks now, and don't try local-only
checks first. This results in BAYES_99 being scored in the 3 range rather
than over 5.

> A second and less critical problem may be your use of 2.60 and its not
> yet statistically validated scores. This will remain less important as
> long as you have ham with Bayes scores 90% and over.

This could be the case, but I haven't noted many, if any, ill effects. I
tweak a few of SA's scores in my local.cf file anyway.

Thank you for your support, <g>

- Gary
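
For anyone following the score arithmetic in this thread, here is a minimal Python sketch of it. This is not SpamAssassin code: the function names are hypothetical, and it assumes a message is flagged once its summed rule scores reach the threshold. Only the numbers (5.2, 3.008, the default cutoff of 5, and Bob's 9.0 / 9.0 / 7.5 settings) come from the messages above.

```python
# Illustrative only: models the threshold comparison discussed in the thread.
# Assumption: a message counts as spam when its total score >= threshold.

DEFAULT_THRESHOLD = 5.0    # default cutoff mentioned in the thread
BAYES_99_LOCAL = 5.2       # BAYES_99 score quoted for local (non-network) mode
BAYES_99_NETWORK = 3.008   # BAYES_99 score quoted with network checks enabled


def is_spam(rule_scores, threshold):
    """Return True when the summed rule scores reach the threshold."""
    return sum(rule_scores) >= threshold


def fraction_of_threshold(score, threshold):
    """Return how much of the threshold a single rule score covers."""
    return score / threshold


# A message that hits only BAYES_99, as in the misclassified cases described.
local_only = is_spam([BAYES_99_LOCAL], DEFAULT_THRESHOLD)
with_network = is_spam([BAYES_99_NETWORK], DEFAULT_THRESHOLD)

# Bob's settings: threshold 9.0, BAYES_99 at 9.0, BAYES_90 at 7.5.
bob_bayes_99_alone = is_spam([9.0], 9.0)
bob_bayes_90_share = fraction_of_threshold(7.5, 9.0)

print("local mode, BAYES_99 alone flags the message:", local_only)
print("network mode, BAYES_99 alone flags the message:", with_network)
print("Bob's BAYES_90 as a share of his threshold: "
      f"{bob_bayes_90_share:.0%}")
```

With the local-mode score, a single BAYES_99 hit is enough to cross the default cutoff, which is why a mistrained Bayes database shows up directly as false positives; with the network-mode score, at least one other rule has to fire as well.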