On Thursday 09 November 2006 22:14, Steve Ingraham took the opportunity to say:
> Ok, I have a question on these Bayes rules related to false positives.
> It appears that many of my users are having legitimate emails scored in
> the 8 to 9 range. These emails are scoring high basically because they
> are hitting on one of the various Bayes rule (most notably the
> Bayes_50_Body and the Bayes_95_Body rules). Is there something
> straightforward that can be done to stop these legitimate scores from
> scoring high on the Bayes rules?
>
> I have already decreased the Bayes_50_Body rule from 5.0 to 2.5. I
> don't want to decrease the scores with every Bayes rule because I think
> I will start seeing some true spam delivered because it did not score
> high.
>
> Any ideas?

1) False negatives are better than false positives (up to a certain limit at least).

2) BAYES_50 means that the classifier has no idea whether it's spam or not. It should definitely not be scored at 5.0, and 2.5 is probably way too high, but it depends on what other rules your ham triggers. The important thing is that the total for a ham message doesn't go over 5 (or whatever limit you choose). If almost all ham hits BAYES_00 or the occasional BAYES_05, then in principle there is nothing wrong with a relatively high BAYES_50 score (1.0, for example).

____________________

In 2) above you are telling me that 5.0 and even 2.5 are way too high. So what should it be? Again, I do not understand the string of numbers that was displayed in the Bayes scores that Daryl included in his message. What about Bayes_95?

My big problem is that I am having quite a few legitimate emails scoring high because of Bayes_50 or Bayes_95 scores. What can be done to keep these legitimate emails from being scored on these Bayes rules? Should I be decreasing the scores of these two rules? If so, what should they be set at? If I should not be altering these Bayes scores, then what else can I do to keep legitimate emails from hitting on these rules?

All advice is appreciated.

Steve
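
The point in 2) about keeping the total for a ham message under the limit can be sketched as plain arithmetic. This is only an illustration, not SpamAssassin code: the OTHER_RULES entry and its 3.5 score are made up to stand in for whatever non-Bayes rules a legitimate message happens to trigger. The 5.0, 1.0, and limit of 5 come from the messages above.

```python
# Illustrative sketch only: models how rule scores add up against the limit.
# This is not SpamAssassin's implementation.

REQUIRED_SCORE = 5.0  # the limit mentioned in the reply


def is_flagged(hits, scores, required=REQUIRED_SCORE):
    """Sum the scores of the rules a message hit and compare to the limit."""
    total = sum(scores[rule] for rule in hits)
    return total, total >= required


# A legitimate message that Bayes is undecided about (BAYES_50).
# OTHER_RULES is a hypothetical stand-in for the non-Bayes rules it also hits.
ham_hits = ["BAYES_50", "OTHER_RULES"]

high_scores = {"BAYES_50": 5.0, "OTHER_RULES": 3.5}  # BAYES_50 as first configured
low_scores = {"BAYES_50": 1.0, "OTHER_RULES": 3.5}   # BAYES_50 at the suggested 1.0

high_total, high_flagged = is_flagged(ham_hits, high_scores)
low_total, low_flagged = is_flagged(ham_hits, low_scores)

print(high_total, high_flagged)
print(low_total, low_flagged)
```

With BAYES_50 at 5.0, the rule alone is enough to reach the limit, so any other hit pushes ham well past it. At 1.0 the same message stays under 5 as long as its other rules add up to less than 4.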
