The base rules are scored using a process that attempts to maximize spam hits while minimizing false positives. They are geared toward a user who has the default spam threshold of 5 points.
The percentages would probably vary significantly depending on who runs them and what types of spam/ham they see. I haven't run any numbers on this, but based on what I see with my system, I would guess that a score of 5 gives close to 99% likelihood of the message being spam. Of course, my system has several add-on rulesets, Razor2, DCC, Pyzor, and a well-trained Bayes database. I would say that if your SA is not running with at least 95% accuracy, you're doing something wrong.

Bowie

John Rudd wrote:
> I can see how plugins and add-on rules all affect it, but certainly
> they have some sort of base comparison that lets them know when
> they've gotten the right score values for the base rules, right?
>
>
> On Jul 26, 2006, at 3:22 AM, Sietse van Zanen wrote:
> > I think such a thing would be very difficult. Because scoring is
> > mostly dependent on your personal configuration of SA. The more
> > plugins you use, the higher the score will be. And that is
> > independent of spam probability.
> >
> > You might be able to compare Bayes probabilities with SA scores, but
> > automating it would be very, very difficult.
> >
> > From: John Rudd [mailto:[EMAIL PROTECTED]
> > > Does anyone have a scale that compares the SA score to a "percent
> > > likelihood that the message is spam"?
> > >
> > >
> > > Something like "a score of 5 is a 75% chance that the message is
> > > spam". But I don't want it just for a score of 5. What I'd like
> > > is for scores of 1-10. And I'd also like to see it for percentage
> > > likelihoods of 10, 20, 30, 40, 50, 60, 70, 75, 80, 85, 90, 95, 96,
> > > 97, 98, and 99 (and maybe 100, but I expect that won't be
> > > meaningful) (so, I can say "an 80% likelihood happens at a score of
> > > 6" or something).
> > >
> > > It seems as though something like this must be done to keep the
> > > right amount of the base spam/ham corpus used with the GA within
> > > expected values. But I haven't ever seen an actual rating along
> > > these lines. Hopefully it's not in a completely obvious place that
> > > I've overlooked...
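
Since the numbers depend so heavily on the local setup, the practical way to get a scale like the one asked about is probably to measure it on your own hand-sorted mail. What follows is only a rough sketch of that idea, not anything SpamAssassin ships: the function name, the input format (a list of score/label pairs) and the sample data are all made up for illustration, and "likelihood at a score of N" is assumed here to mean "the fraction of messages scoring N or higher that really were spam".

```python
from typing import Dict, Iterable, List, Optional, Tuple


def spam_likelihood_by_score(
    results: Iterable[Tuple[float, bool]],
    thresholds: Iterable[int],
) -> Dict[int, Optional[float]]:
    """For each threshold, return the fraction of messages scoring at or
    above it that were hand-labelled as spam (None if no message qualifies).

    `results` is a hypothetical list of (sa_score, is_spam) pairs taken
    from a corpus you have sorted yourself.
    """
    scored: List[Tuple[float, bool]] = list(results)
    table: Dict[int, Optional[float]] = {}
    for t in thresholds:
        # Labels of every message that would be flagged at this threshold.
        flagged = [is_spam for score, is_spam in scored if score >= t]
        table[t] = sum(flagged) / len(flagged) if flagged else None
    return table


# Made-up sample data, purely illustrative: (score, was it really spam?)
sample = [
    (0.5, False), (1.2, False), (2.8, False), (3.1, True), (4.9, False),
    (5.2, True), (6.0, True), (7.4, True), (9.8, True), (12.3, True),
]

table = spam_likelihood_by_score(sample, range(1, 11))
for threshold, likelihood in table.items():
    if likelihood is None:
        print(f"score >= {threshold}: no messages")
    else:
        print(f"score >= {threshold}: {likelihood:.0%} spam")
```

Run against a few thousand real messages instead of ten invented ones, the resulting table can be read in either direction: the likelihood at a given score, or the lowest score at which a target likelihood such as 80% is reached. The figures would only describe the corpus and configuration they were measured on.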