The base rules are scored using a process that attempts to maximize
spam hits while minimizing false positives.  They are geared toward a
user who has the default spam threshold of 5 points.

The percentages would probably vary significantly depending on who
runs them and what types of spam/ham they see.

I haven't run any numbers on this, but based on what I see with my
system, I would guess that a score of 5 gives close to 99% likelihood
of the message being spam.  Of course, my system has several add-on
rulesets, Razor2, DCC, Pyzor, and a well-trained Bayes database.
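
For anyone who does want to run the numbers on their own mail, here is a
rough sketch in Python. It assumes you have a pile of messages that were
classified by hand and that you recorded the SA score for each one. The
function name and the sample data are made up for illustration; nothing
here comes from SA itself.

```python
def spam_likelihood_by_score(samples, thresholds):
    """For each threshold, return the fraction of messages scoring at or
    above it that were hand-classified as spam (None if none qualify)."""
    result = {}
    for t in thresholds:
        # Collect the human verdicts for every message at or above t.
        flagged = [is_spam for score, is_spam in samples if score >= t]
        result[t] = sum(flagged) / len(flagged) if flagged else None
    return result


# Hypothetical data: (SA score, True if a human judged the message spam).
samples = [(0.3, False), (1.2, False), (2.8, False), (3.1, True),
           (4.9, False), (5.2, True), (6.0, True), (7.5, True),
           (9.1, True), (12.4, True)]

table = spam_likelihood_by_score(samples, range(1, 11))
for t, p in table.items():
    print(t, "n/a" if p is None else f"{p:.0%}")
```

Going the other way (finding the score that gives, say, an 80%
likelihood) is just a matter of scanning that table for the lowest
threshold that reaches the target. The results only describe the mail
and the configuration they were measured on.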

I would say that if your SA is not running with at least 95% accuracy,
you're doing something wrong.

Bowie

John Rudd wrote:
> I can see how plugins and add-on rules all affect it, but certainly
> they have some sort of base comparison that lets them know when
> they've gotten the right score values for the base rules, right?
> 
> 
> On Jul 26, 2006, at 3:22 AM, Sietse van Zanen wrote:
> > I think such a thing would be very difficult, because scoring is
> > mostly dependent on your personal configuration of SA. The more
> > plugins you use, the higher the score will be. And that is
> > independent of spam probability.
> > 
> > You might be able to compare bayes probabilities with SA scores, but
> > automating it would be very, very difficult.
> > 
> > From: John Rudd [mailto:[EMAIL PROTECTED]
> > > Does anyone have a scale that compares the SA score to a "percent
> > > likelihood that the message is spam"?
> > > 
> > > 
> > > > Something like "a score of 5 is a 75% chance that the message is
> > > > spam". But I don't want it just for a score of 5.  What I'd like
> > > > is for scores of 1-10.  And I'd also like to see it for percentage
> > > > likelihoods of 10, 20, 30, 40, 50, 60, 70, 75, 80, 85, 90, 95, 96,
> > > > 97, 98, and 99 (and maybe 100, but I expect that won't be
> > > > meaningful), so I can say "an 80% likelihood happens at a score of
> > > > 6" or something.
> > > 
> > > It seems as though something like this must be done to keep the
> > > right amount of the base spam/ham corpus used with the GA within
> > > expected values.  But I haven't ever seen an actual rating along
> > > these lines. Hopefully it's not in a completely obvious place that
> > > I've overlooked...
