At 02:46 PM 12/7/03 -0600, Michael Satterwhite wrote:
50_scores.cf:score BAYES_80 0 0 5.300 2.862
50_scores.cf:score BAYES_90 0 0 4.027 3.002
50_scores.cf:score BAYES_99 0 0 5.200 3.008

If I'm reading this correctly, more points are given for classifying a message
as 80% probable than for 90% probable; in fact, more is given for 80% than
for 99%. Looking at the spam messages I have, this seems to be true.

Can some of you experts explain this one to me?

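Well, in set 2 (the third score column: Bayes enabled, no network checks), BAYES_80 scores higher than both BAYES_90 and BAYES_99, but in set 3 (the fourth column: Bayes with network checks) it scores lower than both.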
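
To make the column reading concrete, here is a small Python sketch that parses the three quoted score lines and ranks the rules within one score set. The helper names (parse_scores, ranking) are made up for illustration and are not part of SpamAssassin, and the "50_scores.cf:" grep prefix has been dropped from the sample input.

```python
# The four values on a "score" line are score sets 0..3:
#   0 = no Bayes, no network    1 = no Bayes, network
#   2 = Bayes, no network       3 = Bayes, network
LINES = """\
score BAYES_80 0 0 5.300 2.862
score BAYES_90 0 0 4.027 3.002
score BAYES_99 0 0 5.200 3.008
"""

def parse_scores(text):
    """Map rule name -> list of four floats, one per score set."""
    table = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 6 and parts[0] == "score":
            table[parts[1]] = [float(v) for v in parts[2:]]
    return table

def ranking(table, score_set):
    """Rule names ordered from highest to lowest score in one score set."""
    return sorted(table, key=lambda rule: table[rule][score_set], reverse=True)

table = parse_scores(LINES)
print("set 2:", ranking(table, 2))
print("set 3:", ranking(table, 3))
```

In set 2 the order comes out as BAYES_80, BAYES_99, BAYES_90; in set 3 it is BAYES_99, BAYES_90, BAYES_80.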


However, SpamAssassin scores are decidedly NOT a linear system. Due to interactions with hundreds of other rules, this kind of nonlinearity is very normal.

If you dig through the archives, you'll find I've posted about this quite a bit.

The basic gist is that no rule stands alone. Scoring is based on which combinations of rules fire for each email in the corpus. The goal is not to give the highest score to the rule that hits the most spam; it's to assign the scores that place the most emails in the right spam/nonspam piles. These often coincide, but not always, because real email is very complex.

It's very likely that a nonspam message that hits BAYES_90 (e.g. a crude joke) will also trigger lots of other rules, which forces the GA to score that rule lower to avoid FPs.
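
A toy Python sketch of that effect follows. Everything in it is invented for illustration: the three-message corpus, the OTHER_STRONG and OTHER_WEAK rules and their fixed scores, and the brute-force grid search, which merely stands in for the GA and is not how SpamAssassin actually assigns scores. The only real-world value assumed is the default spam threshold of 5.0.

```python
from itertools import product

THRESHOLD = 5.0  # a message at or above this total is called spam

# Hypothetical non-Bayes rules with scores held fixed.
FIXED = {"OTHER_STRONG": 3.0, "OTHER_WEAK": 1.5}

# Made-up corpus: (rules that fired, is it really spam?)
CORPUS = [
    ({"BAYES_80"}, True),                   # spam caught only by BAYES_80
    ({"BAYES_90", "OTHER_STRONG"}, True),   # spam that also hits a strong rule
    ({"BAYES_90", "OTHER_WEAK"}, False),    # the "crude joke" nonspam
]

def errors(scores):
    """Count messages that land in the wrong pile under these scores."""
    wrong = 0
    for rules, is_spam in CORPUS:
        total = sum(scores[r] for r in rules)
        if (total >= THRESHOLD) != is_spam:
            wrong += 1
    return wrong

def best_scores():
    """Try every (BAYES_80, BAYES_90) pair on a 0.0..6.0 grid; keep the first with fewest errors."""
    grid = [i * 0.5 for i in range(13)]
    best = None
    for s80, s90 in product(grid, grid):
        scores = dict(FIXED, BAYES_80=s80, BAYES_90=s90)
        e = errors(scores)
        if best is None or e < best[0]:
            best = (e, s80, s90)
    return best

e, s80, s90 = best_scores()
print("errors:", e, "BAYES_80:", s80, "BAYES_90:", s90)
```

In this toy corpus, every zero-error assignment needs BAYES_80 at 5.0 or more, because that rule has to carry a spam on its own, while BAYES_90 must stay below 3.5 to spare the joke. So the "weaker" rule ends up with the higher score.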



_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
