With a score as small as -0.036, the GA is really saying that this rule 
isn't much of an indicator of anything at all.

I'd agree with the original poster: almost anything coming out of the GA 
with a score between 0.05 and -0.05 is probably not worth running. 
Ultimately, such a rule contributes less than 1% of the score needed to 
reach spam-tag levels (assuming the default threshold of 5.0).

Grepping two different 50_score.cf's out of CVS, SA has 416-632 rules. If 
each rule contributes less than 1%, it would take at least 100 similarly 
scored rules, roughly a sixth to a quarter of the total rulebase, to create 
a spam-tag.  I think it's pretty obvious that such a low-scoring rule is 
statistically insignificant, particularly given the comparatively low 
number of total rules.
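
To make that arithmetic concrete, here is a rough Python sketch. The helper 
name rules_needed is made up for illustration; the 5.0 threshold and the 
416/632 rule counts are the figures quoted above, and it assumes every rule 
fires with the same score.

```python
from fractions import Fraction
import math


def rules_needed(threshold: str, per_rule_score: str) -> int:
    """Smallest number of equally scored rules whose sum reaches the threshold.

    Scores are passed as strings so Fraction can parse them exactly,
    which avoids floating-point noise in the division.
    """
    return math.ceil(Fraction(threshold) / Fraction(per_rule_score))


if __name__ == "__main__":
    needed = rules_needed("5.0", "0.05")  # 100 rules at 0.05 each
    for total_rules in (416, 632):
        share = needed / total_rules
        print(f"{needed} of {total_rules} rules = {share:.1%} of the rulebase")
```

At 0.05 per rule that works out to 100 rules, or about 24% of a 416-rule 
set and about 16% of a 632-rule set. At 0.036 per rule the count rises to 
139.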

I believe if I created a rule matching the word "news", or some other 
obviously poor indicator of spam, it too might get a similar GA score. That 
might be an interesting test to run :)

I think it would be great for SA to be able to "narrow in" scores within a 
threshold range to zero. Personally, I go through and hand-edit some of the 
near-zero entries to zero, since it's obvious to me that the rule in 
question is not worth 2-3 clock cycles (even if clock cycles are cheap, the 
rule in question has near zero value). I'd much rather add more rules which 
are good, strong indicators of spam/nonspam than have lots of rules which 
really don't correlate well.
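
The hand-editing I describe could be scripted along these lines. This is 
only a sketch, not anything SA ships: zero_near_zero is a made-up helper, 
the rule names in the example are placeholders, and it assumes score lines 
of the form "score RULE_NAME value [value ...]" with no trailing comment.

```python
def zero_near_zero(lines, cutoff=0.05):
    """Return a copy of lines with near-zero score entries set to 0.

    A line is rewritten only if it looks like "score NAME v1 [v2 ...]"
    and every value has a magnitude below the cutoff. Anything else
    (comments, blank lines, lines that fail to parse) passes through
    unchanged.
    """
    result = []
    for line in lines:
        parts = line.split()
        if len(parts) >= 3 and parts[0] == "score":
            try:
                values = [float(v) for v in parts[2:]]
            except ValueError:
                # Not purely numeric (e.g. a trailing comment): leave as-is.
                result.append(line)
                continue
            if all(abs(v) < cutoff for v in values):
                line = f"score {parts[1]} 0"
        result.append(line)
    return result


if __name__ == "__main__":
    sample = [
        "score EXAMPLE_WEAK_RULE -0.036",   # placeholder rule name
        "score EXAMPLE_STRONG_RULE 2.500",  # placeholder rule name
        "# comments pass through untouched",
    ]
    for out_line in zero_near_zero(sample):
        print(out_line)
```

The cutoff is a strict bound, so a rule scored at exactly 0.05 or -0.05 is 
left alone, matching the "between 0.05 and -0.05" range above.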


At 07:28 AM 7/19/2002 +0200, Jesus Climent wrote:
>On Thu, Jul 18, 2002 at 05:57:11PM -0500, Shane Williams wrote:
> >
> > Also, if a single line of yelling scores -0.036, why not just round
> > it off to 0 and not have the test run at all?
>
>Because a single line of yelling seems to be a sign of a legitimate mail
>and thus is worth subtracting some points from the tag?
>
>And since the scores are obtained using a GA, that's why the -0.036.
>
>J



_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
