Yes, the large rule scores probably do make the system more sensitive to minor 
variations in input.  However, they also apparently lead to more accurate 
scores.  It is interesting that even when the GA ran unconstrained over 50,000 
generations, no score ended up larger than about 20, and that one belongs to a 
rule which is clearly a spam-marker (PORN_1).  Only a handful of rules ended up 
with scores beyond +/-10.  You guys are right, though, that we don't want to 
accidentally overfit by assigning huge scores.
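
To make the sensitivity concern concrete, here is a rough sketch of how a 
single large score can decide the outcome on its own, and how capping score 
magnitudes would change that.  Everything in it is an assumption for 
illustration: the threshold of 5.0, the cap value, the helper names, and every 
rule except PORN_1 (whose score of about 20 is the figure mentioned above).  
It is not how the scoring code is actually written.

```python
# Illustrative only: rule names other than PORN_1, and all numbers except
# PORN_1's rough score of 20, are hypothetical.
SCORES = {
    "PORN_1": 20.0,               # approximate value mentioned above
    "HYPOTHETICAL_RULE_A": 1.5,   # made-up small positive rule
    "HYPOTHETICAL_RULE_B": -2.0,  # made-up small negative rule
}
THRESHOLD = 5.0  # assumed spam threshold


def total_score(hits, scores, cap=None):
    """Sum the scores of the rules a message hit.

    If cap is given, each individual score is clamped to [-cap, +cap]
    before summing.
    """
    total = 0.0
    for rule in hits:
        score = scores[rule]
        if cap is not None:
            score = max(-cap, min(cap, score))
        total += score
    return total


def is_spam(hits, scores, cap=None, threshold=THRESHOLD):
    """Return True if the (optionally capped) total reaches the threshold."""
    return total_score(hits, scores, cap) >= threshold


# One accidental hit on a large rule is enough by itself when uncapped...
print(is_spam(["PORN_1"], SCORES))
# ...but not when each rule is capped below the threshold.
print(is_spam(["PORN_1"], SCORES, cap=4.0))
```

With a cap below the threshold, no single rule can flag a message alone, at 
the cost of whatever accuracy the larger scores were buying.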

C

Bart Schaefer wrote:

> Date: Wed, 27 Feb 2002 16:28:07 -0800 (PST)
> From: Bart Schaefer <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
> Subject: Re: [SAtalk] Troubling new scores in 2.1 release
> 
> On Wed, 27 Feb 2002, Craig R Hughes wrote:
> 
> > This isn't really a problem.  It can actually be helpful too to allow
> > the GA to do its own thing [...]
> 
> On Wed, 27 Feb 2002, Tom Lipkis wrote:
> 
> > With large scores like this (positive or negative), very small
> > perturbations in input can cause wildly different results, which seems
> > undesirable.
> 
> What Tom said.
> 
> I'm worried about innocent messages that accidentally trip a single large
> positive rule, or spams that accidentally trip a single large negative 
> one, not about messages that trip a lot of rules in both directions.
> 
> 
> 


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
