On 3/11/02 4:53 AM, "Michael Moncur" <[EMAIL PROTECTED]> wrote:

> Matt Sergeant wrote:
>> I would suggest that we be extremely careful about checks that get given
>> a score over 5. Part of the beauty of SpamAssassin (and heuristics in
>> general) is that usually a hit just contributes to the overall score, but
>> doesn't necessarily tip things over. Having said that, some things almost
>> inevitably should be over 5, like ratware.
> 
> I agree - with too many scores over 5 SA doesn't have much advantage over a
> simple set of procmail filters. I think the GA scores should be reduced (or
> limited?) next time - I'm thinking of limiting mine so that nothing scores
> over 3.0. That way a message would need to meet, at minimum, two strong spam
> criteria in order to be flagged as spam.
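
The arithmetic behind that suggestion can be sketched in a few lines of Python. This is only an illustration, not part of SpamAssassin: the function name `min_rules_to_flag` is made up here, and it assumes a message is flagged once its total score reaches the threshold (5.0 being the default discussed in this thread).

```python
import math


def min_rules_to_flag(threshold: float, max_rule_score: float) -> int:
    """Smallest number of rule hits that can reach `threshold`
    when no single rule may score more than `max_rule_score`.

    Hypothetical helper for illustration; assumes a message is
    flagged when its summed score is >= threshold.
    """
    if max_rule_score <= 0:
        raise ValueError("max_rule_score must be positive")
    return math.ceil(threshold / max_rule_score)


if __name__ == "__main__":
    # Cap of 3.0 against a threshold of 5.0: one rule alone cannot flag.
    print(min_rules_to_flag(5.0, 3.0))
    # A rule allowed to score 5.5 can flag a message on its own.
    print(min_rules_to_flag(5.0, 5.5))
```

With a cap of 3.0 and a threshold of 5.0 this gives two rules, which matches the "at minimum, two strong spam criteria" estimate above.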

The new scores are in fact limited during evolution to the range -4..4,
+/- gaussian noise (of mean 1).

In the next iteration I might reduce this to -3..3 +/- noise.

The fact that the mortgage rate rule scored so high is partly random chance
(it got a tail on the gaussian distribution), and partly that for the corpus
it is an excellent indicator.
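
To illustrate how a bounded score can still end up near 5, here is a rough Python sketch. It is not the actual GA code: the function name `bounded_noisy_score` is invented, and the noise is assumed to be zero-mean with a standard deviation of 1, which is only one reading of the description above.

```python
import random


def bounded_noisy_score(raw, low=-4.0, high=4.0, sigma=1.0, rng=None):
    """Clamp a raw score into [low, high], then add gaussian noise.

    Assumption for illustration: the noise is zero-mean with standard
    deviation `sigma`. The real GA's noise parameters may differ.
    """
    rng = rng or random
    clamped = max(low, min(high, raw))
    return clamped + rng.gauss(0.0, sigma)


if __name__ == "__main__":
    rng = random.Random(0)
    # A rule the GA wants to push as high as possible sits at the bound (4.0).
    samples = [bounded_noisy_score(10.0, rng=rng) for _ in range(1000)]
    above_five = sum(1 for s in samples if s > 5.0)
    print(f"{above_five} of {len(samples)} samples ended above 5.0")
```

Under these assumptions a rule pinned at the upper bound only needs a noise draw of about one standard deviation to land above 5, so a high final score for a strong indicator is not surprising.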

I've been doing some reading since 2.11 went out on Stein's Paradox, so I
most likely will actually make some other changes to the way the scores are
calculated.  What I really need first though is to read a discussion of how
Stein's Paradox applies to GA evolution -- not sure if the GA will
"automatically" take the multiple-estimate stuff into account.  I imagine it
probably will, but if anyone has a reference for me, I'd appreciate it.
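
For readers unfamiliar with the idea: Stein's Paradox says that when three or more quantities are estimated at once, shrinking all of the estimates toward a common point gives a lower total squared error than treating each one separately. A minimal Python sketch of the positive-part James-Stein estimator follows. It is a textbook form, not anything the GA does, and it assumes each estimate has the same known noise variance and that shrinking toward zero is reasonable; the function name is made up for this example.

```python
def james_stein(estimates, variance=1.0):
    """Positive-part James-Stein shrinkage toward zero.

    Assumptions for illustration: every estimate carries independent
    gaussian noise with the same known `variance`, and there are at
    least three estimates (the result does not hold for fewer).
    """
    k = len(estimates)
    if k < 3:
        raise ValueError("James-Stein shrinkage needs at least 3 estimates")
    norm_sq = sum(x * x for x in estimates)
    if norm_sq == 0:
        return list(estimates)
    # Shrink factor; clipped at 0 so estimates never flip sign.
    factor = max(0.0, 1.0 - (k - 2) * variance / norm_sq)
    return [factor * x for x in estimates]


if __name__ == "__main__":
    raw_scores = [3.0, 4.0, 0.0]
    print(james_stein(raw_scores))
```

Whether a GA already achieves a comparable effect on its own is exactly the open question raised above; the sketch only shows what explicit shrinkage would look like.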

> After the GA runs someone could override scores that are absolute spam
> indicators - i.e. ratware - with a higher score.

I did tweak a couple of the scores after the GA ran -- didn't tweak the
mortgage rate one, mostly because it was still close to 5, but also because
I wasn't thinking about home buyers, realtors, etc. at the time.  However,
I'm not sure that the right thing to do isn't to leave a fairly high score
on that rule, and if you're the sysadmin for a realtor, you should disable
that rule...

> For now I've just set my required_hits to 7.0, which seems to work pretty well
> with the current scores.



_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
