On 3/11/02 4:53 AM, "Michael Moncur" <[EMAIL PROTECTED]> wrote:
> Matt Sergeant wrote:
>> I would suggest that we be extremely careful about checks that get given
>> a score over 5. Part of the beauty of SpamAssassin (and heuristics in
>> general) is that usually a hit just contributes to the overall score, but
>> doesn't necessarily tip things over. Having said that, some things almost
>> inevitably should be over 5, like ratware.
>
> I agree - with too many scores over 5 SA doesn't have much advantage over a
> simple set of procmail filters. I think the GA scores should be reduced (or
> limited?) next time - I'm thinking of limiting mine so that nothing scores
> over 3.0. That way a message would need to meet, at minimum, two strong spam
> criteria in order to be flagged as spam.

The new scores are in fact limited during evolution to -4..4 +/- (gaussian
noise of mean 1)

In the next iteration I might reduce this to -3..3 +/- noise

The fact that the mortgage rate rule scored so high is partly random chance
(it got a tail on the gaussian distribution), and partly that for the corpus
it is an excellent indicator.

I've been doing some reading since 2.11 went out on Stein's Paradox, so I
most likely will actually make some other changes to the way the scores are
calculated. What I really need first though is to read a discussion of how
Stein's Paradox applies to GA evolution -- not sure if the GA will
"automatically" take the multiple-estimate stuff into account. I imagine it
probably will, but if anyone has a reference for me, I'd appreciate it.

> After the GA runs someone could override scores that are absolute spam
> indicators - i.e. ratware - with a higher score.

I did tweak a couple of the scores after the GA ran -- didn't tweak the
mortgage rate one, mostly because it was still close to 5, but also because
I wasn't thinking about home buyers, realtors, etc. at the time.
However, I'm not sure that the right thing to do isn't to leave a fairly
high-scoring rule in there, and if you're sysadmin for a realtor, you
should disable that rule...

> For now I've just set my required_hits to 7.0, which seems to work pretty well
> with the current scores.

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
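
For anyone following the arithmetic in this thread, here is a rough Python sketch of the two ideas discussed: capping individual rule scores so that no single rule can flag a message, and bounding a score and then adding Gaussian noise, which is how a rule can still end up past the nominal limit. This is not the actual GA code. The function names, the zero-mean noise with a spread of 1.0, and the "total >= required_hits" comparison are all assumptions made for illustration.

```python
import math
import random


def min_rules_to_flag(required_hits, max_rule_score):
    """Fewest rules that must hit to reach the threshold,
    assuming no rule scores above max_rule_score."""
    return math.ceil(required_hits / max_rule_score)


def clamp_with_noise(score, limit=4.0, noise_sd=1.0, rng=random):
    """Bound a score to [-limit, limit], then add Gaussian noise.

    ASSUMPTION: zero-mean noise with standard deviation noise_sd;
    the thread does not spell out the exact noise parameters.
    """
    clamped = max(-limit, min(limit, score))
    return clamped + rng.gauss(0.0, noise_sd)


def is_spam(rule_scores, required_hits=5.0):
    """ASSUMPTION: a message is flagged when the summed scores
    of the rules that hit reach required_hits."""
    return sum(rule_scores) >= required_hits


if __name__ == "__main__":
    # With a 3.0 cap and a threshold of 5.0, one rule is never enough.
    print(min_rules_to_flag(5.0, 3.0))
    print(is_spam([3.0]), is_spam([3.0, 3.0]))

    # A score bounded at 4.0 can still land above 4.0 once noise is added.
    rng = random.Random(0)
    print(max(clamp_with_noise(10.0, rng=rng) for _ in range(1000)))
```

Under these assumptions, a 3.0 cap with a threshold of 5.0 means at least two rules must hit, and a score bounded at 4.0 can still exceed 4.0 whenever the noise draw is positive, which is consistent with one rule landing close to 5.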