On 2/28/02 7:27 AM, "Michael Shields" <[EMAIL PROTECTED]> wrote:

> In article <[EMAIL PROTECTED]>,
> Craig R Hughes <[EMAIL PROTECTED]> wrote:
>> this is that rules which are really non-discriminators end up sometimes
>> getting odd-looking scores.  For example, CYBER_FIRE_POWER is just not
>> likely to really be worth -4.020 if looked at in isolation, but it turns
>> out that the 10 messages in the corpus which trigger that rule also
>> trigger about a billion other ones.
> 
> So, are you saying that rules that are matched only by egregious spam
> that's already caught get essentially random scores?  Is there a way
> we can use this to catch nonuseful rules and disable them for speed?

It's definitely a way of catching "useless" rules.  I'm also working on
modifying the algorithm so it's smarter about assigning "random" scores,
making it tend to increase the scores of rules which occur more frequently
in spam, and tend to decrease the scores of rules which occur more
frequently in nonspam.  I'm also tightening the cap on rule scores (turns
out my code was already capping scores, not allowing them to go outside
(-15..15)+(gaussian noise of mean 3.0)).  I'm now bringing that down to
(-4..4)+(gaussian noise of mean 1.0).  I've seeded the scores with the
50_rules.cf which was submitted (with scores manually tweaked from distro
scores), and I'm re-running the GA now.  I'm shooting for a 2.11 release
sometime today which fixes the score file and the spamproxyd using
MyMailAudit bug.
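
To make the capping and biasing idea concrete, here is a rough Python
sketch.  It is not the actual GA code, just one possible reading of the
description above: the cap is treated as a soft limit of 4 plus a
non-negative Gaussian slack of mean 1.0, and a mutation is nudged upward
for rules that hit mostly spam and downward for rules that hit mostly
nonspam.  The function names, the noise standard deviation, and the step
size are all assumptions made up for illustration.

```python
import random


def soft_cap(score, cap=4.0, noise_mean=1.0, noise_sd=0.5, rng=random):
    """Clamp score to +/-(cap + slack); slack is a non-negative Gaussian draw.

    noise_sd is an assumed value; the post only gives the mean.
    """
    slack = max(0.0, rng.gauss(noise_mean, noise_sd))
    limit = cap + slack
    return max(-limit, min(limit, score))


def mutate(score, spam_hits, nonspam_hits, step=0.5, rng=random):
    """Perturb a rule score, biased by where the rule fires (hypothetical)."""
    total = spam_hits + nonspam_hits
    # bias is in [-1, 1]: +1 means spam-only hits, -1 means nonspam-only hits
    bias = (spam_hits - nonspam_hits) / total if total else 0.0
    # Gaussian step whose mean is shifted in the direction of the bias
    delta = rng.gauss(bias * step, step)
    return soft_cap(score + delta, rng=rng)


if __name__ == "__main__":
    rng = random.Random(2002)
    # A rule seen 90 times in spam and 10 times in nonspam drifts upward
    print(mutate(1.0, spam_hits=90, nonspam_hits=10, rng=rng))
    # A wildly large score gets pulled back to roughly the 4..5 range
    print(soft_cap(12.0, rng=rng))
```

With a scheme like this, a rule that only fires on a handful of messages
still moves around at random, but it can no longer wander far from zero,
and on average it moves in the direction its hit counts suggest.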

C


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
