On 2/28/02 7:27 AM, "Michael Shields" <[EMAIL PROTECTED]> wrote:
> In article <[EMAIL PROTECTED]>,
> Craig R Hughes <[EMAIL PROTECTED]> wrote:
>> this is that rules which are really non-discriminators end up sometimes
>> getting odd-looking scores.  For example, CYBER_FIRE_POWER is just not
>> likely to really be worth -4.020 if looked at in isolation, but it turns
>> out that the 10 messages in the corpus which trigger that rule also
>> trigger about a billion other ones.
>
> So, are you saying that rules that are matched only by egregious spam
> that's already caught get essentially random scores?  Is there a way
> we can use this to catch nonuseful rules and disable them for speed?

It's definitely a way of catching "useless" rules.  I'm also working on
modifying the algorithm so it's smarter about assigning "random" scores,
making it tend to increase the scores of rules which occur more frequently
in spam, and tend to decrease the scores of rules which occur more
frequently in nonspam.

I'm also tightening the cap on rule scores.  It turns out my code was
already capping scores, not allowing them to go outside
(-15..15)+(gaussian noise of mean 3.0).  I'm now bringing that down to
(-4..4)+(gaussian noise of mean 1.0).  I've seeded the scores with the
50_rules.cf which was submitted (with scores manually tweaked from distro
scores), and I'm re-running the GA now.

I'm shooting for a 2.11 release sometime today which fixes the score file
and the spamproxyd using MyMailAudit bug.

C
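
As a rough illustration of the mutation idea described above (not the actual GA code), here is a minimal Python sketch. It assumes a hypothetical helper `mutate_score` that draws a gaussian step whose mean leans positive for rules seen mostly in spam and negative for rules seen mostly in nonspam, then clamps the result to -4..4. The `step` size, the use of raw hit counts, and the hard clamp are all assumptions. The extra gaussian noise on the cap itself is left out, since the message does not define it precisely.

```python
import random


def mutate_score(score, spam_hits, ham_hits, rng, cap=4.0, step=0.5):
    """Hypothetical mutation step: biased random nudge, then a hard cap."""
    total = spam_hits + ham_hits
    # bias is in [-1, 1]: positive when the rule fires mostly on spam,
    # negative when it fires mostly on nonspam, zero if it never fires.
    bias = 0.0 if total == 0 else (spam_hits - ham_hits) / total
    # The step is still random, but its mean is shifted by the bias,
    # so scores only *tend* to move in the useful direction.
    candidate = score + rng.gauss(bias * step, step)
    # Hard clamp to [-cap, cap] (assumed; the real cap adds noise).
    return max(-cap, min(cap, candidate))


if __name__ == "__main__":
    rng = random.Random(2002)
    # A rule seen 10 times in spam and never in nonspam (illustrative counts).
    print(mutate_score(-4.02, spam_hits=10, ham_hits=0, rng=rng))
```

With a tighter cap like this, a rule that only fires alongside many others can no longer drift to an extreme score just because its value has almost no effect on the fitness.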