On Thu, 28 Feb 2002, Michael Shields wrote:
> Craig R Hughes <[EMAIL PROTECTED]> wrote:
> > this is that rules which are really non-discriminators end up
> > sometimes getting odd-looking scores.  For example, CYBER_FIRE_POWER
> > is just not likely to really be worth -4.020 if looked at in
> > isolation, but it turns out that the 10 messages in the corpus which
> > trigger that rule also trigger about a billion other ones.
>
> So, are you saying that rules that are matched only by egregious spam
> that's already caught get essentially random scores?  Is there a way
> we can use this to catch nonuseful rules and disable them for speed?
I've been following this, and to be quite honest I don't know anything
about genetic algorithms, but I had a thought anyway...

Would it make sense to look at the ratio of spam to non-spam hits for
each rule, and to constrain the score to either -ve or +ve depending on
which way that ratio leans?  That way, "monsterhut" may still wander
randomly, but it will only wander in the +ve direction, or peg itself at
zero.  I can't imagine a situation where more spams than non-spams
trigger a rule, yet you would want a -ve score (and vice versa).

Also, would it be useful to remove, or seriously constrain, rules which
have nearly equal hits in spam vs non-spam?

ttyl
srw

-- 
Walde Technology                      Networks, Internet, Intranets
Saskatoon, SK CANADA                  Linux Support, Web Programming
306-221-7393                          Network Security, Firewalls
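
A minimal sketch of the sign-constraint idea being proposed, assuming
per-rule spam and non-spam hit counts from the corpus are available.
This is not SpamAssassin's actual GA code; the function name, the
example counts, and the dead_zone threshold are invented here purely
for illustration.

    def constrain_score(candidate_score, spam_hits, ham_hits, dead_zone=0.1):
        """Clamp a GA-proposed score based on a rule's spam/ham hit ratio.

        spam_hits / ham_hits: how many corpus spams / non-spams match the rule.
        dead_zone: if the spam fraction is within this margin of 0.5, treat
                   the rule as a non-discriminator and force its score to 0.
        """
        total = spam_hits + ham_hits
        if total == 0:
            return 0.0                        # rule never fires; contributes nothing

        spam_fraction = spam_hits / total
        if abs(spam_fraction - 0.5) < dead_zone:
            return 0.0                        # hits spam and non-spam about equally

        if spam_fraction > 0.5:
            return max(candidate_score, 0.0)  # spam-leaning rule: never negative
        return min(candidate_score, 0.0)      # ham-leaning rule: never positive


    # A spam-leaning rule whose GA score wandered negative gets pegged at zero,
    # while a positive score is left free to wander:
    print(constrain_score(-4.020, spam_hits=10, ham_hits=0))   # -> 0.0
    print(constrain_score(3.1,    spam_hits=10, ham_hits=0))   # -> 3.1
    print(constrain_score(2.0,    spam_hits=55, ham_hits=45))  # -> 0.0 (dead zone)

With this kind of clamp applied each generation, a rule like the one
quoted above could only take a positive score or zero, matching the
"wander in the +ve direction, or peg itself at zero" behaviour described
in the message.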