On Thu, 28 Feb 2002, Michael Shields wrote:

> Craig R Hughes <[EMAIL PROTECTED]> wrote:
> > this is that rules which are really non-discriminators end up sometimes getting
> > odd-looking scores.  For example, CYBER_FIRE_POWER is just not likely to really
> > be worth -4.020 if looked at in isolation, but it turns out that the 10 messages
> > in the corpus which trigger that rule also trigger about a billion other ones.
>
> So, are you saying that rules that are matched only by egregious spam
> that's already caught get essentially random scores?  Is there a way
> we can use this to catch nonuseful rules and disable them for speed?

I've been following this, and to be quite honest I don't know anything
about genetic algorithms (GAs), but I had a thought anyway...

Would it make sense to take a look at the ratio of spam to non-spam for
each given rule, and to constrain the score to either -ve or +ve depending
on which way the ratio leaned?  This way, "monsterhut" may wander
randomly, but it will only wander randomly in the +ve direction, or peg
itself at zero.  I can't imagine any situation where more spams than
non-spams trigger a rule, yet you would want a -ve score (or vice
versa).
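
To make the idea concrete, here is a rough sketch in Python (used only
for illustration; it is not taken from the SpamAssassin scoring code).
The names score_bounds, clamp_score, and max_abs are hypothetical, and
it assumes raw hit counts are a fair stand-in for the spam/non-spam
ratio, which would need normalizing if the two corpora differ much in
size.

```python
def score_bounds(spam_hits, ham_hits, max_abs=5.0):
    """Return the (low, high) range a rule's score may wander in.

    Hypothetical helper: max_abs is an assumed cap on score magnitude.
    """
    if spam_hits > ham_hits:
        # Rule leans towards spam: only +ve scores (or zero) allowed.
        return (0.0, max_abs)
    if ham_hits > spam_hits:
        # Rule leans towards non-spam: only -ve scores (or zero) allowed.
        return (-max_abs, 0.0)
    # No lean either way: peg the score at zero.
    return (0.0, 0.0)


def clamp_score(score, spam_hits, ham_hits, max_abs=5.0):
    """Pull a proposed score back into the range its hit counts allow."""
    low, high = score_bounds(spam_hits, ham_hits, max_abs)
    return min(max(score, low), high)
```

With this in place, a rule hit by 10 spams and no non-spams that was
assigned -4.020 would be pegged at zero instead, while any +ve score up
to the cap would pass through unchanged.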

Also, would it be useful to remove or seriously constrain rules which had
nearly equal hits in spam vs non-spam?
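
One possible way to spot such rules, again as a Python sketch rather
than anything from the real scoring tools: the function name
is_weak_discriminator and the tolerance value are assumptions picked
for illustration, and raw hit counts are used without correcting for
corpus size.

```python
def is_weak_discriminator(spam_hits, ham_hits, tolerance=0.1):
    """Return True if a rule hits spam and non-spam about equally often.

    tolerance is an assumed threshold: how far the spam fraction may
    sit from an even 50/50 split before the rule counts as useful.
    """
    total = spam_hits + ham_hits
    if total == 0:
        # A rule that never fires tells us nothing either way.
        return True
    spam_fraction = spam_hits / total
    return abs(spam_fraction - 0.5) <= tolerance
```

Rules flagged this way could then be dropped, or have their scores
held near zero, before the scores are evolved.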

ttyl
srw

-- 
Walde Technology                Networks, Internet, Intranets
Saskatoon, SK  CANADA           Linux Support, Web Programming
306-221-7393                    Network Security, Firewalls





_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk