    RW> How about a "bonus" for cumulative effect? Why not do a second-level
    RW> analysis after scoring; something like:
    RW> 
    RW> 3 positive score matches - add 1.0
    RW> 4 positive score matches - add 2.0
    RW> 5 positive score matches - add 4.0
    RW> 6 positive score matches - add 8.0

    Craig> This is I think a little too simplistic -- and if it were correct
    Craig> to do this, it'd be reflected to some extent in the current
    Craig> scores.

Perhaps, but couldn't having THREE_MATCH, FOUR_MATCH, FIVE_MATCH, and SIX_MATCH
rules, each with increasing weights, help move the GA along a little faster?
You're giving it a piece of information it didn't have before: the more
tests that score positive, the more likely the message is to be spam,
regardless of the actual score.  It's really no different from adding
another header or body matching test, such as one for the same uppercase word
occurring twice in the header.  Before you added the test, the GA couldn't
take it into account.  Now that you have, it can.
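
To make that concrete, here is a rough sketch in Python (not SpamAssassin's
own code) of how such count-based rules might be derived from the ordinary
test hits.  The function name and data layout are invented for illustration,
and treating six or more positive matches as SIX_MATCH is my assumption:

```python
# Rule names taken from the discussion above; the mapping itself is hypothetical.
COUNT_RULES = {3: "THREE_MATCH", 4: "FOUR_MATCH", 5: "FIVE_MATCH", 6: "SIX_MATCH"}


def count_rule_for(hits, scores):
    """Return the count rule that fires for a message, or None.

    hits   -- names of the ordinary tests that matched the message
    scores -- mapping of test name to its current score
    """
    # Count only the matched tests whose current score is positive.
    positive = sum(1 for name in hits if scores.get(name, 0.0) > 0.0)
    if positive < 3:
        return None
    # Assumption: six or more positive matches all map to SIX_MATCH.
    return COUNT_RULES[min(positive, 6)]
```

The returned rule name would then be fed to the GA as one more test hit, so
it gets a weight of its own like any other rule.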

Once you have the tests in there, it can weight them.  If they always sit at
or very near zero, they obviously aren't contributing anything and should
probably be removed, but if the GA winds up scoring them in an ascending
fashion (even if SIX_MATCH only gets a small positive score), you've
probably identified another useful spammish indicator.
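
A quick way to check that after a GA run might look like the sketch below.
The function name, the 0.05 "near zero" threshold, and the three verdict
strings are all assumptions of mine, not anything the GA tooling provides:

```python
COUNT_RULE_ORDER = ("THREE_MATCH", "FOUR_MATCH", "FIVE_MATCH", "SIX_MATCH")


def assess_count_rules(weights, epsilon=0.05):
    """Classify the evolved weights of the count rules.

    weights -- mapping of rule name to the score the GA assigned
    epsilon -- hypothetical cutoff for "at or very near zero"
    """
    values = [weights[name] for name in COUNT_RULE_ORDER]
    # All weights at or near zero: the rules contribute nothing.
    if all(abs(v) <= epsilon for v in values):
        return "remove"
    # Strictly ascending weights ending in a positive score: a useful indicator.
    ascending = all(a < b for a, b in zip(values, values[1:]))
    if ascending and values[-1] > 0:
        return "keep"
    return "inconclusive"
```

Anything that comes back "inconclusive" would need a closer look by hand
before deciding either way.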

-- 
Skip Montanaro ([EMAIL PROTECTED] - http://www.mojam.com/)
Boycott Netflix - they spam - http://www.musi-cal.com/~skip/netflix.html


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
