On Wednesday, May 29, 2002, at 10:51  PM, Michael Moncur wrote:
>> I came up with the name "Five-Card Charlie", which is a reference to the
>> game of Blackjack, where under some rules the player wins if he has any
>> hand of five cards and does not bust (exceed 21). I figured if any
>> message tripped 5 positive tests, the chances of it being non-spam were
>> very small, so that would tip it over into the SPAM=yes category
>
> Actually I'd say if a message tripped 5 really low-scoring tests, it isn't
> necessarily spam. If it trips some higher-scoring tests and adds up to 5.0,
> it is spam. Isn't this what we have already?

No, because the GA (if I understand how it is used correctly) only
considers rules individually, and not in combination (either by the
number of rules hit or by which specific rules are hit). What I and some
others have argued is that in many cases tripping 5 low-scoring rules
may be a better indicator of spam than the single, additive numerical
score would show. It is possible for the GA to derive this as well, but
the magnitude of the computation involved (if you start using specific
combinations, not just a count of hits) gets horrendous very quickly.
This is probably a case where a human-optimized score is more practical.
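
To give a rough sense of the scale, here is a small Python sketch that
counts how many distinct rule combinations there would be to score. The
rule count of 500 is a made-up figure for illustration only, not the
actual size of the rule set, and combination_count is just a helper name
invented for this example.

```python
import math

# Hypothetical rule count, chosen only for illustration.
NUM_RULES = 500

def combination_count(num_rules, size):
    """Number of distinct unordered sets of `size` rules out of `num_rules`."""
    return math.comb(num_rules, size)

if __name__ == "__main__":
    # Scoring rules one at a time means NUM_RULES values to tune;
    # scoring specific combinations grows far faster with the set size.
    for size in range(1, 6):
        print(size, combination_count(NUM_RULES, size))
```

Counting hits (as Five-Card Charlie does) sidesteps this, since it adds
only a single extra parameter rather than one per combination.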

Five-card Charlie is probably too simplistic to be the permanent 
solution to this, but it might help in the short term.
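
For concreteness, the rule could be sketched roughly like this in
Python. This is only an illustration of the idea, not code from
SpamAssassin: the function name, the list-of-scores input, and the use
of ">=" for both comparisons are assumptions. The 5.0 threshold and the
count of 5 positive hits are the figures discussed above.

```python
SPAM_THRESHOLD = 5.0  # additive score threshold mentioned in this thread
CHARLIE_HITS = 5      # number of positive-scoring hits that triggers the override

def is_spam(hit_scores, threshold=SPAM_THRESHOLD, charlie_hits=CHARLIE_HITS):
    """Classify a message from the scores of the rules it tripped.

    hit_scores: one score per rule the message hit (hypothetical input format).
    """
    total = sum(hit_scores)
    # Count only positive-scoring rules; negative-scoring hits do not count.
    positive_hits = sum(1 for score in hit_scores if score > 0)
    # Spam if the usual additive score is reached, OR if enough positive
    # rules fired regardless of how small their individual scores are.
    return total >= threshold or positive_hits >= charlie_hits

if __name__ == "__main__":
    print(is_spam([0.5, 0.5, 0.5, 0.5, 0.5]))  # five low-scoring hits
    print(is_spam([0.5, 0.5, 0.5, 0.5]))       # four low-scoring hits
    print(is_spam([3.0, 2.5]))                 # two higher-scoring hits
```

With these assumptions, five hits of 0.5 each are flagged even though
they add up to only 2.5, while four such hits are not.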

--
Michael C. Berch
[EMAIL PROTECTED]


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
