-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello Daniel,

Friday, July 25, 2003, 1:41:54 PM, you wrote:

DC> Here is a thought: We could test for n consecutive consonants. The
DC> more consecutive consonants, the more likely it is to be spam.

Sounds like a good idea.

DC> body MY_CONSONANT_4 /[^aeiou]{4}/
DC> describe MY_CONSONANT_4 Body contains 4 consecutive consonants.
DC> score MY_CONSONANT_4 0.15

DC> Likewise we can go on with more consonants:

DC> score MY_CONSONANT_4 0.15
DC> score MY_CONSONANT_5 0.30
DC> score MY_CONSONANT_6 0.60
DC> score MY_CONSONANT_7 1.20
DC> score MY_CONSONANT_8 2.40

I'm more conservative than you, so I've lowered the scores somewhat,
but otherwise I like the idea. I've also added matching rules for runs
of 5 or more vowels. I know I'll get some matches in ham, siiiiiiigh,
but that ham will usually score well under half my required hits
threshold, so I'm not worried.

One thing I've been meaning to mention concerning these -- while I
usually do manually feed confirmed spam that scores under or just
barely over my required hits threshold into Bayes, I do NOT do so with
these spams -- I don't want to garbage up my token database with
unique, never-to-be-seen-again tokens. (On rare occasions I have
manually edited the mailbox file and removed this garbage from the
spam, then run sa-learn on it, but usually I haven't had to bother
with this extreme action.)

Bob Menschel

-----BEGIN PGP SIGNATURE-----
Version: PGP 8.0

iQA/AwUBPyHvP5ebK8E4qh1HEQIWqACfYdX8ARQQ60R67Hfei5nZIglJrosAoPAS
Am5r6+ezP3korqXv3/wbLYnD
=Mhnb
-----END PGP SIGNATURE-----

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
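
For reference, here is a sketch of how the series quoted in the message might look when written out in full. The rule names, the regex pattern, and the scores for 4 through 8 come from Daniel's quoted text. The describe lines for 5 through 8 simply follow the pattern of the first one and are an assumption. Bob's lowered scores and his vowel rules are not given in the message, so they are not shown.

Two things are worth keeping in mind. First, a run of 8 characters also contains runs of 4, 5, 6, and 7, so a message that hits MY_CONSONANT_8 hits all five rules, for a combined 0.15 + 0.30 + 0.60 + 1.20 + 2.40 = 4.65 points. Second, `[^aeiou]` matches any character that is not a lowercase vowel, which includes digits, spaces, punctuation, and uppercase letters. The last rule is a hypothetical stricter variant: its name, its character class (which leaves out `y`), and its placeholder score are illustrative choices, not something from the thread.

```
# Series as quoted in the thread (pattern and scores from Daniel's message).
body      MY_CONSONANT_4  /[^aeiou]{4}/
describe  MY_CONSONANT_4  Body contains 4 consecutive consonants.
score     MY_CONSONANT_4  0.15

body      MY_CONSONANT_5  /[^aeiou]{5}/
describe  MY_CONSONANT_5  Body contains 5 consecutive consonants.
score     MY_CONSONANT_5  0.30

body      MY_CONSONANT_6  /[^aeiou]{6}/
describe  MY_CONSONANT_6  Body contains 6 consecutive consonants.
score     MY_CONSONANT_6  0.60

body      MY_CONSONANT_7  /[^aeiou]{7}/
describe  MY_CONSONANT_7  Body contains 7 consecutive consonants.
score     MY_CONSONANT_7  1.20

body      MY_CONSONANT_8  /[^aeiou]{8}/
describe  MY_CONSONANT_8  Body contains 8 consecutive consonants.
score     MY_CONSONANT_8  2.40

# Hypothetical stricter variant (not from the thread): match only letters
# that are consonants, case-insensitively, so digits, spaces, and
# punctuation do not count. Placeholder score for local testing only.
body      MY_STRICT_CONSONANT_4  /[bcdfghjklmnpqrstvwxz]{4}/i
describe  MY_STRICT_CONSONANT_4  Body contains 4 consecutive consonant letters.
score     MY_STRICT_CONSONANT_4  0.01
```

Because the scores are cumulative, it may be worth checking the combined total against the local required hits threshold before deploying the whole series.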