At Thu Aug 7 21:46:40 2003, "Mike Kuentz (2)" wrote:
>
> I'd think some other combos that do appear in the English language, but not
> frequently, would be good to add also. Like zk. The only match I can come
> up with is blitzkrieg. I believe it's originally a German word but has made
> its way into the English language. Combos like that could be used with a
> lower score, or in conjunction with other funky letter combos.

One solution would be to count the number of distinct odd combinations
found, then score depending on that count. As an example, look at how
Nigerian scam mails are identified: a whole series of __NIGERIAN_BODY_nn
rules, then a check to see if more than one of those was hit.

In this case, you might want a different score depending on how many
different 'unusual' combinations were hit. If you only start scoring at
(say) 3 distinct hits, then you minimise the chances of FPs from rare
words like blitzkrieg, etc.

Martin
-- 
Martin Radford                  | "Only wimps use tape backup: _real_
[EMAIL PROTECTED]               | men just upload their important stuff
-o) Registered Linux user #9257 | on ftp and let the rest of the world
/\\ - see http://counter.li.org | mirror it ;)" - Linus Torvalds
_\_V

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
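
For illustration only, the counting idea from the reply above (count the distinct unusual letter combinations, and score nothing until a threshold is reached) could be sketched in Python roughly as follows. This is not SpamAssassin rule syntax or code from the project. The combination list, the function names, the threshold of 3 and the 0.5 points per hit are all assumptions made up for the example.

```python
# Hypothetical list of letter pairs that are rare in English (illustrative only).
UNUSUAL_COMBOS = ("zk", "qx", "vq", "jx", "xz", "qj")


def distinct_unusual_combos(text, combos=UNUSUAL_COMBOS):
    """Return the set of unusual letter pairs that occur anywhere in text."""
    lowered = text.lower()
    return {combo for combo in combos if combo in lowered}


def combo_score(text, threshold=3, points_per_combo=0.5):
    """Score 0 below the threshold, otherwise scale with the distinct-hit count.

    Both threshold and points_per_combo are assumed values, not tuned ones.
    """
    hits = len(distinct_unusual_combos(text))
    if hits < threshold:
        return 0.0
    return hits * points_per_combo


if __name__ == "__main__":
    # A single rare word hits only one combination, so it stays below the threshold.
    print(combo_score("The blitzkrieg began at dawn"))
    # Several different combinations in one message cross the threshold.
    print(combo_score("zkqx vq"))
```

Because repeated occurrences of the same pair are collapsed into a set, a message that mentions blitzkrieg ten times still counts as one hit, which is the point of counting distinct combinations rather than total matches.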