On Fri, Jul 25, 2003 at 04:41:54PM -0400, Daniel Carrera wrote:
body MY_CONSONANT_4 /[^aeiou]{4}/
describe MY_CONSONANT_4 Body contains 4 consecutive consonants.
score MY_CONSONANT_4 0.15
The pattern might be dangerous for French, Chinese, or Polish mails :-) Chinese UTF-8 or KOI8-encoded text consists 'only of consonants' under the above definition, and French, for example, has lots of accented characters. Polish also has words with SO many consonants in a row that even we Germans have problems with them :-)
For French, there aren't many words that would be caught by MY_CONSONANT_4 (maybe none: as a native French speaker I couldn't find any), and none at all by MY_CONSONANT_5... The same goes for German (but I'm not a native speaker).
In fact, the risk of false positives (FP) is greater with a quoted Message-ID (or another unique ID, a tracking ID for example) than with German and French words...
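To make the point concrete, here is a rough sketch in Python (not SpamAssassin itself) of how the negated class behaves. LOOSE is the pattern from the rule above; STRICT is a hypothetical alternative that lists ASCII consonants explicitly and is not something proposed in this thread. The Message-ID is made up for illustration. Note also that this works on Unicode strings, while SpamAssassin may see raw bytes, so accented text could match even more readily there.

```python
import re

# Pattern from the rule under discussion: any 4 characters that are not
# lowercase a, e, i, o, u. Digits, punctuation, spaces and accented
# letters all count as "consonants" here.
LOOSE = re.compile(r"[^aeiou]{4}")

# Hypothetical stricter variant (an assumption, not from the thread):
# 4 consecutive ASCII consonant letters, case-insensitive.
STRICT = re.compile(r"[b-df-hj-np-tv-z]{4}", re.IGNORECASE | re.ASCII)


def hits(pattern, text):
    """Return True if the pattern matches anywhere in the text."""
    return pattern.search(text) is not None


# Illustrative samples; the Message-ID is invented for this example.
samples = [
    "banana",
    "1234",
    "<20030725.1234@example.org>",
    "\u00e9l\u00e8ve",      # French word with accented vowels
    "chrz\u0105szcz",       # Polish word with a long consonant run
]

for text in samples:
    print(repr(text), "loose:", hits(LOOSE, text), "strict:", hits(STRICT, text))
```

With these samples, the quoted ID and the accented French word trigger only the loose pattern, while the Polish word triggers both, which fits the concerns raised above.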
-- Maxime Ritter
_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk