Chr. von Stuckrad wrote:
On Fri, Jul 25, 2003 at 04:41:54PM -0400, Daniel Carrera wrote:

body      MY_CONSONANT_4  /[^aeiou]{4}/
describe  MY_CONSONANT_4  Body contains 4 consecutive consonants.
score     MY_CONSONANT_4  0.15


The pattern might be dangerous for French, Chinese,
or Polish mail :-)  Chinese UTF-8 or KOI8 text consists of
'only consonants' by the above definition,
and French, for example, has lots of accented characters.
Also, Polish has words with SO many consonants
in a row that even we Germans have problems with
them :-)
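One way to check these concerns is to run the quoted pattern over a few sample words. This is a sketch, not SpamAssassin itself. It assumes Python's `re` module, which handles this simple character class the same way Perl does, and it works on decoded text rather than raw message bytes. `[^aeiou]` matches anything that is not one of the five lowercase ASCII vowels, so accented letters, CJK characters, digits, spaces and punctuation all count as "consonants". On raw bytes, as in an undecoded UTF-8 or KOI8 body, every non-ASCII byte also falls outside the class. The sample words are illustrative choices, not taken from real mail.

```python
import re

# The body pattern from MY_CONSONANT_4, compiled with Python's re module.
# [^aeiou] matches any character that is not one of the five lowercase
# ASCII vowels, so accented letters, CJK characters, digits, spaces and
# punctuation all count as "consonants".
MY_CONSONANT_4 = re.compile(r"[^aeiou]{4}")


def first_hit(text):
    """Return the first 4-character run the pattern matches, or None."""
    match = MY_CONSONANT_4.search(text)
    return match.group(0) if match else None


# Illustrative sample words for the languages mentioned above.
samples = [
    "bonjour",    # French, no accents
    "côté",       # French, accented vowels fall outside [aeiou]
    "szczęście",  # Polish, long consonant cluster
    "中文邮件",    # Chinese: no character is in [aeiou]
]

for word in samples:
    print(f"{word!r:14} -> {first_hit(word)!r}")
```

Here "bonjour" does not match, but "côté" matches in full because neither accented vowel is in the class. The Polish cluster "szcz" and the Chinese string also match.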

For French, there aren't many words that would be caught by MY_CONSONANT_4 (maybe none at all: as a native French speaker I couldn't find any), and none by MY_CONSONANT_5. The same goes for German (but I'm not a native speaker).
In fact, the risk of false positives is greater with quoted Message-IDs (or other unique IDs, such as tracking IDs) than with German and French words...
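The same kind of check shows why quoted identifiers are a risk. This is a sketch in Python's `re` module. It assumes MY_CONSONANT_5 is the same pattern with a run length of 5, since its definition is not quoted in this message. The Message-ID and tracking ID are made-up examples.

```python
import re

# MY_CONSONANT_4 is the rule quoted earlier in the thread.
# MY_CONSONANT_5 is ASSUMED to be the same pattern with a run of 5.
RULES = {
    "MY_CONSONANT_4": re.compile(r"[^aeiou]{4}"),
    "MY_CONSONANT_5": re.compile(r"[^aeiou]{5}"),
}


def rule_hits(text):
    """Return {rule name: first matching run, or None} for one string."""
    hits = {}
    for name, pattern in RULES.items():
        match = pattern.search(text)
        hits[name] = match.group(0) if match else None
    return hits


# Hypothetical identifiers of the kind that get quoted in replies.
message_id = "<20030725204154.GA1234@example.org>"
tracking_id = "7XK2-99QF"

for ident in (message_id, tracking_id):
    print(ident, rule_hits(ident))
```

Both identifiers trigger both rules almost immediately, because digits and punctuation such as `<` and `-` are outside `[aeiou]`.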


--
Maxime Ritter

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
