On 18/05/12 07:54, dar...@chaosreigns.com wrote:
> Locale handling is a known problem in SA:
> https://issues.apache.org/SpamAssassin/show_bug.cgi?id=3062
Bug opened in 2004 :-(

I'm no linguist, but this is probably an extremely hard problem to solve. An email can contain a mixture of languages, so in a perfect world we would be able to change locale per word (or per char? - eeek!).

This also bleeds into the issues surrounding how "ok_locales" doesn't work (as desired) in the modern UTF world. That is, SA would need to "know" which locales an email contains (which helps ok_locales) so that it could then dynamically change word boundary definitions, etc. for rules. Yuck.

Perhaps this should just be classified as a bug in Perl and forgotten about ;-) [Does Python, etc. handle this any better?]

-- 
Cheers

Jason Haar
Information Security Manager, Trimble Navigation Ltd.
Phone: +1 408 481 8171
PGP Fingerprint: 7A2E 0407 C9A6 CAF6 2B9F 8422 C063 5EBB FE1D 66D1
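
On the Python question above, here is a minimal sketch of how word boundaries shift with the character model, assuming Python 3 and its standard `re` module. The sample string is made up for illustration. It only shows that `\b` and `\w` give different answers under Unicode and ASCII matching; it does not do per-language segmentation, so the mixed-language case is still unsolved.

```python
import re

# Hypothetical sample: an English word and a German word in one string.
text = "cheap größe"

# In Python 3, str patterns use Unicode matching by default,
# so \w and \b treat "ö" and "ß" as word characters.
unicode_words = re.findall(r"\b\w+\b", text)

# With re.ASCII, \w and \b only recognise [A-Za-z0-9_],
# so the German word is split at the non-ASCII letters.
ascii_words = re.findall(r"\b\w+\b", text, re.ASCII)

print(unicode_words)  # ['cheap', 'größe']
print(ascii_words)    # ['cheap', 'gr', 'e']
```

So a rule written with `\b` matches differently depending on which mode is in force, which is the same kind of drift described above for SA rules.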