On Wed, 2011-12-14 at 19:38 -0500, dar...@chaosreigns.com wrote:
> On 12/15, Martin Gregorie wrote:
> > I'm getting spam with the Subject, Sender personal name and body all
> > written in Cyrillic, but, despite having "ok_locales en fr de" defined
> > in local.cf, no rules are fired to mark the message as being in an
> > unwanted language.
>
> Probably related to this:
> https://issues.apache.org/SpamAssassin/show_bug.cgi?id=4078
>
> There's also TextCat, which is also broken:
> https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6364
>
> Basically, spamassassin's detection of languages is broken.
>
I agree that it seems to be broken by UTF-8 in the way that bug 4078 describes for Windows codepages.
Could somebody with access to the SA Bugzilla kindly add a comment to bug 4078 saying that this is also an issue with Cyrillic encoded in UTF-8? I'm asking because at present #4078 only mentions Windows code pages and koi8. There is nothing to indicate that this is also a problem with UTF-8.

Martin
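
For anyone wanting to reproduce the symptom outside SpamAssassin, here is a minimal Python sketch. It is not SpamAssassin code and not a proposed fix. It assumes the underlying difficulty is that a UTF-8 charset label says nothing about the language, so the script has to be read from the decoded characters instead. The helper names `decoded_subject` and `cyrillic_ratio` and the sample header are hypothetical, made up for illustration.

```python
import unicodedata
from email.header import decode_header, make_header


def decoded_subject(raw_subject: str) -> str:
    """Decode an RFC 2047 encoded header value into a Unicode string."""
    return str(make_header(decode_header(raw_subject)))


def cyrillic_ratio(text: str) -> float:
    """Return the fraction of alphabetic characters that are Cyrillic."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    cyrillic = sum(
        1 for ch in letters
        if unicodedata.name(ch, "").startswith("CYRILLIC")
    )
    return cyrillic / len(letters)


# Hypothetical sample header: a Cyrillic word sent as UTF-8 (base64).
# The charset label alone ("UTF-8") gives no hint of the language.
sample = "=?UTF-8?B?0J/RgNC40LLQtdGC?="
subject = decoded_subject(sample)
print(subject, cyrillic_ratio(subject))
```

A ratio near 1.0 on the Subject or sender name would point to a message written in Cyrillic even though the declared charset is UTF-8. Any threshold would be a local judgment call.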