On Wed, 2011-12-14 at 19:38 -0500, dar...@chaosreigns.com wrote:
> On 12/15, Martin Gregorie wrote:
> > I'm getting spam with the Subject, Sender personal name and body all
> > written in Cyrillic, but, despite having "ok_locales en fr de" defined
> > in local.cf, no rules are fired to mark the message as being in an
> > unwanted language. 
> 
> Probably related to this:
> https://issues.apache.org/SpamAssassin/show_bug.cgi?id=4078
> 
> There's also TextCat, which is also broken:
> https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6364
> 
> Basically, spamassassin's detection of languages is broken.
> 
I agree: UTF-8 seems to break language detection in the same way that
bug 4078 describes for Windows codepages.

Could somebody with access to the SA Bugzilla kindly add a comment to
bug 4078 saying that this is also an issue with Cyrillic encoded in
UTF-8? I'm asking because at present bug 4078 only mentions Windows
codepages and KOI8; nothing there indicates that UTF-8 is affected as
well.
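
For anyone adding that comment, here is a rough Python sketch of the
distinction. It is not SpamAssassin code and makes no claim about how
ok_locales or TextCat work internally. It only shows that the same
Cyrillic text gives different byte sequences in KOI8-R and UTF-8, while
a check made on the decoded characters sees Cyrillic either way. The
function name cyrillic_ratio and the sample string are made up for
illustration.

```python
import unicodedata


def cyrillic_ratio(text: str) -> float:
    """Return the fraction of alphabetic characters that are Cyrillic.

    Hypothetical helper for illustration only; works on decoded text,
    so the original charset of the message does not matter.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    cyrillic = sum(
        1 for ch in letters
        if unicodedata.name(ch, "").startswith("CYRILLIC")
    )
    return cyrillic / len(letters)


sample = "Привет, мир"               # made-up sample text

# Same characters, two different byte representations.
utf8_bytes = sample.encode("utf-8")    # two bytes per Cyrillic letter
koi8_bytes = sample.encode("koi8_r")   # one byte per Cyrillic letter

# A byte-level test written for one encoding cannot match the other,
# but after decoding both messages contain the same text.
same_text = utf8_bytes.decode("utf-8") == koi8_bytes.decode("koi8_r")
```

A detector that looks only at the declared charset or at raw bytes would
treat these two messages differently, whereas cyrillic_ratio() returns
the same value for both once they are decoded.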


Martin

