On Wed, 2011-12-14 at 23:36 -0500, dar...@chaosreigns.com wrote:
> On 12/15, Martin Gregorie wrote:
> > Could somebody with access to the SA Bugzilla kindly add a comment to
> > bug 4078 saying that this is also an issue with Cyrillic encoded in
> > UTF-8? I'm asking because at present #4078 only mentions Windows code
> > pages and koi8. There is nothing to indicate that this is also a problem
> > with UTF-8.
>
> Although as Karsten pointed out, bug 4078 isn't actually
> related, since that bug is actually related to character sets primarily in
> another language. Which UTF8 is not. Bug 6364 is probably exactly the
> same as your issue, just in a different language - needing TextCat fixed /
> rewritten.
>
The actual problem is that bug 4078 is over-restrictive in its
applicability: it merely says that CHARSET_FARAWAY_HEADER isn't returned
if a message body is in Hebrew.

The problem that needs addressing is that the ok_locales configuration
parameter doesn't work. This appears to be because SA treats the
sender's choice of character set (in Windows terms, the code page) as a
reliable indication of the sender's locale. I accept that this used to
work, but since the widespread introduction of UTF-8 and other Unicode
encodings, any such assumption is deeply flawed. The same comments also
apply to textcat (bug 6364).

There are really only two possibilities for resolving these bugs:

1) Fix bug 6364 by rewriting the code textcat uses to recognise the
   predominant language used in body text. Fix bug 4078 by rationalising
   ok_locales to use the revised textcat code to determine the locale
   used by the sender before comparing this with the list of acceptable
   locales.

2) Declare textcat and ok_locales to be irretrievably broken and remove
   them from future versions of SA.

That said, I'm happy to become a Bugzilla user, but before I add
anything to it, I'd like to know whether you'd prefer me to add comments
to 4078 and/or 6364, or whether it would be best to raise a new bug
containing my suggestion #1. I've kept an example message that I can
provide as evidence.

Martin
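
As an illustration of suggestion #1, here is a minimal Python sketch of
judging a message by the script of its decoded text rather than by its
charset label. It is not SA code (SA is written in Perl) and it is far
cruder than a real language guesser such as textcat: it only counts
which script the letters belong to. The function name dominant_script
and the sample text are hypothetical, chosen for this example only.

```python
import unicodedata
from collections import Counter


def dominant_script(text):
    """Return the most common script among the letters in text, or None.

    Assumption: the first word of a character's Unicode name ("LATIN",
    "CYRILLIC", "HEBREW", "CJK", ...) is a good enough stand-in for its
    script. This is a rough heuristic, not a language detector.
    """
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # ignore digits, punctuation and whitespace
        name = unicodedata.name(ch, "")
        if name:
            counts[name.split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else None


# The same Cyrillic text sent with two different charset labels.
sample = "Привет, как дела?"
for charset in ("koi8-r", "utf-8"):
    raw = sample.encode(charset)    # bytes as they would travel on the wire
    decoded = raw.decode(charset)   # what the recipient actually reads
    print(charset, dominant_script(decoded))
```

Both runs report the same script, because the decision is made on the
decoded text. A check keyed on the charset label alone can flag the
koi8-r copy as Cyrillic but learns nothing from the "utf-8" label, which
is the gap described above.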