The content is transferred as UTF-8 at the MIME level for both the plain-text and HTML parts attached:
    --_000_76A14357762B4A06BD1E68CB3C8C452Dgluesoftcojp_
    Content-Type: text/plain; charset="utf-8"
    Content-Transfer-Encoding: base64
    ...
    --_000_76A14357762B4A06BD1E68CB3C8C452Dgluesoftcojp_
    Content-Type: text/html; charset="utf-8"
    Content-Transfer-Encoding: base64
    ...

Normally the XML declaration or meta tag in the (X)HTML headers should be ignored: mail agents will not transform the attachments, except possibly to change the content-transfer-encoding (here base64 in both parts). If your mail agent does not process the HTML part, it will render the plain-text version, which has no declaration at all; the MIME content-type is then the only indication.

I don't know how the sending email agent could generate "us-ascii" in the XHTML headers, but in fact it should simply be discarded in all cases (in HTML5, us-ascii or iso-8859-1 and their aliases are normally all treated like windows-1252, and "us-ascii" is simply ignored; such a declaration is itself wrong in almost all cases). But here we are not in an HTML5 case; so once the HTML headers are discarded, the next candidate is the MIME part declaration (the transport layer). As it specifies UTF-8, this should work without forcing the reading email agent to start using its "encoding guessing" magic. But even in this case, UTF-8 is certainly a better guess than legacy Japanese charsets (in the reading mail agent, changing settings such as the preferred default encoding to one of these legacy charsets has no effect: UTF-8 is always used to process the message).

So those who see the bug are affected if:
- the sender composes and sends the message with an outdated and buggy email agent (one that inserts completely incorrect XHTML headers);
- the recipients themselves use an outdated and buggy email agent that does not perform the most reasonable processing and guessing steps (or this behavior can only be reproduced by those using an email agent whose user localisation is Japanese).
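To illustrate the point about the MIME part declaration taking precedence over the in-document one, here is a minimal sketch using Python's standard `email` package. The boundary string, the sample text and the helper names `build_sample` and `decode_parts` are all made up for the illustration; this is not the actual message, only a message of the same shape (two base64 parts declared as UTF-8, with an HTML part that wrongly claims us-ascii internally).

```python
import base64
from email import message_from_bytes, policy

BOUNDARY = "example-boundary"  # hypothetical boundary, not the real one


def build_sample() -> bytes:
    """Build a multipart/alternative message shaped like the one discussed."""
    text = "【・_・?】- confused\n"
    # The HTML part deliberately carries the wrong in-document declarations.
    html = (
        '<?xml version="1.0" encoding="us-ascii"?>\n'
        '<html><head><meta http-equiv="Content-Type" '
        'content="text/html; charset=us-ascii" /></head>'
        "<body><p>" + text + "</p></body></html>\n"
    )
    parts = []
    for subtype, body in (("plain", text), ("html", html)):
        b64 = base64.encodebytes(body.encode("utf-8")).decode("ascii")
        parts.append(
            f"--{BOUNDARY}\r\n"
            f'Content-Type: text/{subtype}; charset="utf-8"\r\n'
            "Content-Transfer-Encoding: base64\r\n"
            "\r\n"
            f"{b64}\r\n"
        )
    head = (
        "MIME-Version: 1.0\r\n"
        f'Content-Type: multipart/alternative; boundary="{BOUNDARY}"\r\n'
        "\r\n"
    )
    return (head + "".join(parts) + f"--{BOUNDARY}--\r\n").encode("ascii")


def decode_parts(raw: bytes) -> dict:
    """Decode each text part using only the charset from its MIME header."""
    msg = message_from_bytes(raw, policy=policy.default)
    decoded = {}
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skip the multipart container itself
        # Charset from the Content-Type header; the UTF-8 fallback is an
        # assumption of this sketch, not something the message requires.
        charset = part.get_content_charset() or "utf-8"
        payload = part.get_payload(decode=True)  # undoes the base64 layer
        decoded[part.get_content_type()] = payload.decode(charset, errors="replace")
    return decoded


if __name__ == "__main__":
    for content_type, body in decode_parts(build_sample()).items():
        print(content_type, "->", body.strip())
```

Both parts come out intact even though the HTML still says "us-ascii" inside, because the decoder never looks at the in-document declaration.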
The encoding guesser here is most probably buggy, but it is also affected by the fact that there is not enough contextual content to guess the encoding with good confidence (only a few isolated characters, whose use here was discretionary and which are extremely rare in English text; those few characters have a near-zero confidence value in English as long as no other East Asian language is used). It looks like the reading email agent does not reach a minimum confidence threshold for the guessed encoding; so it seems that the result of the guesser is simply discarded, and the reading email agent then falls back to the user's default encoding setting for messages with unknown or unspecified encodings.

I'm not sure it is valid to discard the explicit UTF-8 MIME declaration, which does not come from the encoding guesser, as UTF-8 is now a solid default (it has long been the default for almost all new IETF standards, and a wide majority of software installations now effectively use it as their default), notably when it is specified explicitly as here. We know that UTF-8 is now the best guess for content at the *worldwide* level. But is UTF-8 still a minority encoding for content exchanged in Japan? ISO-2022-JP seems very unlikely to be used instead of UTF-8, and I would rather have expected a Shift-JIS variant, if Unicode is still not the best choice for Japan.

But if the email agent runs on a now antique OS (Windows XP or 2000, themselves installed in their Japanese localisation), maybe that user never updated their agent for that old OS (which is quite surprising for Japan, which likes to use the newest technology products, unless the reading user has a tricky installation with lots of personal system settings for "geek" tools that have never been ported to newer OSes). In my opinion we are in an extremely user-specific situation.
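The selection order argued for above can be sketched as follows. This is only an illustration of the reasoning, not a description of what any particular mail agent does: the function `choose_charset`, its parameters and the 0.5 threshold are hypothetical, and the guesser itself is left out (only its result and confidence are passed in).

```python
from typing import Optional


def choose_charset(mime_charset: Optional[str],
                   guessed: Optional[str],
                   confidence: float,
                   user_default: str,
                   threshold: float = 0.5) -> str:
    """Pick a charset: explicit MIME declaration, then guess, then default."""
    if mime_charset:
        # An explicit declaration does not come from the guesser, so a
        # low guesser confidence is no reason to throw it away.
        return mime_charset.lower()
    if guessed and confidence >= threshold:
        return guessed.lower()
    # Guess rejected or unavailable: fall back to the user's setting.
    return user_default.lower()


def garble(text: str, wrong_charset: str) -> str:
    """Show what a UTF-8 message looks like when read with the wrong charset."""
    return text.encode("utf-8").decode(wrong_charset, errors="replace")


if __name__ == "__main__":
    sample = "ヾ(@゜▽゜@)ノ - happy"
    # Declared UTF-8 wins even when the guesser is unsure.
    print(choose_charset("UTF-8", "ISO-2022-JP", 0.1, "iso-2022-jp"))
    # With no declaration and a weak guess, the user default is used.
    print(choose_charset(None, "Shift_JIS", 0.1, "iso-2022-jp"))
    print(garble(sample, "shift_jis"))
```

The garbling seen in the report corresponds to the second path being taken even though a declaration was present, which is the behavior I am questioning.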
But I do not see where the mailing list was acting incorrectly (it won't change its settings just for a few "geeks" with tricky installations who use antique software).

2014-04-04 6:48 GMT+02:00 Koji Ishii <[email protected]>:

> Go to Encoding menu and choose UTF-8 to fix the garbled characters.
>
> It looks like the page is served in UTF-8, but it declares itself as
> us-ascii:
> <?xml version="1.0" encoding="us-ascii”?>
> and
> <meta http-equiv="Content-Type" content="text/html; charset=us-ascii" />
>
> /koji
>
> On Apr 4, 2014, at 8:16 AM, Buck Golemon <[email protected]> wrote:
>
> > I too received the intended emoji via direct email but I see the garbled
> > characters in the web interface:
> >
> > ヽ( ̄д ̄;)ノ - worried
> >
> > ヾ(@゜▽゜@)ノ - happy
> > ヽ(#`Д´)ノ - angry
> > 【・_・?】- confused
> >
> > I believe there is an encoding issue somewhere in the
> > unicode.org/mail-arch toolchain.
_______________________________________________
Unicode mailing list
[email protected]
http://unicode.org/mailman/listinfo/unicode

