In article <ofc0fea11b.05dda05c-onc125826a.0038eb98-c125826a.0038f...@notes.na.collabserv.com> you write:
>Hello folks
>
>I've been tasked with finding out what the general consensus is on
>support in email headers for international characters such as UTF-8
>characters, including accented characters, and also Asian and Cyrillic
>characters.
>
>I know there's an RFC from 2012, but my Product Dev people are interested
>in knowing how widespread the actual adoption is.
Funny you should ask. I'm doing some work for the UASG group to document how internationalized email (known as EAI) works.

UTF-8 in everything except the addresses themselves can already be carried in MIME body parts and in encoded-words in mail headers. Those have been around for at least a decade and should work everywhere.

RFCs 6530-6533 defined an SMTP extension called SMTPUTF8 which, to oversimplify a little, allows UTF-8 anywhere you can have ASCII, including both the local part and the domain part of addresses. This affects both the messages themselves and the addresses in the SMTP MAIL FROM and RCPT TO commands.

Uptake has been slow, but Gmail quietly added support last year, and Hotmail/Outlook/Live added support about a month ago. Some of the large Chinese services such as Coremail support it, as do some Indian services such as Xgenplus. Yahoo/AOL/Oath have, as far as I can tell, no plans to support it. Gmail and Hotmail handle other people's UTF-8 addresses in mail, but they still don't provide UTF-8 addresses on their own systems.

My impression is that the main interest is currently in India, since some parts of the government are planning to hand out email addresses to go with the biometric IDs, and many Indians are literate in their own languages, which are written in their own scripts, but not in English.

Having recently added EAI support to my own qmail system, I can say that the basic address handling was a lot easier than I expected, since most mail code these days is already 8-bit clean, more or less by accident. The hardest part, which I haven't done yet, is generalizing the address mapping that MTAs do on incoming mail. Converting between upper and lower case is remarkably language-specific, even in languages written in Latin characters.
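To make the case-mapping point concrete, here is a minimal Python sketch of what a language-agnostic approach does with German and Turkish words. It is not anyone's MTA code: `naive_key` and `round_trip` are hypothetical helpers, and the sketch assumes Python's built-in `str.lower()`, `str.upper()` and `str.casefold()`, which apply the default Unicode case mappings regardless of locale.

```python
# Hypothetical helpers for illustration only; real MTA address mapping
# would need per-language policy, which Python's str methods don't provide.

def naive_key(local_part: str) -> str:
    """Fold a local part with Python's default Unicode case folding."""
    return local_part.casefold()

def round_trip(word: str) -> str:
    """Uppercase, then lowercase, using the default (non-Turkish) rules."""
    return word.upper().lower()

# German: lower() leaves 'ß' alone, so "STRASSE" and "straße" don't match...
print("STRASSE".lower() == "straße".lower())        # False
# ...but full case folding maps 'ß' to "ss", so they do.
print(naive_key("STRASSE") == naive_key("straße"))  # True

# Turkish: dotless 'ı' uppercases to plain 'I', which the default rules
# then lowercase to dotted 'i', so the round trip changes the word.
print(round_trip("ışık"))            # işik
print(round_trip("ışık") == "ışık")  # False
```

Neither result is "right" in general. Whether `straße` and `strasse` should reach the same mailbox, or whether `I` should lowercase to `i` or `ı`, depends on the language the address is written in, and that is exactly what a single built-in mapping can't know.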
Add in all the ways Unicode can represent accented characters, the meaning of o-with-umlaut, which is short for "oe" in German but not in Scandinavia, and Chinese traditional versus simplified characters, and it's a challenge to make addresses work in ways that seem natural in whatever language the address is written in.

R's,
John

PS: Here's a YouTube blurb about EAI that we recorded in San Juan a few weeks ago: https://youtu.be/REDeEhvHwsU
The Microsoft guy announces support in Hotmail/Outlook.
_______________________________________________
mailop mailing list
mailop@mailop.org
https://chilli.nosignal.org/cgi-bin/mailman/listinfo/mailop