Accidentally sent this just to Florian. Fixing that.

On Wed, Apr 12, 2017 at 01:36:49PM +0200, Florian Weimer wrote:
> What's the current standardization status of IDNA?

It's complicated.

> As far as I can tell, a lot of vendors are still stuck with the original
> IDNA standard (IDNA2003).

Well, not exactly. IDNA2003 is defined only for Unicode 3.2, which nobody has used for a long time. So strictly speaking, what most of those vendors are doing is undefined, since there's no way to know what version of Unicode you have installed. But practically, of course, this is what's happening.

> There are three or more competing successors,
> IDNA2008 as standardized by the IETF (without any tweaks), the Unicode IDNA
> standard TS46 (<http://www.unicode.org/reports/tr46/>, which is configurable
> and allegedly compatible with IETF IDNA2008, but is not because it yields
> different results than IETF IDNA2008), and the Mozilla/DENIC IDNA
> implementation in Firefox/Thunderbird
> (<https://bugzilla.mozilla.org/show_bug.cgi?id=479520> and other sources).

UTS#46 is allegedly a transitional technology: it is supposed to do mapping so that things that worked in IDNA2003 but won't under IDNA2008 will continue to work. As a practical matter, it does not appear to have a mechanism by which the transition can be brought to an end, so it's hard to see it as an actual transition mechanism. It's also hard to understand why it has so many characters marked as "valid" that are not valid under any version of IDNA, including many emojis.

> These aren't compatible. You can see this by visiting
> <https://www.buße.de/> in different browsers.

Yes.

> Is there an ongoing effort to reconcile application behavior? Different
> TLDs appear to expect different IDNA implementations.

ICANN's rules require IDNA2008. There is a new version of the guidelines out -- I can't recall whether the public comment period has closed. Many ccTLDs voluntarily conform to the guidelines also, but ICANN can't compel it. And of course ICANN has no control over other things further down the tree.

Part of the trouble arises because the WHATWG is apparently opposed to IDNA2008, partly on the grounds that it makes some domain names that were valid under 2003 invalid.

> One practical problem with IDNA2003 is that it prevents having Hebrew domain
> names containing ASCII digits. Any of the IDNA2008 variants mentioned above
> will fix that, I think, but it's difficult to pick a variant to implement.
> And I certainly don't want to implement per-TLD policies.

Another practical problem with IDNA2003 is that it loses information on a round trip: Unicode can be sent into the algorithm and turned into Punycode, and when it is turned back into Unicode the result is not the same. ß is mapped, for instance, so you get back "ss" whatever you put in. This issue is one of the main things that IDNA2008 fixes, in my opinion: every valid A-label matches exactly one valid U-label, and anything that does not have this property is not a valid A- or U-label.

There are problems with IDNA that have prevented it from being updated for recent versions of Unicode. Asmus Freytag, John Klensin, and I are working on a draft to try to fix that.

Best regards,

A

-- 
Andrew Sullivan
a...@anvilwalrusden.com

_______________________________________________
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop
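
For anyone who wants to see the ß round-trip loss mentioned above, here is a minimal sketch in Python. It assumes that Python's built-in "idna" codec follows the IDNA2003 rules (RFC 3490 with Nameprep), which is how the standard library documents it. The helper names are made up for illustration. The second helper uses the raw "punycode" codec only to show that the encoding step itself is reversible and that the loss comes from the mapping step; it performs none of the IDNA2008 validity checks, so it is not an IDNA2008 implementation.

```python
# Sketch only: contrasts IDNA2003-style mapping with a bare Punycode round trip.
# Assumption: the stdlib "idna" codec implements IDNA2003 (RFC 3490 + Nameprep).
# Both helper names are hypothetical and exist only for this illustration.


def idna2003_round_trip(name: str) -> str:
    """Encode a domain name with the stdlib IDNA2003 codec, then decode it."""
    ascii_form = name.encode("idna")   # Nameprep maps characters before encoding
    return ascii_form.decode("idna")


def punycode_round_trip(label: str) -> str:
    """Encode a single label with raw Punycode (no mapping), then decode it."""
    encoded = label.encode("punycode")  # no Nameprep, no IDNA validity checks
    return encoded.decode("punycode")


if __name__ == "__main__":
    original = "buße.de"
    # The mapping step replaces ß before encoding, so the original is not recovered.
    print(idna2003_round_trip(original))
    # Without the mapping step, the label survives the round trip unchanged.
    print(punycode_round_trip("buße") == "buße")
```

A real IDNA2008 implementation would add the "xn--" prefix and reject labels that fail the protocol's validity rules; this sketch leaves all of that out.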