On Thu, 25 Jul 2013 04:15:42 +1000, Chris Angelico wrote:

> If nobody had ever thought of doing a multi-format string
> representation, I could well imagine the Python core devs debating
> whether the cost of UTF-32 strings is worth the correctness and
> consistency improvements... and most likely concluding that narrow
> builds get abolished. And if any other language (eg ECMAScript) decides
> to move from UTF-16 to UTF-32, I would wholeheartedly support the move,
> even if it broke code to do so.

Unfortunately, so long as most language designers are European-centric,
there is going to be a lot of push-back against any attempt to fix (say)
Javascript or Java just for the sake of "a bunch of dead languages" in
the SMPs (the Supplementary Multilingual Planes). Thank goodness for
emoji. Wait till the young kids start complaining that their emoticons
and emoji are broken in Javascript, and eventually it will get fixed. It
may take a decade for the young kids to grow up and take over Javascript
from the old codgers, but it will happen.

> To my mind, exposing UTF-16 surrogates
> to the application is a bug to be fixed, not a feature to be maintained.

This, times a thousand.

It is *possible* to have non-buggy string routines using UTF-16, but the
implementation is a lot more complex than most language developers can be
bothered with. I'm not aware of any language that uses UTF-16 internally
that doesn't give wrong results for surrogate pairs.

--
Steven
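
To make the surrogate-pair problem concrete, here is a minimal sketch in
Python 3.3 or later. It is only an illustration, not how any particular
language implements its strings. The helper names utf16_units and
code_points are made up for this example. The first exposes the raw
UTF-16 code units the way a UTF-16 language would; the second shows the
extra bookkeeping needed to get whole code points back.

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as a list of ints (hypothetical helper)."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def code_points(units):
    """Recombine surrogate pairs into code points (hypothetical helper)."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        has_next = i + 1 < len(units)
        # A high surrogate followed by a low surrogate encodes one astral character.
        if 0xD800 <= u <= 0xDBFF and has_next and 0xDC00 <= units[i + 1] <= 0xDFFF:
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            # BMP character, or a lone surrogate passed through unchanged.
            out.append(u)
            i += 1
    return out


s = "a\U0001F600b"          # 'a', an emoji from the SMP, 'b'
units = utf16_units(s)

print(len(s))               # 3 code points
print(len(units))           # 4 code units: the emoji takes two
print([hex(u) for u in units[:2]])   # naive slice ends on a lone high surrogate
print([hex(c) for c in code_points(units)])
```

A language that indexes, slices, or reverses by code unit gets the
"length 4" view and can cut the emoji in half. Doing it correctly means
every such routine needs something like the pairing loop above, which is
presumably the complexity most implementations skip.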