On Tue, Nov 26, 2013 at 10:35 AM, Ben Finney <ben+pyt...@benfinney.id.au> wrote:
> Chris Angelico <ros...@gmail.com> writes:
>
>> (Fifteen years. It's seventeen years since Unicode 2.0, when 16-bit
>> characters were outmoded. It's about time _every_ modern language
>> followed Python's and Pike's lead and got its Unicode support right.)
>
> Most languages that already have some support for Unicode have a
> significant amount of legacy code to continue supporting, though. Python
> has the same problem: there're still heaps of Python 2 deployments out
> there, and more being installed every day, none of which do Unicode
> right.
>
> To fix Unicode support in Python, the developers and community had to
> initiate – and are still working through – a long, high-effort transition
> across a backward-incompatible change in order to get the community to
> Python 3, which finally does Unicode right.
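Concretely, for anyone skimming, "does Unicode right" is visible straight
from the REPL. A minimal sketch; the Python 2 half assumes a "narrow" build
(e.g. the standard Windows builds), while a wide build gets the length right
at the cost of four bytes per character everywhere:

    # Python 3.3+ (PEP 393): a string is a sequence of code points, so a
    # character outside the BMP is still just one character.
    >>> s = '\U0001F600'    # U+1F600 GRINNING FACE, an astral-plane character
    >>> len(s)
    1

    # Python 2 on a narrow build stores the same character as a UTF-16
    # surrogate pair, and the seam shows through:
    >>> s = u'\U0001F600'
    >>> len(s)
    2
    >>> s[0]                # half a character: the high surrogate
    u'\ud83d'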
Yes, but Python did start that process by creating Python 3; other languages
ought to be able to do something similar. Get the process started. It's not
going to get any easier by waiting.

And, more importantly: new languages are being developed. If their designers
look at Java, they'll see "UTF-16 is fine for them, so it'll be fine for us",
but if they look at Python, they'll see "The current version of Python does
it this way, everything else is just maintenance mode, so this is obviously
the way the Python designers feel is right". Even if 99% of running Python
code is Py2, that message is still being sent, because Python 2.8 will never
exist.

> Other language communities will likely have to do a similar huge effort,
> or forever live with nearly-right-but-fundamentally-broken Unicode
> support.
>
> See, for example, the enormous number of ECMAScript deployments in every
> user-facing browser, all with the false assumption (§2 of ECMA-262
> <URL:http://www.ecma-international.org/publications/standards/Ecma-262.htm>)
> that UTF-16 and Unicode are the same thing and nothing outside the BMP
> exists.
>
> And ECMAScript is near the front of the programming language pack in
> terms of Unicode support — most others have far more heinous flaws that
> need to be fixed by breaking backward compatibility. I wish their
> communities luck.

Yeah. I'm now of the opinion that JavaScript and ECMAScript can't be fixed
("use strict" is entirely backward compatible, but changing string handling
wouldn't be), so it's time we had a new web browser scripting language.
Really, 1996 was long enough ago that using 16-bit characters should be
considered no less wrong than 8-bit characters. If it weren't that we don't
actually need the space any time soon, I would consider the current limit of
1,114,112 code points to be a problem too (the arithmetic behind that figure
is in the P.S. below); there's really no reason to restrict ourselves based
on what UTF-16 is capable of encoding, any more than we should define Unicode
based on what Code Page 437 can handle.

>  \       “Nature hath given men one tongue but two ears, that we may |
>   `\      hear from others twice as much as we speak.” —Epictetus, |
> _o__)                                          _Fragments_ |

One of my brothers just got married, and someone who's friended him on
Facebook was unaware of the invitations despite being a prolific poster. I
cited the modern equivalent of the above, namely that we have ten fingers but
only two eyes, so it's acceptable to write five times as much as we actually
bother to read...

ChrisA
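P.S. Since I'm throwing the 1,114,112 figure around: nobody picked that
number for Unicode's sake; it's exactly what UTF-16's surrogate mechanism can
address. A quick sketch of the arithmetic, in plain Python, assuming nothing
beyond the surrogate ranges the standard defines:

    # Beyond the BMP, UTF-16 spells a code point as a surrogate pair:
    # a high surrogate (U+D800..U+DBFF) followed by a low surrogate
    # (U+DC00..U+DFFF). Each half carries 10 bits of payload.
    highs = 0xDC00 - 0xD800            # 1024 possible high surrogates
    lows = 0xE000 - 0xDC00             # 1024 possible low surrogates
    astral = highs * lows              # 1,048,576 code points beyond the BMP
    bmp = 0x10000                      # 65,536 code points in plane 0 itself
    print(astral + bmp)                # 1114112 == 0x110000, Unicode's ceiling
    assert astral + bmp == 17 * 2**16  # equivalently: 17 planes of 65,536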