On Sat, 23 May 2015 10:33 pm, Thomas 'PointedEars' Lahn wrote:

> If only characters were represented as sequences UTF-16 code units in
> ECMAScript implementations like JavaScript, there would not be a problem
> beyond the BMP;

Are you being sarcastic? This is Rhino:

js> var c = String.fromCharCode(65535);  // in the BMP
js> print(c.charCodeAt(0));
65535

So far so good.

js> var c = String.fromCharCode(65536);  // astral character
js> print(c.charCodeAt(0));
0

Can you name any ECMAScript implementation which correctly handles code
points in the supplementary planes?

By the way, for many years Python implemented Unicode as UTF-16 code units,
the so-called "narrow build":

py> c = unichr(65536)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: unichr() arg not in range(0x10000) (narrow Python build)

Let's try again:

py> c = u'\U00010000'  # a single code point
py> len(c)
2

I'm not saying that it is impossible to have a correct Unicode
implementation using UTF-16, but I've never seen one.

-- 
Steven

-- 
https://mail.python.org/mailman/listinfo/python-list
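P.S. For anyone wondering where the 0 and the 2 in the sessions above come from, here is a rough Python 3 sketch of the arithmetic. The helper names utf16_units and to_uint16 are mine, made up for illustration; they are not part of Rhino or of any Python build. The sketch assumes only the standard UTF-16 surrogate-pair rule and the fact that String.fromCharCode reduces each argument to 16 bits.

```python
# Sketch only: utf16_units and to_uint16 are hypothetical helpers written
# for this post, not functions from Rhino or from the Python stdlib.

def utf16_units(code_point):
    """Return the list of UTF-16 code units encoding one code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if code_point < 0x10000:
        # BMP code points fit in a single 16-bit unit.
        return [code_point]
    # Astral code points are split into a high and a low surrogate.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return [high, low]


def to_uint16(n):
    """Keep only the low 16 bits, as String.fromCharCode does to its arguments."""
    return n & 0xFFFF


print(utf16_units(65535))   # [65535]
print(utf16_units(65536))   # [55296, 56320], i.e. 0xD800 0xDC00
print(to_uint16(65536))     # 0
```

So the narrow build's len(c) == 2 is just a count of the two surrogates, and the 0 from Rhino is 65536 with everything above the low 16 bits thrown away.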