Ned Batchelder wrote:
> On Saturday, May 23, 2015 at 9:01:29 AM UTC-4, Steven D'Aprano wrote:
>> On Sat, 23 May 2015 10:33 pm, Thomas 'PointedEars' Lahn wrote:
>> > If only characters were represented as sequences of UTF-16 code
>> > units in ECMAScript implementations like JavaScript, there would
>> > not be a problem beyond the BMP;
>>
>> Are you being sarcastic?
>
> IIUC, Thomas' point is that *characters* should be sequences of
> codepoints, not that *strings* should be.
No, my point is that one character should be a sequence of code
_units_ (encoding one code point value).  But in ECMAScript
implementations (so far), a *code point value* equals a character, and
that is a problem because there the value range is limited to what can
be encoded in 16 bits.  The problem starts beyond the BMP, where 16
bits are no longer sufficient to hold a code point value: such a code
point must be encoded as a surrogate pair of two 16-bit code units, so
the code unit sequence and the code point value are no longer equal.