On Fri, Mar 18, 2016 at 10:44 AM, Steven D'Aprano <st...@pearwood.info> wrote:
> On Sat, 19 Mar 2016 02:31 am, Random832 wrote:
>
>> On Fri, Mar 18, 2016, at 11:17, Ian Kelly wrote:
>>> If the string is simple UCS-2, that's easy.
>
> Hmmm, well, nobody uses UCS-2 any more, since that only covers the first
> 65536 code points. Rather, languages like Javascript and Java, and the
> Windows OS, use UTF-16, which is a *variable width* extension to UCS-2. I
> don't know about Windows, but Javascript implements this badly, so that
> 4-byte UTF-16 code points are treated as *two* surrogate code points
> instead of the single code point they are meant to be.

The reason I specifically brought up UCS-2 is that *Python* uses UCS-2 internally (or Latin-1, or UCS-4, depending on the widest code point in the string).
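
For anyone who wants to see this for themselves, here is a rough sketch, assuming CPython 3.3 or later (the flexible string representation from PEP 393). The helper name bytes_per_char is made up for illustration, and sys.getsizeof reports an implementation detail, so other Python implementations may behave differently or not support it at all. The last two lines show the UTF-16 surrogate-pair case from the quoted text for comparison.

```python
import sys


def bytes_per_char(sample: str, n: int = 64) -> int:
    """Estimate how many bytes CPython uses per code point of `sample`.

    Hypothetical helper: it compares the sizes of two repetitions of the
    same text so that the fixed object header cancels out. This relies on
    CPython's PEP 393 layout and is not a documented guarantee.
    """
    short = sample * n
    long = sample * (2 * n)
    return (sys.getsizeof(long) - sys.getsizeof(short)) // (n * len(sample))


# Expected on CPython 3.3+: 1, 1, 2, 4
print(bytes_per_char("a"))           # ASCII, stored one byte per code point
print(bytes_per_char("\u00e9"))      # U+00E9, fits in Latin-1
print(bytes_per_char("\u20ac"))      # U+20AC, needs UCS-2
print(bytes_per_char("\U0001F600"))  # U+1F600, outside the BMP, needs UCS-4

# In UTF-16 the same non-BMP code point takes two 16-bit code units
# (a surrogate pair), while Python still reports a length of 1.
astral = "\U0001F600"
print(len(astral), len(astral.encode("utf-16-le")) // 2)  # 1 2
```

The point is that the width is chosen per string, so a string with no code point above U+FFFF really is fixed-width UCS-2 internally, and indexing it never has to deal with surrogate pairs.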