On Fri, Sep 6, 2013, at 11:46, Piet van Oostrum wrote:
> The FSR does not split Unicode into chunks. It does not create problems
> and therefore it doesn't have to solve this.
>
> The FSR simply stores a Unicode string as an array[*] of ints (the
> Unicode code points of the characters of the string). That's it. Then it
> uses a memory-efficient way to store this array of ints. But that has
> nothing to do with character sets. The same principle could be used for
> any array of ints.
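To make the "array of ints, stored compactly" idea concrete, here is a toy pure-Python model. It assumes the PEP 393 rule of picking 1, 2 or 4 bytes per code point from the string's largest code point. It is not CPython's C implementation, which also has a separate ASCII-only case and object headers, and the function names `fsr_width` and `pack_code_points` are hypothetical. It shows that the 1-byte and 2-byte arrays merely happen to coincide with Latin-1 and UTF-16 bytes because both store raw code points.

```python
def fsr_width(s: str) -> int:
    """Bytes per code point in this toy model, chosen from the largest code point."""
    max_cp = max(map(ord, s), default=0)
    if max_cp <= 0xFF:
        return 1  # every code point fits in one byte (the "Latin-1" kind)
    if max_cp <= 0xFFFF:
        return 2  # every code point fits in two bytes (the "UCS-2" kind)
    return 4      # anything up to U+10FFFF needs four bytes (the "UCS-4" kind)


def pack_code_points(s: str) -> bytes:
    """Store the string's code points as a flat array of equal-width little-endian ints."""
    width = fsr_width(s)
    return b"".join(ord(c).to_bytes(width, "little") for c in s)


for text in ["caf\u00e9", "\u20ac10", "\U0001F600!"]:
    packed = pack_code_points(text)
    print(repr(text), fsr_width(text), len(packed))
```

Note that a code point above U+FFFF is stored as one 4-byte int, not as a surrogate pair. So the 2-byte kind is UCS-2, not UTF-16, and only the width changes between kinds, never the meaning of the stored numbers.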
I think the source of the confusion is that the FSR is described in terms of UCS-2 and Latin-1. People tend to think of these (Latin-1 especially) as distinct encodings, rather than as the code points simply being stored in a narrower integer type.

----

Incidentally, how does all this interact with ctypes Unicode buffers (from create_unicode_buffer), which slice as strings and must be UTF-16 on Windows? This was fine pre-FSR, when Unicode objects on Windows (a narrow build) were UTF-16 internally, but I'm not sure how it works now.

-- 
https://mail.python.org/mailman/listinfo/python-list