On 5/26/22, Christopher Barker <[email protected]> wrote:
> IIRC, there were two builds- 16 and 32 bit Unicode. But it wasn’t UTF16, it
> was UCS-2.
In the old implementation prior to 3.3, narrow and wide builds were
supported regardless of the size of wchar_t. For a narrow build, if
wchar_t was 32-bit, then PyUnicode_FromWideChar() would encode non-BMP
ordinals as UTF-16 surrogate pairs, and PyUnicode_AsWideChar()
implemented the reverse, from UTF-16 back to UTF-32. There were
several similar cases, such as PyUnicode_FromOrdinal().
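
The conversion described above is the standard UTF-16 surrogate arithmetic. Here is a minimal Python sketch of it; the helper names to_surrogate_pair() and from_surrogate_pair() are made up for illustration and are not part of the C API or the old implementation:

```python
def to_surrogate_pair(ordinal):
    """Split a non-BMP code point into a (high, low) UTF-16 surrogate pair."""
    if not 0x10000 <= ordinal <= 0x10FFFF:
        raise ValueError("ordinal must be in range U+10000..U+10FFFF")
    offset = ordinal - 0x10000          # 20-bit value
    high = 0xD800 | (offset >> 10)      # top 10 bits
    low = 0xDC00 | (offset & 0x3FF)     # bottom 10 bits
    return high, low


def from_surrogate_pair(high, low):
    """Combine a (high, low) surrogate pair back into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + (((high - 0xD800) << 10) | (low - 0xDC00))


high, low = to_surrogate_pair(0x10000)
print(hex(high), hex(low))                     # 0xd800 0xdc00
print(hex(from_surrogate_pair(high, low)))     # 0x10000
```

The first function corresponds roughly to what a narrow build had to do when given a 32-bit wchar_t value above U+FFFF, and the second to the reverse direction.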
The header called this "limited" UTF-16 support, primarily, I suppose,
because string length and indexing failed to account for surrogate
pairs. For example, on a narrow build:
>>> s = '\U00010000'
>>> len(s)
2
>>> s[0]
'\ud800'
>>> s[1]
'\udc00'
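
For comparison, on 3.3 and later the same string has length 1. As a rough sketch, the narrow-build view can be approximated on a current interpreter by counting UTF-16 code units; the variable names here are just for illustration:

```python
s = '\U00010000'

# On 3.3+ a string is a sequence of code points, so this is 1.
modern_len = len(s)

# Encode to UTF-16 (little-endian, no BOM) and count 16-bit code units,
# which is what len() effectively reported on a narrow build.
units = s.encode('utf-16-le')
narrow_len = len(units) // 2

# Recover the individual code units that indexing used to expose.
code_units = [int.from_bytes(units[i:i + 2], 'little')
              for i in range(0, len(units), 2)]

print(modern_len, narrow_len)            # 1 2
print([hex(u) for u in code_units])      # ['0xd800', '0xdc00']
```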
Here's a link to the old implementation:
https://github.com/python/cpython/blob/v3.2.6/Objects/unicodeobject.c
_______________________________________________
Python-ideas mailing list -- [email protected]
Message archived at
https://mail.python.org/archives/list/[email protected]/message/ATPNS7CEQUONIWDXFCQEEUUGJBOJV72L/