[Python-ideas] Re: TextIOBase: Make tell() and seek() pythonic

Eryk Sun Thu, 26 May 2022 01:05:43 -0700

On 5/26/22, Christopher Barker wrote:
> IIRC, there were two builds: 16- and 32-bit Unicode. But it wasn’t UTF-16,
> it was UCS-2.


In the old implementation prior to 3.3, narrow and wide builds were
supported regardless of the size of wchar_t. For a narrow build, if
wchar_t was 32-bit, then PyUnicode_FromWideChar() would encode non-BMP
ordinals as UTF-16 surrogate pairs, and PyUnicode_AsWideChar()
implemented the reverse, from UTF-16 back to UTF-32. There were
several similar cases, such as PyUnicode_FromOrdinal().
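
As a rough illustration of the surrogate-pair arithmetic involved, here
is a minimal Python sketch. The function names are hypothetical and are
not taken from unicodeobject.c; the old C code did the equivalent
inline.

```python
def to_surrogate_pair(cp):
    """Split a non-BMP code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = cp - 0x10000                # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low


def from_surrogate_pair(high, low):
    """Combine a (high, low) surrogate pair back into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


pair = to_surrogate_pair(0x10000)
roundtrip = from_surrogate_pair(*pair)
```

Here pair is (0xD800, 0xDC00) and roundtrip is 0x10000 again, which is
the same mapping a narrow build applied when converting to and from a
32-bit wchar_t.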

The header called this "limited" UTF-16 support, presumably because
string length and indexing did not account for surrogate pairs. For
example, on a narrow build:

    >>> s = '\U00010000'
    >>> len(s)
    2
    >>> s[0]
    '\ud800'
    >>> s[1]
    '\udc00'
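
For comparison, on 3.3 and later the same string has length 1. The
narrow-build view can still be approximated by counting UTF-16 code
units, as in this small sketch (the variable names are only
illustrative):

```python
s = '\U00010000'

# On 3.3+, len() counts code points, so this is 1.
code_point_len = len(s)

# Encode to UTF-16 (little-endian, no BOM) and split into 16-bit units
# to see what a narrow build would have stored.
data = s.encode('utf-16-le')
code_units = [int.from_bytes(data[i:i + 2], 'little')
              for i in range(0, len(data), 2)]
narrow_len = len(code_units)
```

With this, narrow_len is 2 and code_units is [0xD800, 0xDC00], matching
the s[0] and s[1] values shown above.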

Here's a link to the old implementation:

https://github.com/python/cpython/blob/v3.2.6/Objects/unicodeobject.c
