On 2021-05-25, MRAB <pyt...@mrabarnett.plus.com> wrote:
> On 2021-05-25 16:41, Dennis Lee Bieber wrote:
>> In Python 3, strings are UNICODE, using 1, 2, or 4 bytes PER
>> CHARACTER (I don't recall if there is a 3-byte version). If your
>> input bytes are all 7-bit ASCII, then they map directly to a 1-byte
>> per character string. If they contain any 8-bit upper half
>> character they may map into a 2-byte per character string.
>>
> In CPython 3.3+:
>
> U+0000..U+00FF are stored in 1 byte.
> U+0100..U+FFFF are stored in 2 bytes.
> U+010000..U+10FFFF are stored in 4 bytes.

Are all characters in a string stored with the same "width"? IOW, does
the presence of one Unicode character in the range U+010000..U+10FFFF
in a string that is otherwise all 7-bit ASCII values result in the
entire string being stored at 4 bytes per character? Or is the storage
width variable within a single string?

--
Grant

--
https://mail.python.org/mailman/listinfo/python-list
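The storage widths MRAB lists, and Grant's question about whether one wide
character widens the whole string, can be probed empirically. What follows is
a minimal sketch that assumes CPython 3.3 or later (the PEP 393 "flexible
string representation"). `bytes_per_char` is a hypothetical helper, not a
stdlib function. It relies on `sys.getsizeof`, whose results are
implementation-specific and include a fixed per-object header, so it looks
only at the size *difference* from adding one more character. Under PEP 393 a
CPython string picks a single width for all of its characters, based on its
largest code point, and the final comparison illustrates that.

```python
import sys


def bytes_per_char(s: str) -> int:
    """Hypothetical helper: estimate how many bytes CPython uses per character of s.

    Assumes CPython 3.3+ (PEP 393). Appending another copy of the widest
    character keeps the string's storage kind unchanged, so the size
    difference between the two strings is the width of one character.
    """
    if not s:
        raise ValueError("need a non-empty string")
    widest = max(s, key=ord)   # the code point that determines the width
    base = s + widest          # fresh objects, so no cached UTF-8 copy skews sizes
    longer = base + widest
    return sys.getsizeof(longer) - sys.getsizeof(base)


print(bytes_per_char("hello"))             # 7-bit ASCII
print(bytes_per_char("caf\u00e9"))         # U+00E9 is still in the 1-byte range
print(bytes_per_char("\u20acuro"))         # U+20AC needs the 2-byte range
print(bytes_per_char("hello \U0001F40D"))  # U+1F40D needs the 4-byte range

# Grant's case: 1000 ASCII characters plus one character above U+FFFF.
ascii_only = "a" * 1000 + "b"
one_astral = "a" * 1000 + "\U0001F40D"
print(sys.getsizeof(ascii_only), sys.getsizeof(one_astral))
```

If the width could vary within a string, the second size would be only a few
bytes larger than the first. Instead, on CPython it comes out roughly four
times larger, because every character is then stored in 4 bytes.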