On Sat, Oct 26, 2019 at 07:38:19PM -0400, David Mertz wrote:
> On Sat, Oct 26, 2019, 7:29 PM Steven D'Aprano
> >
> > (At worst, a code-point in UTF-8 takes three bytes, compared to four in
> > UTF-16 or UTF-32.)
> >
>
> http://www.fileformat.info/info/unicode/char/10000/index.htm

Oops, you're right, UTF-8 can use four code units (four bytes) too, I
forgot about that. Thanks for the correction.

So in the worst case, if your string consists entirely of (let's say)
Linear B syllables, UTF-8 will use four bytes per character, the same as
UTF-32. But for strings consisting of a mix of (say) ASCII, Latin-1,
etc. with only a few Linear B syllables, UTF-8 will use a lot less
memory than UTF-32.

-- 
Steven
_______________________________________________
Python-ideas mailing list
Message archived at https://mail.python.org/archives/list/[email protected]/message/DNFYA7Z3IGDWYLNMKL7ITZ3AON6JJVKO/
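
P.S. For anyone who wants to check the byte counts above, here is a rough
sketch. The sample strings and the helper name encoded_sizes are my own
illustrative choices, not anything from the thread, and the "-le" codecs
are used only so that no byte-order mark inflates the numbers.

```python
# Sample strings are assumptions chosen for illustration.
samples = {
    "ASCII": "hello",
    "Latin-1": "caf\u00e9",
    "Linear B": "\U00010000",          # U+10000, the code point linked above
    "mixed": "caf\u00e9 \U00010000",   # mostly Latin-1 plus one Linear B syllable
}


def encoded_sizes(text):
    """Return the number of bytes text occupies in each encoding (no BOM)."""
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {enc: len(text.encode(enc)) for enc in encodings}


for label, text in samples.items():
    # Print the label, the number of code points, and the per-encoding sizes.
    print(label, len(text), encoded_sizes(text))
```

The "Linear B" row should show four bytes in all three encodings, while
the "mixed" row shows UTF-8 well below UTF-32.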
