On Thu, May 27, 2021 at 1:59 AM Jon Ribbens via Python-list
<python-list@python.org> wrote:
>
> On 2021-05-26, Alan Gauld <alan.ga...@yahoo.co.uk> wrote:
> > On 25/05/2021 23:23, Terry Reedy wrote:
> >> In CPython's Flexible String Representation all characters in a string
> >> are stored with the same number of bytes, depending on the largest
> >> codepoint.
> >
> > I'm learning lots of new things in this thread!
> >
> > Does that mean that if I give Python a UTF8 string that is mostly single
> > byte characters but contains one 4-byte character that Python will store
> > the string as all 4-byte characters?
> >
> > If so, doesn't that introduce a pretty big storage overhead for
> > large strings?
>
> Memory is cheap ;-)
>

This is true, but memory often translates into time, and it can go in
either direction: spending more memory can save time, and saving memory
can save time too.

When the Flexible String Representation came in (PEP 393, Python 3.3),
it was actually an alternative to using four bytes per character on ALL
strings, not just those that contain non-BMP characters. It improved
performance quite notably, despite some additional complications.

Performance optimization is a funny science :)

ChrisA
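
PS: A rough way to see the storage behaviour discussed above for yourself. This is only a sketch: it assumes CPython 3.3 or later, where sys.getsizeof() reports the size of the string object, and the helper name bytes_per_char is made up for this example. It measures how much a string grows per added character, so the fixed object header cancels out.

```python
import sys

def bytes_per_char(ch: str, n: int = 1000) -> int:
    """Estimate per-character storage for a string made of one repeated character.

    Hypothetical helper for illustration; relies on CPython's sys.getsizeof().
    """
    # Both strings have the same kind, so the fixed header cancels in the subtraction.
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n

samples = [
    ("ASCII", "a"),
    ("Latin-1", "\xe9"),         # e with acute accent, code point below 256
    ("BMP", "\u20ac"),           # euro sign, code point below 65536
    ("non-BMP", "\U0001F600"),   # emoji, needs more than 16 bits
]
for label, ch in samples:
    print(f"{label:8} {bytes_per_char(ch)} byte(s) per character")

# The case asked about: 1000 characters, all ASCII except one.
mostly_ascii = "a" * 1000
one_wide_char = "a" * 999 + "\U0001F600"
print(sys.getsizeof(mostly_ascii), sys.getsizeof(one_wide_char))
```

On CPython I would expect the loop to report 1, 1, 2 and 4 bytes per character, and the second string to come out roughly four times the size of the first. The exact totals depend on the interpreter version and platform, so treat them as indicative only.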