On Fri, Mar 29, 2013 at 3:01 AM, Terry Reedy <tjre...@udel.edu> wrote:
> On 3/28/2013 10:38 AM, Chris Angelico wrote:
>
>> PEP393 strings have two optimizations, or kinda three:
>>
>> 1a) ASCII-only strings
>> 1b) Latin1-only strings
>> 2) BMP-only strings
>> 3) Everything else
>>
>> Options 1a and 1b are almost identical - I'm not sure what the detail
>> is, but there's something flagging those strings that fit inside seven
>> bits. (Something to do with optimizing encodings later?)
>
>
> Yes. 'Encoding' an ascii-only string to any ascii-compatible encoding
> amounts to a simple copy of the internal bytes. I do not know if *all* the
> codecs for such encodings are 393-aware, but I do know that the utf-8 and
> latin-1 group are. This is one operation that 3.3+ does much faster than
> 3.2-
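
For illustration, the storage classes listed above can be observed from
Python itself. This is only a rough sketch, assuming CPython 3.3 or later
(sys.getsizeof reports implementation-specific sizes, so other
interpreters may differ). The helper name per_char_bytes and the sample
characters are made up for this example; they are not part of PEP 393.

```python
import sys

def per_char_bytes(ch, n=1000):
    """Estimate storage per character by comparing two string lengths.

    Subtracting the sizes cancels the fixed object header, leaving only
    the cost of the n extra characters.
    """
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) / n

# One sample character per category from the list above (illustrative picks).
samples = {
    "ascii": "a",            # 1a) fits in seven bits
    "latin1": "\xe9",        # 1b) fits in eight bits
    "bmp": "\u20ac",         # 2) fits in sixteen bits
    "astral": "\U0001F600",  # 3) needs more than sixteen bits
}
widths = {name: per_char_bytes(ch) for name, ch in samples.items()}
print(widths)

# An ASCII-only string produces the same bytes under any ASCII-compatible
# codec, which is why the encoder can get away with a plain copy.
text = "hello"
encoded = {codec: text.encode(codec) for codec in ("ascii", "latin-1", "utf-8")}
print(encoded)
```

On CPython the ASCII and Latin-1 samples both come out at one byte per
character, which matches the point that 1a and 1b share a representation
and differ only in the flag.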

Thanks Terry. So that's not so much a representation difference as a
flag that costs little or nothing to retain, and can improve
performance in the encode later on. Sounds like a useful tweak to the
basics of flexible string representation, without being particularly
germane to jmf's complaints.

ChrisA
--
http://mail.python.org/mailman/listinfo/python-list