On Thu, Mar 28, 2013 at 8:38 AM, Chris Angelico <ros...@gmail.com> wrote:
> PEP393 strings have two optimizations, or kinda three:
>
> 1a) ASCII-only strings
> 1b) Latin1-only strings
> 2) BMP-only strings
> 3) Everything else
>
> Options 1a and 1b are almost identical - I'm not sure what the detail
> is, but there's something flagging those strings that fit inside seven
> bits. (Something to do with optimizing encodings later?) Both are
> optimized down to a single byte per character.

The only difference for ASCII-only strings is that they are kept in a struct with a smaller header.

The smaller header omits the utf8 pointer (which optionally points to an additional UTF-8 representation of the string) and its associated length variable. These are not needed for ASCII-only strings because the bytes of an ASCII string are already its UTF-8 encoding, so the string data can be used directly as UTF-8.

The smaller header also omits the "wstr_length" field which, according to the PEP, "differs from length only if there are surrogate pairs in the representation." An ASCII string, of course, never contains surrogate pairs.
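
A rough way to observe this from Python is to compare sys.getsizeof() results for strings of each kind. This is only a sketch, assuming CPython 3.3 or later; the helper names char_width and header_size are made up for illustration, and the exact header sizes depend on the CPython version and platform, so the code only compares them.

```python
import sys

def char_width(ch, n=16, k=16):
    """Estimate bytes stored per character by growing a string of `ch` by k characters."""
    return (sys.getsizeof(ch * (n + k)) - sys.getsizeof(ch * n)) // k

def header_size(ch, n=16):
    """Estimate the fixed overhead: total size minus the character data and its trailing NUL."""
    width = char_width(ch)
    return sys.getsizeof(ch * n) - (n + 1) * width

samples = [
    ("ASCII", "a"),
    ("Latin-1", "\xe9"),
    ("BMP", "\u20ac"),
    ("everything else", "\U0001f600"),
]

for label, ch in samples:
    # Prints the per-character width and the estimated header size for each kind.
    print(label, char_width(ch), header_size(ch))
```

On CPython the ASCII and Latin-1 samples should both report one byte per character, with the ASCII sample showing the smaller header.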