On 28 mar, 15:38, Chris Angelico <ros...@gmail.com> wrote:
> On Fri, Mar 29, 2013 at 1:12 AM, jmfauth <wxjmfa...@gmail.com> wrote:
> > This flexible string representation is so absurd that not only
> > "it" does not know you can not write Western European Languages
> > with latin-1, "it" penalizes you by just attempting to optimize
> > latin-1. Shown in my multiple examples.
>
> PEP393 strings have two optimizations, or kinda three:
>
> 1a) ASCII-only strings
> 1b) Latin1-only strings
> 2) BMP-only strings
> 3) Everything else
>
> Options 1a and 1b are almost identical - I'm not sure what the detail
> is, but there's something flagging those strings that fit inside seven
> bits. (Something to do with optimizing encodings later?) Both are
> optimized down to a single byte per character.
>
> Option 2 is optimized to two bytes per character.
>
> Option 3 is stored in UTF-32.
>
> Once again, jmf, you are forgetting that option 2 is a safe and
> bug-free optimization.
>
> ChrisA
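To make the storage classes in the quoted list concrete, here is a minimal sketch. The helper names `pep393_kind` and `measured_width` are hypothetical, introduced only for illustration. The classification depends only on the largest code point in the string. The size measurement assumes CPython 3.3+, where adding one character to a compact string grows `sys.getsizeof` by exactly the per-character width. The samples also show that French "œ" (U+0153) and the euro sign (U+20AC) lie outside Latin-1, so text containing them is stored with two bytes per character.

```python
from __future__ import annotations

import platform
import sys


def pep393_kind(s: str) -> tuple[str, int]:
    """Hypothetical helper: return (category, bytes per char) that PEP 393
    would choose for s, judged only by its largest code point."""
    max_cp = max(map(ord, s), default=0)  # empty string counts as ASCII
    if max_cp < 0x80:
        return ("ASCII", 1)    # option 1a: fits in seven bits
    if max_cp < 0x100:
        return ("Latin-1", 1)  # option 1b: one byte, but not pure ASCII
    if max_cp < 0x10000:
        return ("BMP", 2)      # option 2: two bytes per character
    return ("astral", 4)       # option 3: four bytes per character


def measured_width(ch: str) -> int:
    """Hypothetical helper, CPython-specific: per-character growth of a
    compact string made only of ch, as reported by sys.getsizeof."""
    return sys.getsizeof(ch * 100) - sys.getsizeof(ch * 99)


samples = ["hello", "café", "cœur", "prix: 5 €", "emoji \U0001F600"]
for sample in samples:
    kind, width = pep393_kind(sample)
    line = f"{sample!r:20} {kind:8} {width} byte(s)/char"
    if platform.python_implementation() == "CPython":
        # Cross-check against the interpreter using the widest character.
        line += f"  (measured: {measured_width(max(sample))})"
    print(line)
```

On CPython the measured column should agree with the predicted width. On other implementations, such as PyPy, strings are stored differently, so only the prediction is printed.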
As long as you are attempting to divide a set of characters into chunks and trying to handle them separately, it will never work. Read my previous post about the Unicode transformation format. I know what PEP 393 does.

jmf

-- 
http://mail.python.org/mailman/listinfo/python-list