On Wed, Dec 19, 2012 at 8:40 AM, Chris Angelico <ros...@gmail.com> wrote:
> You may not be familiar with jmf. He's one of our resident trolls, and
> he has a bee in his bonnet about PEP 393 strings, on the basis that
> they take up more space in memory than a narrow build of Python 3.2
> would, for a string with lots of BMP characters and one non-BMP. In
> 3.2 narrow builds, strings were stored in UTF-16, with *surrogate
> pairs* for non-BMP characters. This means that len() counts them
> twice, as does string indexing/slicing. That's a major bug, especially
> as your Python code will do different things on different platforms -
> most Linux builds of 3.2 are "wide" builds, storing characters in four
> bytes each.
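A narrow build is hard to come by these days, but the double counting described above can be approximated on any current Python 3 by counting UTF-16 code units by hand. This is only a rough sketch: utf16_code_units is a helper name made up for illustration, and it mimics what a narrow build's len() reported rather than running on one.

```python
def utf16_code_units(s):
    """Count the UTF-16 code units needed for s.

    Hypothetical helper: this approximates what len() reported on a
    3.2 narrow build, where each non-BMP character was stored as a
    surrogate pair (two code units).
    """
    # utf-16-le emits no byte order mark; every code unit is 2 bytes.
    return len(s.encode("utf-16-le")) // 2


s = "abc\U0001F600"  # three BMP characters plus one non-BMP character

print(len(s))               # code points, which is what PEP 393 strings count
print(utf16_code_units(s))  # code units; the non-BMP character counts twice
```

The two numbers differ by exactly one for each non-BMP character in the string, which is the platform-dependent behaviour the quote is complaining about.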
From what I've been able to discern, his actual complaint about PEP 393 stems from misguided moral concerns. With PEP 393, a string that can be fully represented in Latin-1 is stored in half the space (ignoring fixed overhead) of a string of the same length containing at least one non-Latin-1 character. jmf thinks this optimization is unfair to non-English users and immoral; he wants Latin-1 strings to be treated exactly like non-Latin-1 strings. (I don't think he actually cares about non-BMP strings at all; if narrow-build Unicode is good enough for him, then it must be good enough for everybody.)

Unfortunately for him, the Latin-1 optimization is rather trivial in the wider context of PEP 393, and simply removing that part alone clearly wouldn't be doing anybody any favors. So for him to get what he wants, the entire PEP has to go. It's rather like trying to solve the problem of wealth disparity by forcing everyone to dump their excess wealth into the ocean.
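For anyone who wants to see the storage difference for themselves, here is a minimal sketch that estimates the per-character cost of each kind of string. It assumes CPython 3.3 or later (sys.getsizeof reports implementation-specific sizes, so other implementations may give different numbers), and bytes_per_char is a helper name made up for this example.

```python
import sys


def bytes_per_char(ch, n=1000):
    """Estimate storage per character for a string made only of ch.

    Hypothetical helper: it takes the size difference between a string
    of 2*n characters and one of n characters, so the fixed object
    overhead cancels out. Assumes CPython 3.3+ (PEP 393).
    """
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n


samples = [
    ("ASCII", "a"),
    ("Latin-1", "\xe9"),         # e with acute accent
    ("BMP", "\u20ac"),           # euro sign
    ("non-BMP", "\U0001F600"),   # outside the Basic Multilingual Plane
]

for label, ch in samples:
    print(label, bytes_per_char(ch))
```

On a PEP 393 build I would expect this to report 1, 1, 2 and 4 bytes per character respectively, since the width of the whole string is determined by its widest character.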