On Sat, 18 Aug 2012 08:07:05 -0700, wxjmfauth wrote:

> On Saturday, 18 August 2012 14:27:23 UTC+2, Steven D'Aprano wrote:
>> [...]
>> The problem with UCS-4 is that every character requires four bytes.
>> [...]
>
> I'm aware of this (and all the blah blah blah you are explaining). This
> is always the same song. Memory.
Exactly. The reason it is always the same song is that it is an important song.

> Let me ask. Is Python an 'American' product for US users or is it a tool
> for everybody [*]?

It is a product for everyone, which is exactly why PEP 393 is so important. PEP 393 means that users who have only a few non-BMP characters don't have to pay the cost of UCS-4 for every single string in their application, only for the ones that actually require it. PEP 393 means that using Unicode strings is now cheaper for everybody.

You seem to be arguing that the way forward is not to make Unicode cheaper for everyone, but to make ASCII strings more expensive so that everyone suffers equally. I reject that idea.

> Is there any reason why non-ASCII users are somehow penalized compared
> to ASCII users?

Of course there is a reason. If you want to represent 1114112 different code points (U+0000 to U+10FFFF) in a string, as Unicode supports, you can't use a single byte per character, or even two bytes, since two bytes give only 65536 distinct values. That is a fact of basic mathematics. Supporting 1114112 code points must be more expensive than supporting 128 of them. But why should you carry the cost of 4 bytes per character just because someday you *might* need a non-BMP character?

> This flexible string representation is a regression (ascii users or
> not).

No, it is not. It is a great step towards more efficient Unicode. And it means that Python can now correctly deal with non-BMP characters without the nonsense of UTF-16 surrogates:

    steve@runes:~$ python3.3 -c "print(len(chr(1114000)))"  # Right!
    1
    steve@runes:~$ python3.2 -c "print(len(chr(1114000)))"  # Wrong!
    2

without doubling the storage of every string. This is an important step towards making the full range of Unicode available more widely.

> I recognize in practice the real impact is for many users close to zero

Then what's the problem?

> (including me) but I have shown (I think) that this flexible
> representation is, by design, not as optimal as it is supposed to be.
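The storage and surrogate arguments in this reply can be checked with a short script. This is a minimal sketch, assuming CPython 3.3 or later (PEP 393); `min_bytes`, `bytes_per_char` and `utf16_surrogates` are hypothetical helper names, and the `sys.getsizeof` measurement depends on CPython's compact string layout, so other implementations such as PyPy will report different numbers.

```python
import sys

MAX_CODE_POINT = 0x10FFFF  # 1114111: code points run from 0 to here, 1114112 in all


def min_bytes(code_point: int) -> int:
    """Fewest whole bytes needed to hold one code point's numeric value."""
    return max(1, (code_point.bit_length() + 7) // 8)


def bytes_per_char(ch: str, n: int = 1000) -> int:
    """Estimate how many bytes CPython stores per character for strings of `ch`.

    Assumes CPython's PEP 393 compact layout: a fresh string costs a fixed
    header plus (length + 1) * width bytes, so subtracting two sizes cancels
    the header and leaves n * width.
    """
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n


def utf16_surrogates(code_point: int) -> tuple[int, int]:
    """Split a non-BMP code point into its UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= MAX_CODE_POINT:
        raise ValueError("only code points above U+FFFF need surrogates")
    offset = code_point - 0x10000  # a 20-bit value
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


if __name__ == "__main__":
    # One byte covers ASCII, two cover the BMP; the full range needs three,
    # which CPython rounds up to four (UCS-4).
    print(min_bytes(0x7F), min_bytes(0xFFFF), min_bytes(MAX_CODE_POINT))  # 1 2 3

    # PEP 393 picks the width per string, from the widest character in it.
    for sample in ("a", "\u20ac", "\U0001F600"):
        print(f"U+{ord(sample):04X}: {bytes_per_char(sample)} byte(s) per char")

    ch = chr(1114000)
    hi, lo = utf16_surrogates(ord(ch))
    # A narrow 3.2 build stored these two code units, hence len() == 2 there.
    print(len(ch), hex(hi), hex(lo))  # 1 0xdbff 0xdf90
```

`bytes_per_char` takes a difference of two sizes rather than a single `getsizeof` value so that the fixed per-object header cancels out; on CPython 3.3+ it should report 1, 2 and 4 bytes for ASCII, other BMP and non-BMP strings respectively.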
You have not shown any real problem at all. You have shown untrustworthy, edited timing results that don't match what other people are reporting. Even if your timing results are genuine, you haven't shown that they make any difference for real code that does useful work.

-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list