On Sat, Aug 18, 2012 at 9:07 AM, <wxjmfa...@gmail.com> wrote:
> On Saturday, 18 August 2012 14:27:23 UTC+2, Steven D'Aprano wrote:
>> [...]
>> The problem with UCS-4 is that every character requires four bytes.
>> [...]
>
> I'm aware of this (and all the blah blah blah you are
> explaining). This always the same song. Memory.
>
> Let me ask. Is Python an 'american" product for us-users
> or is it a tool for everybody [*]?
> Is there any reason why non ascii users are somehow penalized
> compared to ascii users?

The change does not just benefit ASCII users. It primarily benefits anybody using a wide Unicode build with strings mostly containing only BMP characters. Even for narrow build users, there is the benefit that with approximately the same amount of memory usage in most cases, they no longer have to worry about non-BMP characters sneaking in and breaking their code.

There is some additional benefit for Latin-1 users, but this has nothing to do with Python. If Python is going to have the option of a 1-byte representation (and as long as we have the flexible representation, I can see no reason not to), then it is going to be Latin-1 by definition, because that's what 1-byte Unicode (UCS-1, if you will) is. If you have an issue with that, take it up with the designers of Unicode.

> This flexible string representation is a regression (ascii users
> or not).
>
> I recognize in practice the real impact is for many users
> closed to zero (including me) but I have shown (I think) that
> this flexible representation is, by design, not as optimal
> as it is supposed to be. This is in my mind the relevant point.

You've shown nothing of the sort. You've demonstrated only one out of many possible benchmarks, and other users on this list can't even reproduce that.
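
For anyone who wants to check the memory side of this themselves, here is a minimal sketch. It assumes CPython 3.3 or later (the flexible representation); `sys.getsizeof` is implementation-specific, and the helper name `bytes_per_char` is just something made up for this illustration. It estimates the storage cost per character by comparing two strings of different lengths, so the fixed object header cancels out.

```python
import sys

def bytes_per_char(ch, n=1000):
    """Estimate storage per character for a string made only of `ch`.

    Hypothetical helper for illustration. Taking the size difference
    between two lengths cancels the fixed object header, whose size
    varies between CPython versions.
    """
    short = ch * n
    longer = ch * (2 * n)
    return (sys.getsizeof(longer) - sys.getsizeof(short)) // n

# One sample character from each range discussed in the thread.
samples = [
    ("ASCII", "a"),
    ("Latin-1, non-ASCII", "\u00e9"),   # e with acute accent
    ("BMP, beyond Latin-1", "\u20ac"),  # euro sign
    ("non-BMP", "\U0001F600"),          # an emoji outside the BMP
]

for label, ch in samples:
    print(label, bytes_per_char(ch))
```

Under those assumptions I would expect 1, 1, 2 and 4 bytes per character respectively, which is the point above: the 1-byte case covers all of Latin-1, not only ASCII, and BMP-only text costs half of what it did on a wide build.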