On Fri, 15 May 2015 08:52 pm, Marko Rauhamaa wrote:

> wxjmfa...@gmail.com:
>
>> On Friday 15 May 2015 11:20:25 UTC+2, Marko Rauhamaa wrote:
>>> wxjmfa...@gmail.com:
>>>
>>> > Implement unicode correctly.
>>> Did they reject your patch?
>>
>> You can not patch something that is wrong by design.
>
> Are you saying the Python language spec is unfixable or that the CPython
> implementation is unfixable?

JMF is obsessed with a trivial and artificial performance regression in the handling of Unicode strings since Python 3.3, which introduced a significant memory optimization for Unicode strings. Each individual string uses a code unit no larger than necessary: if a string contains nothing but ASCII or Latin-1 characters, it uses one byte per character; if it fits into the Basic Multilingual Plane, two bytes per character; and it uses four bytes per character only if there are "astral" characters in the string. (That is, Python strings select from a Latin-1, UCS-2 or UTF-32 encoded form at creation time, according to the largest code point in the string.)

The benefit of this is that most strings use 1/2 or 1/4 of the memory that they otherwise would need, which is an impressive memory saving. That leads to demonstrable speed-ups in real-world code. However, it is possible to find artificial benchmarks that experience a slowdown compared to Python 3.2. JMF found one such artificial benchmark, involving creating and throwing away many strings as fast as possible without doing any work with them, and from this has built a fantasy in his head that Python is not compliant with the Unicode spec and is logically, mathematically broken.

--
Steven

--
https://mail.python.org/mailman/listinfo/python-list
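
For anyone who wants to see the per-string code unit width described above, here is a rough sketch. It assumes CPython 3.3 or later; sys.getsizeof reports implementation-specific sizes, so other implementations may well give different numbers. The helper name per_char_bytes and the sample characters are made up for illustration.

```python
import sys

def per_char_bytes(ch, n=1000):
    """Estimate bytes per character for strings made only of `ch`."""
    # Comparing two lengths cancels the fixed per-object header,
    # leaving only the storage used by the extra n characters.
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n

# One sample character from each range mentioned in the post.
samples = {
    "ASCII": "a",
    "Latin-1": "\u00e9",       # e with acute accent
    "BMP": "\u20ac",           # euro sign
    "astral": "\U0001F600",    # outside the Basic Multilingual Plane
}

widths = {label: per_char_bytes(ch) for label, ch in samples.items()}
for label, width in widths.items():
    print(f"{label:8} {width} byte(s) per character")
```

On a CPython 3.3+ build I would expect this to report 1, 1, 2 and 4 bytes respectively, since the width is picked from the largest code point in each string.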