On Fri, 17 Aug 2012 11:45:02 -0700, wxjmfauth wrote:

> On Friday, 17 August 2012 20:21:34 UTC+2, Jerry Hill wrote:
>> On Fri, Aug 17, 2012 at 1:49 PM, <wxjmfa...@gmail.com> wrote:
>>
>> > The character '…', Unicode name 'HORIZONTAL ELLIPSIS', is one of
>> > these characters existing in the cp1252 and mac-roman coding
>> > schemes and not in iso-8859-1 (latin-1), and obviously not in
>> > ascii. It causes Py3.3 to work a few 100% slower than Py<3.3
>> > versions due to the flexible string representation
>> > (ascii/latin-1/ucs-2/ucs-4) (I found cases up to 1000%).
[...]
> Sorry, you missed the point.
>
> My comment had nothing to do with the source code encoding, the
> encoding of a Python "string" in the source code, or with the display
> of a Python 3 <str>. I wrote about the *internal* Python "coding",
> the way Python keeps "strings" in memory. See PEP 393.
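For what it's worth, claims like that are easy to check. Here is a
rough sketch of the kind of test I mean -- the test strings and
iteration counts are just placeholders picked for illustration, and the
exact figures will vary with your platform, build options and Python
version -- showing both the per-string memory use under PEP 393 and a
simple replace() micro-benchmark:

import sys
import timeit

# Per-string storage: under PEP 393 each string uses 1, 2 or 4 bytes
# per character, depending on the widest character it contains.
for s in ['abcd', 'abc\xe9', 'abc\u2026', 'abc\U0001F600']:
    print(ascii(s), sys.getsizeof(s))

# A replace() micro-benchmark: run it under both 3.2 and 3.3 and
# compare the best-of-five times.
print(min(timeit.repeat("('abc' * 1000).replace('c', '\u2026')",
                        repeat=5, number=10000)))
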
The PEP does not support your claim that flexible string storage is
100% to 1000% slower. It claims a 1% to 30% slowdown, with a saving of
up to 60% of the memory used for strings. I don't really understand
what point you are trying to make here. Are you saying that PEP 393 is
a good thing or a bad thing?

In Python 1.x, there was no support for Unicode at all. You could only
work with pure byte strings. Support for non-ASCII characters like
… ∞ é ñ £ π Ж ش was purely by accident -- if your terminal happened to
be set to an encoding that supported a character, and you happened to
use the appropriate byte value, you might see the character you wanted.

In Python 2.0, Python gained support for Unicode. You could now
guarantee support for any Unicode character in the Basic Multilingual
Plane (BMP) by writing your strings using the u"..." style.

In Python 3, you no longer need the leading u: all strings are Unicode.
But there is a problem: if your Python interpreter is a "narrow build",
it *only* supports Unicode characters in the BMP. When Python is a
"wide build", compiled with support for the additional character
planes, then strings take much more memory, even if they stay within
the BMP or are simple ASCII strings.

PEP 393 fixes this problem and gets rid of the distinction between
narrow and wide builds. From Python 3.3 onwards, all Python builds will
have the same support for Unicode, rather than most being BMP-only.
Each individual string's internal storage will use only as many bytes
per character as needed to store the largest character in the string.
This will save a lot of memory for those working mostly with ASCII or
Latin-1 text plus the occasional multibyte character. While the
increased complexity causes a small slowdown, the increased
functionality makes it well worthwhile.

--
Steven
--
http://mail.python.org/mailman/listinfo/python-list