On Tue, 28 Aug 2012 22:15:31 -0600, Ian Kelly wrote:

> On Tue, Aug 28, 2012 at 8:42 PM, rusi <rustompm...@gmail.com> wrote:
>> How difficult would it be to give the choice of string engine as a
>> command-line flag?
>> This would avoid the nuisance of having two binaries -- narrow and
>> wide.
>
> Quite difficult.  Even if we avoid having two or three separate
> binaries, we would still have separate binary representations of the
> string structs.  It makes the maintainability of the software go down
> instead of up.

In fairness, there are already multiple binary representations of
strings in Python 3.3:

- ASCII-only strings use a 1-byte format (PyASCIIObject);

- Compact Unicode objects (PyCompactUnicodeObject), which if I'm reading
  correctly, appear to use a non-fixed width UTF-8 format, but are only
  used when the string length and maximum character are known ahead of
  time;

- Legacy string objects (PyUnicodeObject), which are not compact, and
  which may use as their internal format:

  * 1-byte characters for Latin1-compatible strings;
  * 2-byte UCS-2 characters for strings in the Basic Multilingual Plane;
  * 4-byte UCS-4 characters for strings with at least one non-BMP
    character.

http://www.python.org/dev/peps/pep-0393/#specification

By my calculations, that makes *five* different internal formats for
strings, at least two of which are capable of representing all Unicode
characters. I don't think it would add that much additional complexity
to have a runtime option --always-wide-strings to always use the UCS-4
format. For, you know, crazy people with more memory than sense.

But I don't think there's any point in exposing further runtime options
to choose the string representation:

- neither the ASCII nor Latin1 representations can store arbitrary
  Unicode chars, so they're out;

- the UTF-8 format is only used under restrictive circumstances, and so
  is (probably?) unsuitable for all strings;

- the UCS-2 format can store them, by using surrogate pairs, but that's
  troublesome to get right, some might even say buggy.

>> And it would give the python programmer a choice of efficiency
>> profiles.
>
> So instead of having just one test for my Unicode-handling code, I'll
> now have to run that same test *three times* -- once for each possible
> string engine option.  Choice isn't always a good thing.

There is that too.


-- 
Steven
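
P.S. For anyone who wants to see the 1-, 2- and 4-byte widths for
themselves, here is a minimal sketch. It assumes CPython 3.3 or later
(sys.getsizeof is implementation-specific), and the helper name
bytes_per_char is my own invention, not anything from PEP 393. It grows
a string made of one repeated character and divides the size difference
by the number of characters added, which cancels out the fixed header:

```python
import sys


def bytes_per_char(ch, n=1000):
    """Estimate the storage used per code point for strings made of `ch`.

    Hypothetical helper: compares the reported sizes of two strings that
    differ only in length, so the per-object header cancels out.
    Assumes CPython 3.3+, where sys.getsizeof reflects the PEP 393 layout.
    """
    small = sys.getsizeof(ch * n)
    large = sys.getsizeof(ch * (2 * n))
    return (large - small) // n


samples = [
    ("ASCII", "a"),
    ("Latin1", "\xe9"),
    ("BMP", "\u20ac"),
    ("non-BMP", "\U0001F600"),
]

for label, ch in samples:
    print(label, bytes_per_char(ch))
```

On a PEP 393 build this should print 1, 1, 2 and 4 respectively: the
width is picked per string from its widest character, with no
command-line flag involved.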