random...@fastmail.us wrote:
> On Fri, Nov 21, 2014, at 23:38, Steven D'Aprano wrote:
>> I really don't understand what bothers you about this. In Python, we have
>> Unicode strings and byte strings. In computing in general, strings can
>> consist of Unicode characters, ASCII characters, Tron characters, EBCDIC
>> characters, ISO-8859-7 characters, and literally dozens of others. It
>> boggles my mind that you are so opposed to being explicit about what sort
>> of string we are dealing with.
>
> I think he means that it should be implementation-defined with an API
> that does not allow programs to make assumptions about the encoding,
> like C, to allow for implementations that use a different character set.
Python is not C, and doesn't make every second thing undefined behaviour.

If Python treated the character set as an implementation detail, the
programmer would have no way of knowing whether

    s = u"ö"

is legal or not, since you cannot know whether or not ö is a supported
character in the running Python. It might work on your system, and fail
for other people. That is worse than the old distinction between "narrow"
and "wide" builds.

It would be a lazy and stupid design, and especially stupid since there
really is no good alternative to Unicode today. ASCII is not even
sufficient for American English, the whole Windows code page idea is a
horrible mess, and none of the legacy encodings are suitable for more than
a tiny fraction of the world.

-- 
Steven

-- 
https://mail.python.org/mailman/listinfo/python-list