> I figured this might have something to do with it, but then again I
> thought that Unicode was created as a subset of ASCII and Latin-1 so
> that they would be compatible...but I guess it's never that easy. :)
The real problem is that the Python string type is used to represent two very different concepts: bytes and characters. You can't just drop the current Python string type and use the Unicode type instead - then you would have no good way to represent sequences of bytes anymore. Byte sequences occur more often than you might think: a ZIP file, an MS Word file, a PDF file, and even an HTTP conversation are all represented as byte sequences. So for a byte sequence, the internal representation is important; for a character string, it is not.

Now, for historical reasons, Python string literals create byte strings, not character strings. Since we cannot know whether a given string literal is meant to denote bytes or characters, we can't simply change their interpretation.

Unicode is a superset of ASCII and Latin-1, but not of byte sequences.

Regards,
Martin
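
P.S. To make the bytes/characters distinction concrete, here is a small
sketch (assuming Python 2.x semantics as described above, where plain
string literals are byte strings and Unicode literals take a u prefix;
the variable names are only for illustration):

    # Byte string: a sequence of bytes, interpretation left to the program
    data = "caf\xe9"
    # Unicode string: a sequence of characters, no fixed byte encoding
    text = u"caf\xe9"

    print repr(data)                      # 'caf\xe9'  - four bytes
    print repr(text)                      # u'caf\xe9' - four characters
    print data.decode("latin-1") == text  # True: decoding turns bytes into characters
    print repr(text.encode("utf-8"))      # 'caf\xc3\xa9' - a different byte sequence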