I've been reading about Unicode today. I only vaguely understand what it is and how it works.
Please correct my understanding where it is lacking.

Unicode is really just a database of character information such as the name, Unicode section, possible numeric value, etc. Each set of information is indexed by a standard, never-changing number, so that 0x2CF might point to one character's information set that the whole world can agree on. The general appearance of the character behind each integer is agreed upon, but it is up to the software responding to the Unicode value to define and generate the actual image that will represent that character.

Now for the mysterious encodings. There are the UTF-{8,16,32} encodings, which only seem to indicate what the binary representation of the Unicode code points is going to be. Then there are 100 or so other encodings, many of which are language specific. ASCII happens to be a 1-1 mapping up to 127, but then there are others for various languages, etc.

I was thinking maybe this special case and the others were lookup mappings, where a user of a particular language could work with characters perhaps in the range 0-255, like we do for ASCII, but then when decoding, to share with others, the plain Unicode representation would be shared?

Why can't we just say "Unicode is Unicode" and share files the way ASCII users do? Just have a huge ASCII-style table that everyone sticks to.

Please enlighten my vague and probably ill-formed conception of this whole thing.

Thanks,

Tobiah
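P.S. To make the question concrete, here is a small sketch of how I currently picture the two halves, using Python 3's standard unicodedata module and str.encode. The sample characters and the variable names (info, encoded, ascii_same) are just my own picks for illustration, and I may well be misreading what this shows.

```python
import unicodedata

ch = "\u00e9"  # sample character: e with an acute accent

# The "database" half: properties looked up by code point.
info = {
    "code_point": hex(ord(ch)),
    "name": unicodedata.name(ch),
    "category": unicodedata.category(ch),
}

# Some characters also carry a numeric value, e.g. the one-half sign.
half_value = unicodedata.numeric("\u00bd")

# The "encoding" half: the same code point written out as bytes
# under several different encodings.
encoded = {
    enc: ch.encode(enc)
    for enc in ("utf-8", "utf-16-be", "utf-32-be", "latin-1")
}

# Plain ASCII text gives identical bytes in ASCII and in UTF-8.
ascii_same = "hello".encode("ascii") == "hello".encode("utf-8")

print(info)
print(half_value)
for enc, raw in encoded.items():
    print(enc, raw, len(raw), "byte(s)")
print(ascii_same)
```

If I have this right, the code point stays the same throughout and only the bytes differ from one encoding to the next, which seems to be the part I was mixing up with the "database" itself.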