Tobiah wrote:
> I've been reading about the Unicode today.
> I'm only vaguely understanding what it is
> and how it works.
>
> Please correct my understanding where it is lacking.
> Unicode is really just a database of character information
> such as the name, unicode section, possible
> numeric value etc. These points of information
> are indexed by standard, never changing numeric
> indexes, so that 0x2CF might point to some
> character information set, that all the world
> can agree on. The actual image that gets
> displayed in response to the integer is generally
> assigned and agreed upon, but it is up to the
> software responding to the unicode value to define
> and generate the actual image that will represent that
> character.

Correct. The "actual images" are called glyphs in Unicode-speak.

> Now for the mysterious encodings. There is the UTF-{8,16,32}
> which only seem to indicate what the binary representation
> of the unicode character points is going to be. Then there
> are 100 or so other encoding, many of which are language
> specific. ASCII encoding happens to be a 1-1 mapping up
> to 127, but then there are others for various languages etc.
> I was thinking maybe this special case and the others were lookup
> mappings, where a
> particular language user could work with characters perhaps
> in the range of 0-255 like we do for ASCII, but then when
> decoding, to share with others, the plain unicode representation
> would be shared? Why can't we just say "unicode is unicode"
> and just share files the way ASCII users do. Just have a huge
> ASCII style table that everyone sticks to. Please enlighten
> my vague and probably ill-formed conception of this whole thing.

The UTF-n encodings are transfer encodings of the Unicode table
(the one you are probably referring to). They all represent the same
code points, but make different trade-offs: UTF-8 uses 1 to 4 bytes
per code point and is byte-compatible with ASCII, UTF-16 uses 2 or 4
bytes, and UTF-32 always uses 4 bytes.

If you're looking for a short intro to Unicode in Python, have
a look at these talks I've given on the subject:

    http://www.egenix.com/library/presentations/#PythonAndUnicode
    http://www.egenix.com/library/presentations/#DesigningUnicodeAwareApplicationsInPython

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Oct 20 2010)
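
A rough sketch of the two points discussed in this thread, using only the Python standard library: the `unicodedata` module exposes the per-code-point database entries, and `str.encode` shows that UTF-8, UTF-16 and UTF-32 turn the same code point into different byte sequences. The sample character (U+20AC) and the helper name `describe` are chosen here purely for illustration, and the code assumes Python 3, where string literals are Unicode.

```python
import unicodedata


def describe(ch):
    """Collect database info and UTF-n byte sequences for one character.

    `describe` is a hypothetical helper for this example, not a library API.
    """
    return {
        # The stable numeric index (code point) of the character.
        "code_point": "U+%04X" % ord(ch),
        # Entries from the Unicode character database.
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),
        # The same code point, serialized by three transfer encodings.
        # The "-be" variants fix big-endian byte order and emit no BOM.
        "utf-8": ch.encode("utf-8"),
        "utf-16-be": ch.encode("utf-16-be"),
        "utf-32-be": ch.encode("utf-32-be"),
    }


info = describe("\u20ac")
for key, value in info.items():
    print(key, value)

# Decoding any of the byte sequences gives back the same character.
assert info["utf-8"].decode("utf-8") == info["utf-32-be"].decode("utf-32-be")
```

The byte strings differ in length (3, 2 and 4 bytes for this character), which is the kind of trade-off the UTF-n encodings make; the code point itself is the same in every case.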