On Monday, June 10, 2013 3:48:08 PM UTC-4, jmfauth wrote:
> -----
>
> A coding scheme works with three sets. A *unique* set
> of CHARACTERS, a *unique* set of CODE POINTS and a *unique*
> set of ENCODED CODE POINTS, unicode or not.
>
> The relation between the set of characters and the set of the
> code points is a *human* table, created with a sheet of paper
> and a pencil, a deliberate choice of characters with integers
> as "labels".
>
> The relation between the set of the code points and the
> set of encoded code points is a "mathematical" operation.
>
> In the case of an "8bits" coding scheme, like iso-XXX,
> this operation is a no-op, the relation is an identity.
> Shortly: set of code points == set of encoded code points.
>
> In the case of unicode, The Unicode consortium endorses
> three such mathematical operations called UTF-8, UTF-16 and
> UTF-32 where UTF means Unicode Transformation Format, a
> confusing wording meaning at the same time, the process
> and the result of the process. This Unicode Transformation does
> not produce bytes, it produces words/chunks/tokens of *bits* with
> lengths 8, 16, 32, called Unicode Transformation Units (from this
> the names UTF-8, -16, -32). At this level, only a structure has
> been defined (there is no computing).
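
For anyone following along, the three sets quoted above can be seen directly
from Python 3. This is only a small sketch: the sample characters and the
variable names are my own choices, and I use the big-endian codecs so that no
byte order mark gets in the way.

```python
# One character, its code point, and its encoded forms (Python 3).

ch = "\u20ac"                  # EURO SIGN, chosen only as an example
code_point = ord(ch)           # the integer "label": 0x20AC

# The three Unicode encoding forms of that single code point.
utf8 = ch.encode("utf-8")      # three 8-bit units
utf16 = ch.encode("utf-16-be") # one 16-bit unit
utf32 = ch.encode("utf-32-be") # one 32-bit unit

utf8_units = list(utf8)                   # [0xE2, 0x82, 0xAC]
utf16_unit = int.from_bytes(utf16, "big") # 0x20AC
utf32_unit = int.from_bytes(utf32, "big") # 0x20AC

# An 8-bit scheme such as ISO-8859-1 (Latin-1): the encoded value is
# simply the code point itself, the "identity" described in the quote.
e_acute = "\u00e9"
latin1 = e_acute.encode("latin-1")
latin1_is_identity = latin1[0] == ord(e_acute)

print(hex(code_point), utf8_units, hex(utf16_unit), hex(utf32_unit))
print(latin1_is_identity)
```

Note that UTF-8 needs three units for this one code point, while UTF-16 and
UTF-32 need one each.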
This is a really good description of the issues involved with character sets and encodings, thanks.

> Very important, a healthy
> coding scheme works conceptually only with this *unique* set
> of encoded code points, not with bytes, characters or code points.
>

You don't explain why it is important to work with encoded code points.  What's wrong with working with code points?

>
> The last step, the machine implementation: it is up to the
> processor, the compiler, the language to implement all these
> Unicode Transformation Units with of course their related
> specificities: char, w_char, int, long, endianness, rune (Go
> language), ...
>
> Not too over-simplified or not too over-complicated and enough
> to understand one, if not THE, design mistake of the flexible
> string representation.
>
> jmf

Again you've made the claim that the flexible string representation is a mistake.  But you haven't said WHY.  I can't tell if you are trolling us, or are deluded, or genuinely don't understand what you are talking about.  Some day you might explain yourself.  I look forward to it.

--Ned.
-- 
http://mail.python.org/mailman/listinfo/python-list
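
P.S. For readers who have not looked at it, here is a rough sketch of what the
flexible string representation (PEP 393, Python 3.3) looks like from the
outside. It assumes CPython 3.3 or later; the sample strings and names are
mine, and the exact byte counts are an implementation detail, so I only
compare them rather than quote numbers.

```python
import sys

# Three strings of 1000 code points each, needing 1, 2 and 4 bytes
# per code point respectively in CPython's internal storage.
ascii_s = "a" * 1000
bmp_s = "\u20ac" * 1000        # EURO SIGN, in the Basic Multilingual Plane
astral_s = "\U0001F600" * 1000 # a code point beyond U+FFFF

# At the Python level a str is a sequence of code points: length and
# indexing do not depend on how the string is stored.
lengths = [len(s) for s in (ascii_s, bmp_s, astral_s)]
mixed = "a\u20ac\U0001F600"
third = mixed[2]               # the astral character, one index position

# Only the memory footprint differs (CPython-specific).
sizes = [sys.getsizeof(s) for s in (ascii_s, bmp_s, astral_s)]

print(lengths)
print(sizes[0] < sizes[1] < sizes[2])
```

In other words, user code works with code points throughout, and the storage
width is chosen per string behind the scenes.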