On Wed, Aug 29, 2012 at 9:40 PM, <wxjmfa...@gmail.com> wrote:
> For a given coding scheme, all code points/characters are
> equivalent. Expecting to handle a sub-range in a coding
> scheme without shaking that coding scheme is impossible.

Not all codepoints are equally likely. That's the whole point behind
variable-length encodings like Huffman compression (eg deflation as
used in zip/gzip), UTF-8, quoted-printable, and Morse code. They handle
a sub-range efficiently and the rest of the range less efficiently.

> If a coding scheme does not give satisfaction, the only
> valid solution is to create a new coding scheme, cp1252,
> mac-roman, EBCDIC, ... or the interesting "TeX" case, where
> the "internal" coding depends on the fonts!

http://xkcd.com/927/

> This "Flexible String Representation" fails. Not only
> it is unable to stick with a coding scheme, it is
> a mixing of coding schemes, the worst of all possible
> implementations.

I propose, then, that we abolish files. Who *knows* how many different
things might be represented in a file! We need a single coding scheme
that can handle everything, without changing representation. This
ridiculous state of affairs must not go on; the same representation
can be used for bitmapped images or raw audio data!

ChrisA
--
http://mail.python.org/mailman/listinfo/python-list
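
P.S. A minimal sketch of the variable-length point, assuming Python 3.
The sample characters and the helper name encoded_sizes are my own
picks for illustration, not anything from the thread. It compares how
many bytes each character takes in UTF-8 against a fixed-width
encoding (UTF-32):

```python
# Sample characters chosen for illustration only: ASCII letter,
# accented Latin letter, euro sign, and an emoji outside the BMP.
samples = ["a", "\u00e9", "\u20ac", "\U0001F600"]


def encoded_sizes(ch):
    """Return (UTF-8 byte count, UTF-32 byte count) for one character."""
    # "utf-32-le" is used so that no byte-order mark is added to the count.
    return len(ch.encode("utf-8")), len(ch.encode("utf-32-le"))


for ch in samples:
    utf8_len, utf32_len = encoded_sizes(ch)
    print(f"U+{ord(ch):04X}: utf-8={utf8_len} bytes, utf-32={utf32_len} bytes")
```

UTF-8 spends fewer bytes on the low end of the range and more on the
high end, while UTF-32 spends the same on every code point.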