> There is no such thing as "plain Unicode representation". The closest
> thing would be an abstract sequence of Unicode codepoints (a la Python's
> `unicode` type), but this is way too abstract to be used for
> sharing/interchange, because storing anything in a file or sending it
> over a network ultimately involves serialization to binary, which is not
> directly defined for such an abstract representation. (Indeed, this is
> exactly what encodings are: mappings between abstract codepoints and
> concrete binary; the problem is, there's more than one of them.)
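A minimal sketch of that mapping (Python 3, where the built-in `str` plays the role the `unicode` type above played): the same abstract codepoint sequence serializes to different bytes under different encodings.

    >>> s = "héllo"            # codepoints U+0068 U+00E9 U+006C U+006C U+006F
    >>> s.encode("utf-8")
    b'h\xc3\xa9llo'
    >>> s.encode("utf-16-le")
    b'h\x00\xe9\x00l\x00l\x00o\x00'
    >>> s.encode("utf-16-be")
    b'\x00h\x00\xe9\x00l\x00l\x00o'
    >>> s.encode("latin-1")
    b'h\xe9llo'

Decoding is the inverse mapping, and it only round-trips if you use the same encoding that produced the bytes.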
Ok, so an encoding is just a binary representation scheme for a conceptual sequence of Unicode code points. So why so many? I get that someone might want big-endian, and I see the various virtues of the UTF strains, but why isn't a handful of these representations enough? Languages may vary widely, but as far as I know, computers really don't vary that much. Big/little endianness is the only problem I can think of; a byte is a byte. So why so many encoding schemes? Do some provide advantages for certain human languages?

Thanks,
Toby
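On the last question: yes, the space trade-offs differ by script. A quick Python 3 comparison of byte counts (the sample strings are just illustrative):

    >>> english, russian, japanese = "hello", "привет", "こんにちは"
    >>> for text in (english, russian, japanese):
    ...     print(len(text.encode("utf-8")), len(text.encode("utf-16-le")))
    ...
    5 10
    12 12
    15 10
    >>> len(russian.encode("koi8-r"))   # legacy single-byte Russian encoding
    6

UTF-8 is the most compact for ASCII-heavy text, UTF-16 wins for many East Asian scripts, and the legacy regional encodings (Latin-1, KOI8-R, Shift JIS, ...) predate Unicode entirely, which is the other big reason there are so many.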