> There is no such thing as "plain Unicode representation". The closest
> thing would be an abstract sequence of Unicode codepoints (a la Python's
> `unicode` type), but this is way too abstract to be used for
> sharing/interchange, because storing anything in a file or sending it
> over a network ultimately involves serialization to binary, which is not
> directly defined for such an abstract representation. (Indeed, this is
> exactly what encodings are: mappings between abstract codepoints and
> concrete binary; the problem is, there's more than one of them.)
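A minimal sketch of that mapping (Python 3, where the built-in `str` plays the role the `unicode` type above played): the same abstract codepoint sequence serializes to different bytes under different encodings.

    >>> s = "héllo"            # codepoints U+0068 U+00E9 U+006C U+006C U+006F
    >>> s.encode("utf-8")
    b'h\xc3\xa9llo'
    >>> s.encode("utf-16-le")
    b'h\x00\xe9\x00l\x00l\x00o\x00'
    >>> s.encode("utf-16-be")
    b'\x00h\x00\xe9\x00l\x00l\x00o'
    >>> s.encode("latin-1")
    b'h\xe9llo'

Decoding is the inverse mapping, and it only round-trips if you use the same encoding that produced the bytes.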
Ok, so an encoding is just a binary representation scheme for a conceptual sequence of Unicode code points. So why so many? I get that someone might want big-endian, and I see the various virtues of the UTF strains, but why isn't a handful of these representations enough? Languages may vary widely, but as far as I know, computers really don't vary that much. Big/little endianness is the only problem I can think of; a byte is a byte. So why so many encoding schemes? Do some provide advantages for certain human languages?

Thanks,
Toby
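On the last question: yes, the space trade-offs differ by script. A quick Python 3 comparison of byte counts (the sample strings are just illustrative):

    >>> english, russian, japanese = "hello", "привет", "こんにちは"
    >>> for text in (english, russian, japanese):
    ...     print(len(text.encode("utf-8")), len(text.encode("utf-16-le")))
    ...
    5 10
    12 12
    15 10
    >>> len(russian.encode("koi8-r"))   # legacy single-byte Russian encoding
    6

UTF-8 is the most compact for ASCII-heavy text, UTF-16 wins for many East Asian scripts, and the legacy regional encodings (Latin-1, KOI8-R, Shift JIS, ...) predate Unicode entirely, which is the other big reason there are so many.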