On Sun, Mar 20, 2016 at 11:14 PM, Steven D'Aprano <st...@pearwood.info> wrote:
>>> On the other hand, I believe that the output of the UTF transformations
>>> is explicitly described in terms of 8-bit bytes and 16- or 32-bit words.
>>> For instance, the UTF-8 encoding of "A" has to be a single byte with
>>> value 0x41 (decimal 65). It isn't that this is the most obvious
>>> implementation, it's that it can't be anything else and still be UTF-8.
>>
>> Exactly. Aside from the way UTF-16 and UTF-32 have LE and BE variants,
>
> Blame the chip manufacturers for that. Actually, I think we can blame Intel
> specifically for that, for reversing the normal layout of words in memory.
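
(For concreteness, here's what those codecs actually produce for "A" -- a
minimal sketch, assuming Python 3.5+ for bytes.hex(); the output is shown
as comments:)

for codec in ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be"):
    print(codec, '"A" ->', "A".encode(codec).hex())
# utf-8 "A" -> 41
# utf-16-le "A" -> 4100
# utf-16-be "A" -> 0041
# utf-32-le "A" -> 41000000
# utf-32-be "A" -> 00000041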
No, I disagree; byte order is inherent in the notion of representing a
16-bit or 32-bit value across multiple bytes. Maybe there could have been
one most-common standard, but there would still have been another way of
doing it. Little-endianness and big-endianness both matter enough that
anything handling multi-byte values has to deal with them, as the sketch
below shows.
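
(A minimal sketch of that point, assuming only Python 3's built-in
int.to_bytes/int.from_bytes -- the same 16-bit value maps to two different
byte sequences, so a decoder has to know which order the encoder used:)

value = 0x0041                      # the code point of "A"
print(value.to_bytes(2, "little"))  # b'A\x00' -- little-endian
print(value.to_bytes(2, "big"))     # b'\x00A' -- big-endian

# Round-tripping requires agreeing on the byte order up front:
print(int.from_bytes(b"A\x00", "little"))  # 65 (correct)
print(int.from_bytes(b"A\x00", "big"))     # 16640 (misread)

ChrisA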