John Machin <[EMAIL PROTECTED]> wrote:

> UTF-32 is yet another encoding.
[...]
> Once you have done codecs.open('inputfile', 'rb', 'utf_32') or
> receivedstring.decode('utf_32'), what do you care whether your
> *external representation* has fixed-width characters or not?
> Putting it another way, any advantage of fixed-width characters is to
> be found in *internal* storage, not *external* transmission or
> storage.
> At the other end, if you don't have to squeeze your data through an
> 8-bit-wide non-binary channel, and you have no need for legibility to
> humans, then the remaining considerations are efficiency and (if you
> have no control over what's used at the other end) whether the
> necessary codec is widely implemented.

So, are you saying that any encoding that handles all the needed
characters is an equally good choice?  So why not choose UTF-7?
Or Punycode?

Should you never care what the black box you are using looks like
on the inside?  Wouldn't it have mattered if X.400 had won over SMTP?
Both protocols are somewhat capable of sending emails, after all;
X.400 is just a bit more complicated on the inside, where normal
users don't see.

Fixed-width characters *do* have advantages, even in the external
representation.  With fixed-width characters you don't have to
parse the entire file or stream in order to read the Nth character;
instead you can skip or seek to an octet position that can be
calculated directly from N.  In-place editing of single characters
in large files becomes more efficient, since the replacement always
occupies the same four octets and nothing after it has to be moved.

The codec for UTF-32 is extremely simple.  There are no illegal
sequences to care about, like there are in UTF-8 and UTF-16, just
illegal single 32-bit values (those that are larger than 0x10ffff,
and those in the surrogate range 0xd800-0xdfff).

And not the least, UTF-32 is *beautiful* compared to UTF-16.


-- 
Thomas Bellman,   Lysator Computer Club,   Linköping University,  Sweden
"Adde parvum parvo magnus acervus erit"       !  bellman @ lysator.liu.se
          (From The Mythical Man-Month)       !  Make Love -- Nicht Wahr!
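
The seeking argument above can be sketched in a few lines of Python 3.
This is only an illustration under stated assumptions: the stream is
seekable, opened in binary mode, and encoded as little-endian UTF-32
with no byte order mark (pass bom_length=4 if one is present).  The
helper names read_nth_char and replace_nth_char are made up for this
example and are not part of any library.

```python
import io

BYTES_PER_CHAR = 4  # UTF-32 stores every code point in exactly four octets


def read_nth_char(stream, n, bom_length=0):
    """Return character n (0-based) from a seekable binary UTF-32-LE stream."""
    # The octet offset is computed directly from n; nothing before it is parsed.
    stream.seek(bom_length + n * BYTES_PER_CHAR)
    raw = stream.read(BYTES_PER_CHAR)
    if len(raw) != BYTES_PER_CHAR:
        raise IndexError("character index out of range")
    return raw.decode("utf_32_le")


def replace_nth_char(stream, n, char, bom_length=0):
    """Overwrite character n in place; nothing after it has to move."""
    if len(char) != 1:
        raise ValueError("expected exactly one character")
    stream.seek(bom_length + n * BYTES_PER_CHAR)
    stream.write(char.encode("utf_32_le"))


# Demo on an in-memory stream (assumption: a real file opened with
# open(path, "r+b") would be handled the same way).
text = "Link\u00f6ping \U0001F600 rocks"
buf = io.BytesIO(text.encode("utf_32_le"))

fifth = read_nth_char(buf, 4)      # o-umlaut: two octets in UTF-8
eleventh = read_nth_char(buf, 10)  # outside the BMP: a surrogate pair in UTF-16
replace_nth_char(buf, 4, "o")
edited = buf.getvalue().decode("utf_32_le")
```

With UTF-8 or UTF-16 the same two operations would need a scan from
the start of the data (or a separately maintained index), and replacing
a character by one of a different encoded length would mean shifting
everything that follows it.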
-- http://mail.python.org/mailman/listinfo/python-list