On Fri, 26 Jul 2013 08:46:58 -0700, wxjmfauth wrote:

> BTW, I'm pleased to read "sequence of bits" and not bytes. Again, utf
> transformers are producing sequence of bits, call Unicode Transformation
> Units, with lengths of 8/16/32 *bits*, from there the names utf8/16/32.
> UCS transformers are (were) producing bytes, from there the names
> ucs-2/4.
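For what it's worth, you can see what the UTF codecs actually hand back with a few lines of Python. This is a minimal sketch, assuming Python 3.6 or later (for the f-string); the names CODECS, SAMPLE, encode_all and rejects_surrogate are made up for this illustration. Every encoding produces a bytes object. The 8/16/32-bit code-unit width only changes how many bytes each code point needs, and a lone surrogate code point is refused outright.

```python
# Sketch (Python 3.6+): each UTF codec maps a str to a bytes object.
# The "-le" variants are used so that no byte order mark is prepended.
CODECS = ("utf-8", "utf-16-le", "utf-32-le")

# 'a' (U+0061), EURO SIGN (U+20AC) and an emoji outside the BMP (U+1F600)
SAMPLE = "a\u20ac\U0001F600"


def encode_all(text):
    """Encode text with each UTF codec and return {codec: bytes}."""
    return {codec: text.encode(codec) for codec in CODECS}


def rejects_surrogate(codec="utf-8"):
    """Return True if encoding a lone surrogate code point raises an error."""
    try:
        "\ud800".encode(codec)
    except UnicodeEncodeError:
        return True
    return False


if __name__ == "__main__":
    for codec, data in encode_all(SAMPLE).items():
        # Every result is bytes; the code-unit width (8, 16 or 32 bits)
        # only changes how many bytes each code point takes up.
        print(f"{codec:9} {type(data).__name__} {len(data):2} bytes  {data.hex()}")
    print("lone surrogate rejected:", rejects_surrogate())
```

For the sample string that comes to 8, 8 and 12 bytes respectively: the same three code points, three different byte sequences.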
Not only does your distinction between bits and bytes make no practical difference on nearly all hardware in common use today[1], but the Unicode Consortium disagrees with you and defines UTF in terms of bytes:

"A Unicode transformation format (UTF) is an algorithmic mapping from every Unicode code point (except surrogate code points) to a unique byte sequence."

http://www.unicode.org/faq/utf_bom.html#gen2

[1] There may still be some old supercomputers in use where a byte is more than 8 bits, but they're unlikely to support Unicode.

-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list