On 17 Jan, 23:34, Antoine Pitrou <solip...@pitrou.net> wrote:
> On Mon, 17 Jan 2011 14:19:13 -0800 (PST)
> carlo <syseng...@gmail.com> wrote:
> > Is it true UTF-8 does not have any "big-endian/little-endian" issue
> > because of its encoding method?
>
> Yes.
>
> > And if it is true, why does Mark (and everyone else) write about
> > UTF-8 with and without BOM some chapters later? What would be the
> > BOM's purpose then?
>
> "BOM" in this case is a misnomer. For UTF-8, it is only used as a
> marker (a magic number, if you like) to signal that a given text file
> is UTF-8. The UTF-8 "BOM" does not say anything about byte order; and,
> actually, it does not change with endianness.
>
> (note that it is not required to put a UTF-8 "BOM" at the beginning of
> text files; it is just a hint that some tools use when
> generating/reading UTF-8)
>
> > 2- If that were true, can you point me to some documentation about the
> > math that, as Mark says, demonstrates this?
>
> Math? UTF-8 is simply a byte-oriented (rather than word-oriented)
> encoding. There is no math involved, it just works by construction.
>
> Regards
>
> Antoine.
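For anyone who wants to check the points quoted above, here is a minimal Python 3 sketch. The sample string and the helper swap_pairs are my own choices for illustration, not something from the thread; the codec names and the codecs.BOM_UTF8 constant come from the standard library. It is only meant to show that UTF-8 has a single byte serialization, that UTF-16 has two, and that the UTF-8 "BOM" is a fixed three-byte signature.

```python
import codecs

# Sample text chosen for illustration (an assumption, not from the thread):
# it mixes 1-byte, 2-byte and 3-byte UTF-8 sequences.
text = "caf\u00e9 \u20ac"

# UTF-8 is defined as a sequence of single bytes, so there is only one
# possible serialization, whatever the machine's byte order.
utf8 = text.encode("utf-8")

# UTF-16 is built from 16-bit code units, so the two bytes of each unit
# can be written in either order.
utf16_le = text.encode("utf-16-le")
utf16_be = text.encode("utf-16-be")


def swap_pairs(data):
    """Swap the two bytes of every 16-bit unit (hypothetical helper)."""
    return b"".join(data[i:i + 2][::-1] for i in range(0, len(data), 2))


# The two UTF-16 serializations differ only by the byte order of each unit.
assert utf16_le != utf16_be
assert swap_pairs(utf16_le) == utf16_be

# The UTF-8 "BOM" is just U+FEFF encoded as UTF-8: always the same 3 bytes.
assert "\ufeff".encode("utf-8") == codecs.BOM_UTF8 == b"\xef\xbb\xbf"

# The "utf-8-sig" codec prepends that signature when encoding and strips it
# when decoding; the signature is optional on input.
with_sig = text.encode("utf-8-sig")
assert with_sig == codecs.BOM_UTF8 + utf8
assert with_sig.decode("utf-8-sig") == text
assert utf8.decode("utf-8-sig") == text
```

If all the assertions pass silently, the behaviour matches what Antoine describes: the signature marks the file as UTF-8 and carries no byte-order information.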
Thank you all. I eventually found
http://www.unicode.org/versions/Unicode5.2.0/ch03.pdf#G7404
which clears it up. There is no math involved, in fact, as Tim and
Antoine pointed out.
--
http://mail.python.org/mailman/listinfo/python-list