UTF-8 question from Dive into Python 3

carlo Mon, 17 Jan 2011 14:23:15 -0800

Hi,
recently I had to study *seriously* Unicode and encodings for one
project in Python but I left with a couple of doubts arised after
reading the unicode chapter of Dive into Python 3 book by Mark
Pilgrim.


1- Mark says:
"Also (and you’ll have to trust me on this, because I’m not going to
show you the math), due to the exact nature of the bit twiddling,
there are no byte-ordering issues. A document encoded in UTF-8 uses
the exact same stream of bytes on any computer."
Is it true UTF-8 does not have any "big-endian/little-endian" issue
because of its encoding method? And if it is true, why Mark (and
everyone does) writes about UTF-8 with and without BOM some chapters
later? What would be the BOM purpose then?

2- If that were true, can you point me to some documentation about the
math that, as Mark says, demonstrates this?

thank you
Carlo
-- 
http://mail.python.org/mailman/listinfo/python-list

UTF-8 question from Dive into Python 3

Reply via email to