Re: Is this a bug? BOM decoded with UTF8

Brian Quinlan Fri, 11 Feb 2005 05:51:41 -0800

Diez B. Roggisch wrote:

I know it's easy (string.replace()) but why does UTF-16 do
it on its own then? Is that according to Unicode standard or just
Python convention?

BOM is Microsoft-proprietary crap. UTF-16 is defined in the Unicode
standard.

What are you talking about? The BOM and UTF-16 go hand in hand. Without a Byte Order Mark, you can't unambiguously determine whether big- or little-endian UTF-16 was used. If, for example, you came across a UTF-16 text file containing this hexadecimal data: 2200

what would you assume? That it is a quote character (U+0022) in little-endian format, or that it is a for-all symbol (U+2200) in big-endian format?
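
To make that concrete, here is a minimal sketch, assuming Python 3 and only the standard codecs module (the variable names are just for illustration). It decodes the same two bytes both ways, then shows how the generic "utf-16" codec uses a leading BOM to pick the byte order and drops it from the result, while the plain "utf-8" codec leaves the BOM in the decoded text as U+FEFF:

```python
import codecs

data = b"\x22\x00"  # the two bytes from the example above

# Same bytes, two readings, depending on the byte order you assume.
as_le = data.decode("utf-16-le")  # U+0022 QUOTATION MARK
as_be = data.decode("utf-16-be")  # U+2200 FOR ALL
print(hex(ord(as_le)), hex(ord(as_be)))

# With a BOM in front, the generic "utf-16" codec reads the byte order
# from it and does not include the BOM in the decoded string.
with_bom_le = (codecs.BOM_UTF16_LE + data).decode("utf-16")
with_bom_be = (codecs.BOM_UTF16_BE + data).decode("utf-16")
print(hex(ord(with_bom_le)), hex(ord(with_bom_be)))

# The plain "utf-8" codec keeps a leading BOM as the character U+FEFF.
utf8_text = (codecs.BOM_UTF8 + b'"').decode("utf-8")
print(repr(utf8_text))
```

The explicit "utf-16-le" and "utf-16-be" codecs assume the byte order is already known, so they are the ones to use when there is no BOM and the order comes from somewhere else.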

For more details, see:
http://www.unicode.org/faq/utf_bom.html#BOM

Cheers,
Brian
--
http://mail.python.org/mailman/listinfo/python-list
