pekka niiranen wrote:
> I have two files "my.utf8" and "my.utf16" which
> both contain BOM and two "a" characters.
> Contents of "my.utf8" in HEX:
> EFBBBF6161
> Contents of "my.utf16" in HEX:
> FEFF6161
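This is not true for "my.utf16": this byte string does not denote
two "a" characters. After the BOM (FEFF), the bytes 6161 form one
16-bit code unit, so the file contains a single character, U+6161.
Two "a" characters in big-endian UTF-16 would be FEFF00610061.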
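
A quick way to check this is to decode the bytes directly. The following is only a minimal sketch, assuming Python 3; the variable names (data, text, two_a) are illustrative and not from the original files.

```python
import codecs

# The bytes reported for "my.utf16": a big-endian BOM followed by 61 61.
data = bytes.fromhex("FEFF6161")

# The generic "utf-16" codec consumes the BOM and uses it to pick the byte order.
text = data.decode("utf-16")
print(len(text), hex(ord(text[0])))

# What a big-endian UTF-16 file holding BOM + "aa" would contain instead.
two_a = codecs.BOM_UTF16_BE + "aa".encode("utf-16-be")
print(two_a.hex().upper())
```

Each "a" needs two bytes (00 61) in UTF-16, which is why the four-byte file can hold only the BOM plus one character.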
> Is there a trick to read UTF8 encoded file with BOM not decoded?
It's very easy: decode the file as UTF-8, then drop the first
character if it is the BOM (U+FEFF). The UTF-8 codec will never
do this on its own.
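
As a rough sketch of that approach, assuming Python 3 and working on the raw bytes already read from the file (the helper name decode_utf8_drop_bom is hypothetical, not a library function):

```python
def decode_utf8_drop_bom(raw):
    """Decode UTF-8 bytes and drop a leading BOM (U+FEFF) if there is one."""
    text = raw.decode("utf-8")
    # The plain UTF-8 codec keeps the BOM as the first character.
    if text.startswith("\ufeff"):
        text = text[1:]
    return text

# The bytes reported for "my.utf8": UTF-8 encoded BOM followed by "aa".
raw = bytes.fromhex("EFBBBF6161")
print(len(raw.decode("utf-8")))        # BOM still counted as a character
print(len(decode_utf8_drop_bom(raw)))  # BOM removed
```

Python 2.5 and later also provide a separate "utf-8-sig" codec that skips a leading BOM while decoding, which may be simpler where it is available.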
Regards,
Martin