Hi there,
I have two files, "my.utf8" and "my.utf16", which both contain a BOM and two "a" characters.
Contents of "my.utf8" in HEX: EFBBBF6161
Contents of "my.utf16" in HEX: FEFF6161
For some reason Python 2.4 keeps the BOM in the decoded text for UTF-8 but not for UTF-16. See below:
>>> import codecs
>>> fh = codecs.open("my.utf8", "rb", "utf8")
>>> fh.readlines()
[u'\ufeffaa']  # BOM is decoded, why?
>>> fh.close()
>>> fh = codecs.open("my.utf16", "rb", "utf16")
>>> fh.readlines()
[u'\u6161']  # No BOM here
>>> fh.close()
Is there a trick to read a UTF-8 encoded file so that the BOM does not end up in the decoded text?
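To make the question concrete, here is a minimal sketch of the kind of manual workaround I have in mind: strip the three BOM bytes before decoding. The helper name decode_utf8_without_bom is made up for illustration, and the sample bytes simply mirror the "my.utf8" contents above. I gather that later Python releases (2.5 and up) also ship a "utf-8-sig" codec for this case, but that is not available in 2.4.

```python
import codecs


def decode_utf8_without_bom(raw):
    """Decode UTF-8 bytes, dropping a leading BOM if one is present.

    Hypothetical helper, not part of the codecs module.
    """
    # codecs.BOM_UTF8 is the byte sequence EF BB BF.
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    return raw.decode("utf-8")


# Same bytes as "my.utf8": the BOM followed by two "a" characters.
sample = codecs.BOM_UTF8 + "aa".encode("ascii")
text = decode_utf8_without_bom(sample)
```

For a real file the raw bytes would come from open("my.utf8", "rb").read(). Input without a BOM passes through unchanged, so the helper should be safe to use on both kinds of file.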
-pekka-