Hi guys,

I was processing a UTF-16 encoded file with a BOM and was not aware of the codecs module at first. I wrote the following code:

===== Code 1 ============================
for i in open("d:\python24\lzjtest.xml", 'r').readlines():
    i = i.decode("utf-16")
    print i
=======================================

The output was:

Traceback (most recent call last):
  File "D:\Python24\testutf-16.py", line 4, in -toplevel-
    i = i.decode("utf-16")
  File "D:\Python24\lib\encodings\utf_16.py", line 16, in decode
    return codecs.utf_16_decode(input, errors, True)
UnicodeDecodeError: 'utf16' codec can't decode byte 0x0a in position 84: truncated data
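One plausible reading of this traceback: in UTF-16 a newline is two bytes (0x0a 0x00 in little-endian order), and readlines() on a file of raw bytes splits right after the 0x0a, so each "line" ends with half a code unit. The sketch below reproduces that in memory. It is written for a current Python 3 rather than the Python 2.4 used in this post, and the sample text, the little-endian byte order, and the helper try_decode are assumptions made for illustration, not taken from the original file.

```python
import codecs

# Hypothetical sample text standing in for the XML file's contents.
text = u"<a>\r\n<b>\r\n"

# Assumption: the file is UTF-16 little-endian with a BOM.
data = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# Splitting on the single byte 0x0a, as a byte-oriented readlines() does,
# cuts the two-byte newline (0x0a 0x00) in half.
first = data.split(b"\n")[0] + b"\n"   # the first "line" as readlines() would return it


def try_decode(chunk):
    """Hypothetical helper: return the decoded text or the decode error reason."""
    try:
        return chunk.decode("utf-16")
    except UnicodeDecodeError as exc:
        return "error: %s" % exc.reason


result_first = try_decode(first)        # odd number of bytes, so decoding fails
result_whole = data.decode("utf-16")    # decoding the whole byte string succeeds
```

The first chunk has an odd length because it stops in the middle of the newline's code unit, which would match the "truncated data" message on byte 0x0a.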

I searched Google and found an article on a similar problem that suggested using split(). I did not quite understand the article, but I recoded it as:

==== Code 2 ==============================
for i in open("d:\python24\lzjtest.xml", 'r').read().split('\r\n'):
    i = i.decode("utf-16")
    print i
=======================================

Then it worked (it echoed the file). Later I learned about codecs and wrote the following code:

==== Code 3 =============================
import codecs
for i in codecs.open("d:\python24\lzjtesttvs2.xml", 'r', 'utf-16').readlines():
    print i
=======================================

It also worked and echoed the file.

I am wondering what the problem is with the first code and why the bug is fixed in the second.

Thanks in advance.

-Zhongjian
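
A hedged guess at why Code 2 and Code 3 behave differently from Code 1: in UTF-16 the bytes for "\r\n" are 0x0d 0x00 0x0a 0x00, so the two-byte pattern '\r\n' that Code 2 splits on probably never appears in the raw data. The split then returns the whole file as one piece, which decodes cleanly. Code 3 decodes first and only then splits into lines. The sketch below illustrates both in memory. It targets a current Python 3 rather than Python 2.4, uses codecs.getreader over an in-memory buffer in place of codecs.open on a real file, and the sample text and byte order are assumptions.

```python
import codecs
import io

# Hypothetical sample text; the real file's contents are not known.
text = u"<a>\r\n<b>\r\n"

# Assumption: UTF-16 little-endian with a BOM.
data = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# Code 2 in miniature: 0x0d and 0x0a are never adjacent in this data,
# so split() finds nothing and returns the whole byte string as one piece.
pieces = data.split(b"\r\n")
decoded_pieces = [p.decode("utf-16") for p in pieces]

# Code 3 in miniature: a UTF-16 stream reader decodes the bytes first
# and splits into lines afterwards, so no code unit is cut in half.
reader = codecs.getreader("utf-16")(io.BytesIO(data))
decoded_lines = reader.readlines()
```

If this reading is right, Code 2 works only by accident: it does not really split the file into lines, it just decodes everything at once.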