On Dec 6, 5:36 am, Johannes Bauer <[EMAIL PROTECTED]> wrote:
> So UTF-16 has an explicit EOF marker within the text? I cannot find one
> in original file, only some kind of starting sequence I suppose
> (0xfeff). The last characters of the file are 0x00 0x0d 0x00 0x0a,
> simple \r\n line ending.

Sorry, *WRONG*. It ends in 00 0d 00 0a 00. The file is 1559 bytes long,
an ODD number, which shouldn't happen with utf16, because every UTF-16
code unit is exactly 2 bytes. The file is stuffed. Python 3.0 has a bug;
it should give a meaningful error message.

Python 2.6.0 silently ignores the problem [that's a BUG] when the file
is read by a similar method:

| >>> import codecs
| >>> lines = codecs.open('x.txt', 'r', 'utf16').readlines()
| >>> lines[-1]
| u'[PhonePBK004]\r\n'

Python 2.x does, however, give a meaningful and precise error message if
you try a decode on the file contents:

| >>> s = open('x.txt', 'rb').read()
| >>> len(s)
| 1559
| >>> s[-35:]
| '\x00\r\x00\n\x00[\x00P\x00h\x00o\x00n\x00e\x00P\x00B\x00K\x000\x000\x004\x00]\x00\r\x00\n\x00'
| >>> u = s.decode('utf16')
| Traceback (most recent call last):
|   File "<stdin>", line 1, in <module>
|   File "C:\python26\lib\encodings\utf_16.py", line 16, in decode
|     return codecs.utf_16_decode(input, errors, True)
| UnicodeDecodeError: 'utf16' codec can't decode byte 0x00 in position 1558: truncated data

HTH,
John
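
P.S. If you want to catch this kind of damage up front rather than rely on
whichever reader you happen to be using, something along these lines might
do. It is only a rough sketch: check_utf16 is a name I made up, not
anything in the codecs module, and it assumes you can afford to read the
whole file into memory as bytes.

```python
def check_utf16(data):
    """Return (ok, message) for bytes that are supposed to hold UTF-16 text.

    Hypothetical helper for illustration; not part of the standard library.
    """
    # Every UTF-16 code unit is 2 bytes, so an odd length is always wrong.
    if len(data) % 2:
        return False, "odd length (%d bytes): truncated or stray byte" % len(data)
    try:
        # A strict decode also catches problems such as unpaired surrogates.
        data.decode("utf-16")
    except UnicodeDecodeError as exc:
        return False, str(exc)
    return True, "ok"


# Possible usage, assuming the file is called x.txt as above:
#     with open("x.txt", "rb") as f:
#         print(check_utf16(f.read()))
```

The length test runs first so that an odd-sized file gets the same kind of
report no matter how a given Python version's decoder treats a dangling
trailing byte.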