Re: Python 3.0 automatic decoding of UTF16

MRAB Sat, 06 Dec 2008 08:51:02 -0800

Johannes Bauer wrote:

[EMAIL PROTECTED] wrote:

Two problems: endianness and a trailing zero byte.
This works for me:


This is very strange: when using "utf16", endianness should be detected
automatically from the byte order mark (BOM). When I simply truncate the trailing zero byte, I receive:

Traceback (most recent call last):
  File "./modify.py", line 12, in <module>
    a = AddressBook("2008_11_05_Handy_Backup.txt")
  File "./modify.py", line 7, in __init__
    line = f.readline()
  File "/usr/local/lib/python3.0/io.py", line 1807, in readline
    while self._read_chunk():
  File "/usr/local/lib/python3.0/io.py", line 1556, in _read_chunk
    self._set_decoded_chars(self._decoder.decode(input_chunk, eof))
  File "/usr/local/lib/python3.0/io.py", line 1293, in decode
    output = self.decoder.decode(input, final=final)
  File "/usr/local/lib/python3.0/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
  File "/usr/local/lib/python3.0/encodings/utf_16.py", line 69, in _buffer_decode
    return self.decoder(input, self.errors, final)
UnicodeDecodeError: 'utf16' codec can't decode byte 0x0a in position 0: truncated data
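The failure above can be reproduced in miniature. The following sketch assumes a current Python 3 (whose codecs are implemented in C rather than in the 3.0 `io.py` shown in the traceback). It shows that the generic `utf-16` codec picks the byte order from a BOM and strips it, and that one leftover byte makes the final decode fail. The sample string `"Hi"` and the helper `decode_with_bom` are illustrative only.

```python
import codecs


def decode_with_bom(data: bytes) -> str:
    # The generic "utf-16" codec looks for a BOM, picks the byte order
    # from it and removes it from the decoded result.
    return data.decode("utf-16")


# "Hi" preceded by a little-endian BOM (FF FE) and a big-endian BOM (FE FF).
le_data = codecs.BOM_UTF16_LE + "Hi".encode("utf-16-le")
be_data = codecs.BOM_UTF16_BE + "Hi".encode("utf-16-be")

print(decode_with_bom(le_data))  # Hi
print(decode_with_bom(be_data))  # Hi

# One stray byte (0x0a, as in the traceback) makes the length odd, so the
# final half code unit cannot be decoded.
try:
    decode_with_bom(le_data + b"\n")
except UnicodeDecodeError as exc:
    print(exc.reason)  # truncated data
```

Both byte orders decode to the same text, so endianness itself is not the problem once a BOM is present; an odd byte count is.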

But I suppose something *is* indeed weird: the file I uploaded, which did
not yield the "truncated data" error, is 1559 bytes. That is an odd length,
which cannot be valid UTF-16, since every UTF-16 code unit is 2 bytes.
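To inspect a suspicious file like this one without going through the text decoder, a hypothetical helper such as `describe_utf16` (not part of any library) can report the size, the BOM, and whether the length is even. This is a minimal sketch; the 1559-byte input below is synthetic zero padding that only mimics the size mentioned above.

```python
import codecs


def describe_utf16(data: bytes) -> dict:
    """Hypothetical helper: report what a UTF-16 decoder would see in data."""
    if data.startswith(codecs.BOM_UTF16_LE):
        bom = "little-endian"
    elif data.startswith(codecs.BOM_UTF16_BE):
        bom = "big-endian"
    else:
        bom = None  # no BOM: the byte order must be known in advance
    return {
        "size": len(data),
        "bom": bom,
        # Every UTF-16 code unit is 2 bytes, so a valid stream has even length.
        "even_length": len(data) % 2 == 0,
    }


# Synthetic stand-in for a 1559-byte file that starts with a little-endian BOM.
sample = codecs.BOM_UTF16_LE + bytes(1557)
print(describe_utf16(sample))
# {'size': 1559, 'bom': 'little-endian', 'even_length': False}
```

On a real file, the raw bytes would come from opening it in binary mode (`open(path, "rb").read()`), so no decoding happens before the check.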

It might be that the EOF marker (b'\x1A' or u'\u001A') was written, or is being read, as a single byte instead of as the 2 bytes it occupies in UTF-16 text.
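The EOF-marker explanation can be checked with a short sketch, assuming the file ends with a Ctrl-Z (0x1A) written as one raw byte. Encoded as UTF-16, U+001A is a full 2-byte code unit, so a lone 0x1A byte leaves the total length odd. The workaround `strip_single_byte_eof` and the sample text `"Hello\n"` are hypothetical and do not come from the thread.

```python
import codecs

SUB = "\x1a"  # Ctrl-Z, the traditional DOS end-of-file marker

# Written as UTF-16 text, the marker occupies a full 2-byte code unit.
print(SUB.encode("utf-16-le"))  # b'\x1a\x00'


def strip_single_byte_eof(data: bytes) -> bytes:
    """Hypothetical workaround: drop a lone trailing 0x1A byte that was
    written as a single byte rather than as a UTF-16 code unit."""
    if len(data) % 2 == 1 and data.endswith(b"\x1a"):
        return data[:-1]
    return data


text = codecs.BOM_UTF16_LE + "Hello\n".encode("utf-16-le")
broken = text + b"\x1a"  # marker written as one byte: odd total length
print(len(broken) % 2)  # 1
print(repr(strip_single_byte_eof(broken).decode("utf-16")))  # 'Hello\n'
```

The helper only touches input whose length is odd and whose last byte is 0x1A, so correctly encoded files pass through unchanged.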

--
http://mail.python.org/mailman/listinfo/python-list
