Baz Walter, 11.04.2010 16:12:
I am using Python 2.6 on a Linux box, and I have some UTF-16 encoded
files with CRLF line endings which I would like to open with universal
newlines.
So far, I have been unable to get this to work correctly.
For example:
>>> open('test.txt', 'w').write(u'a\r\nb\r\n'.encode('utf-16'))
>>> repr(open('test.txt', 'rbU').read().decode('utf-16'))
"u'a\\n\\nb\\n\\n'"
>>> import codecs
>>> repr(codecs.open('test.txt', 'rbU', 'utf-16').read())
"u'a\\n\\nb\\n\\n'"
Of course, the output I want is:
"u'a\\nb\\n'"
I suppose it's not too surprising that the built-in open converts the
line endings before decoding, but it surprised me that codecs.open does
this as well.
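To see why each CRLF turns into two newlines, here is a minimal sketch that mimics newline translation done on the raw bytes before decoding. The helper byte_level_universal_newlines is hypothetical and only approximates what 'U' mode does. 'utf-16-le' is used instead of 'utf-16' so the bytes have a fixed order and no BOM. The outputs shown are for Python 3.

```python
def byte_level_universal_newlines(raw):
    # Hypothetical helper: approximates Python 2's 'U' mode by translating
    # '\r\n' and lone '\r' to '\n' on the raw bytes, before any decoding.
    return raw.replace(b'\r\n', b'\n').replace(b'\r', b'\n')

# 'utf-16-le' gives a fixed byte order and no BOM, so the bytes are easy to read.
raw = u'a\r\nb\r\n'.encode('utf-16-le')
print(repr(raw))  # b'a\x00\r\x00\n\x00b\x00\r\x00\n\x00'

# In UTF-16 every '\r' byte is followed by a NUL byte, so it never sits next
# to the '\n' byte: the translator sees a lone '\r' plus a separate '\n'.
translated = byte_level_universal_newlines(raw)
print(repr(translated.decode('utf-16-le')))  # 'a\n\nb\n\n'
```

The doubled newlines are therefore a consequence of translating before decoding, not of the decoder itself.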
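The codecs module does not support universal newline parsing (see the
docs). codecs.open() simply passes the mode on to the built-in open(),
so the 'U' translation is still applied to the raw bytes before
decoding. You need to use the new io module (added in Python 2.6)
instead.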
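A minimal sketch of the io approach, assuming Python 3 (the io module also exists in 2.6, but the expected output below was checked against Python 3 only). A temporary file stands in for test.txt. io.open() decodes first and then applies newline translation to the decoded text; newline=None, the default, enables universal newlines.

```python
import io
import os
import tempfile

# Write UTF-16 data with CRLF line endings, as in the original session.
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
with io.open(path, 'wb') as f:
    f.write(u'a\r\nb\r\n'.encode('utf-16'))

# Decode as UTF-16 (the BOM selects the byte order), then translate
# '\r\n', '\r' and '\n' to '\n' on the decoded text.
with io.open(path, 'r', encoding='utf-16', newline=None) as f:
    text = f.read()

os.remove(path)
print(repr(text))  # 'a\nb\n'
```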
Stefan
--
http://mail.python.org/mailman/listinfo/python-list