I am using Python 2.6 on a Linux box, and I have some UTF-16 encoded files with CRLF line endings that I would like to open with universal newlines.

So far, I have been unable to get this to work correctly.

For example:

>>> open('test.txt', 'wb').write(u'a\r\nb\r\n'.encode('utf-16'))
>>> repr(open('test.txt', 'rbU').read().decode('utf-16'))
"u'a\\n\\nb\\n\\n'"
>>> import codecs
>>> repr(codecs.open('test.txt', 'rbU', 'utf-16').read())
"u'a\\n\\nb\\n\\n'"

Of course, the output I want is:

"u'a\\nb\\n'"

I suppose it's not too surprising that the built-in open translates the line endings on the raw bytes before any decoding happens, but it surprised me that codecs.open does this as well.
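
To make the suspected cause concrete, here is a minimal sketch of what byte-level newline translation does to UTF-16 data. It is an illustration only: the regular expression is my own stand-in for universal-newline translation (not the code the file object really runs), and I use 'utf-16-le' rather than 'utf-16' so there is no BOM and the byte layout is predictable.

```python
import re

text = u'a\r\nb\r\n'

# Explicit little-endian encoding (an assumption made for this sketch):
# every character becomes two bytes, e.g. '\r' -> '\r\x00'.
raw = text.encode('utf-16-le')

# Stand-in for universal newlines applied to raw bytes:
# '\r\n' and a lone '\r' both become '\n'.
# In UTF-16 the '\r' byte is followed by '\x00', never directly by '\n',
# so each '\r' is treated as a lone carriage return.
translated = re.sub(b'\r\n|\r', b'\n', raw)

# Decoding afterwards yields two newlines per original line ending.
decoded = translated.decode('utf-16-le')
print(repr(decoded))
```

Each CRLF pair ends up as two newline characters after decoding, which matches the doubled newlines in the output above.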

Is there a way to get universal newlines to work properly with UTF-16 files?

(NB: I'm not interested in other methods of converting line endings, just in whether universal newlines can be made to work correctly.)
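
One candidate that might do this is io.open from the io module, which is available from Python 2.6 onwards. As far as I understand it, io.open decodes the bytes first and applies newline translation to the decoded text, with newline=None selecting universal newlines. The sketch below is untested on my 2.6 setup, and the temporary path is made up purely for the illustration.

```python
import io
import os
import tempfile

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'test.txt')

# Write the UTF-16 bytes in binary mode so nothing is translated on the way out.
with open(path, 'wb') as f:
    f.write(u'a\r\nb\r\n'.encode('utf-16'))

# newline=None requests universal newlines; the translation is applied
# to the decoded text rather than to the raw bytes.
with io.open(path, 'r', encoding='utf-16', newline=None) as f:
    result = f.read()

print(repr(result))
```

If this behaves as described, result contains a single newline for each CRLF in the file, with no other conversion step involved.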