Marc-Andre Lemburg <m...@egenix.com> added the comment:

On 2009-01-07 01:21, Amaury Forgeot d'Arc wrote:
> First write a utf-16 file with its signature:
>
> >>> f1 = open('utf16.txt', 'w', encoding='utf-16')
> >>> f1.write('0123456789')
> >>> f1.close()
>
> Then read it twice:
>
> >>> f2 = open('utf16.txt', 'r', encoding='utf-16')
> >>> print('read1', ascii(f2.read()))
> read1 '0123456789'
> >>> f2.seek(0)
> 0
> >>> print('read2', ascii(f2.read()))
> read2 '\ufeff0123456789'
>
> The second read returns the BOM!
> This is because the zero in seek(0) is a "cookie" which contains both the
> position and the decoder state. Unfortunately, state=0 means 'endianness
> has been determined: native order'.
>
> maybe a suggestion: handle seek(0) as a special value which calls
> decoder.reset().
> The patch implement this idea.
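
For illustration only, the mechanism described in the quoted text can be sketched with the incremental decoder on its own. This is not the patch attached to the issue. It assumes that rewinding the file without resetting the decoder is equivalent to feeding the same bytes to a decoder that has already fixed its byte order; the variable names are made up for the example.

```python
import codecs

# Encode with the generic "utf-16" codec so the bytes start with a BOM.
data = "0123456789".encode("utf-16")

decoder = codecs.getincrementaldecoder("utf-16")()

# First pass: the decoder consumes the BOM and settles on a byte order.
first = decoder.decode(data)

# Second pass without reset(): the byte order is already fixed, so the
# BOM bytes are decoded as an ordinary U+FEFF character.
second = decoder.decode(data)

# Third pass after reset(): the decoder looks for a BOM again.
decoder.reset()
third = decoder.decode(data)

print("read1", ascii(first))
print("read2", ascii(second))
print("read3", ascii(third))
```

Only the second pass should show the leading '\ufeff', which matches the "read2" output quoted above.
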

This is a problem with the utf_16.py codec, not the io layer. Opening a
file in append mode is something that the io layer would have to handle,
since the codec doesn't know anything about the underlying file mode.

Using .reset() will not help. The code for the StreamReader and
StreamWriter in utf_16.py will have to be modified to undo the adjustment
of the .encode() and .decode() methods after using .seek(0).

Note that there's also the case .seek(1) - I guess this must be
considered as resulting in undefined behavior.

----------
nosy: +lemburg

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue4862>
_______________________________________