New submission from Amaury Forgeot d'Arc <amaur...@gmail.com>:

First, write a UTF-16 file with its signature (the BOM):
>>> f1 = open('utf16.txt', 'w', encoding='utf-16')
>>> f1.write('0123456789')
>>> f1.close()

Then read it twice:

>>> f2 = open('utf16.txt', 'r', encoding='utf-16')
>>> print('read1', ascii(f2.read()))
read1 '0123456789'
>>> f2.seek(0)
0
>>> print('read2', ascii(f2.read()))
read2 '\ufeff0123456789'

The second read returns the BOM!
This is because the zero in seek(0) is a "cookie" which contains both the position and the decoder state. Unfortunately, state=0 means 'endianness has been determined: native order', so the decoder no longer looks for a BOM and decodes it as an ordinary character.

A suggestion: handle seek(0) as a special value which calls decoder.reset(). The attached patch implements this idea.

----------
files: io_utf16.patch
keywords: patch
messages: 79299
nosy: amaury.forgeotdarc
priority: critical
severity: normal
status: open
title: utf-16 BOM is not skipped after seek(0)
versions: Python 3.0
Added file: http://bugs.python.org/file12627/io_utf16.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue4862>
_______________________________________
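The decoder-state explanation in the report can be illustrated without any file I/O. The following is only a sketch, not the attached patch: it drives the standard incremental UTF-16 decoder directly and assumes that restoring the state (b"", 0) approximates what the seek(0) cookie does inside the text layer. The variable names are illustrative.

```python
import codecs

# BOM followed by the text, in the platform's native byte order.
data = "0123456789".encode("utf-16")

decoder = codecs.getincrementaldecoder("utf-16")()

# Fresh decoder: endianness is still undetermined, so the BOM is consumed.
first = decoder.decode(data, final=True)
state_after_first = decoder.getstate()  # flag 0: native order determined

# Assumption: this mimics seek(0) restoring a cookie whose decoder flags
# are 0. The decoder believes the byte order is already known and treats
# the BOM as an ordinary U+FEFF character.
decoder.setstate((b"", 0))
second = decoder.decode(data, final=True)

# The suggested fix: reset() puts the decoder back into BOM detection.
decoder.reset()
third = decoder.decode(data, final=True)

print("read1", ascii(first))
print("read2", ascii(second))
print("after reset", ascii(third))
```

Here `second` carries the leading '\ufeff' just like the second read in the session above, while the decode after reset() does not, which is the behaviour the suggestion relies on.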