Nikolaus Rath added the comment:

I'm about 40% done with translating Victor's patch into C. However, in the 
process I got the feeling that this approach may not be so good after all.

Note that:

 * The only use case for set_encoding() that I have found is changing the
encoding of sys.{stdin,stdout,stderr}.

 * When using non-seekable streams, set_encoding() has to be called before 
anything has been read from the stream, so it's unlikely that there is a 
situation (with the exception of sys.std*) where the stream cannot be opened 
with the right encoding instead (if you can't change the open call, then you 
probably cannot call set_encoding early enough either).

 * When using seekable streams, using set_encoding() breaks seeking, because 
the position cookie does not contain information about the decoder that was 
used at the given position. Example:

$ cat ~/tmp/test.py
import _pyio as io
data = ('0123456789\r'*5).encode('utf-16_le')
bstream = io.BytesIO(data)
tstream = io.TextIOWrapper(bstream, encoding='latin1')
tstream.readline()
pos = tstream.tell()
tstream.read(6)
tstream.set_encoding('utf-16_le')
tstream.seek(pos)

$ ./python ~/tmp/test.py 
Traceback (most recent call last):
  File "/home/nikratio/tmp/test.py", line 9, in <module>
    tstream.seek(pos)
  File "/home/nikratio/clones/cpython/Lib/_pyio.py", line 1989, in seek
    raise OSError("can't restore logical file position")
OSError: can't restore logical file position


I don't think there is a way to fix this without making the whole
tell/seek and set_encoding code even more complicated than it already is. (It
would probably involve keeping track of the history of encoders that have been
used for different parts of the stream.)
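
To illustrate the kind of bookkeeping I mean, here is a minimal sketch. It
is not part of Victor's patch or of _pyio; the class EncodingHistory and its
methods are hypothetical names, and the byte offset in the usage lines is
purely illustrative. The idea is only that seek() would have to look up
which encoding was active at a given byte offset before it could rebuild
the decoder.

```python
import bisect


class EncodingHistory:
    """Hypothetical helper: remembers which encoding applies from which byte offset."""

    def __init__(self, initial_encoding):
        # Parallel lists, kept sorted by offset; the first entry covers offset 0.
        self._offsets = [0]
        self._encodings = [initial_encoding]

    def record(self, byte_offset, encoding):
        """Note that `encoding` is in effect from `byte_offset` onwards."""
        if byte_offset < self._offsets[-1]:
            raise ValueError("encoding changes must be recorded in stream order")
        if byte_offset == self._offsets[-1]:
            # A second change at the same offset replaces the earlier one.
            self._encodings[-1] = encoding
        else:
            self._offsets.append(byte_offset)
            self._encodings.append(encoding)

    def encoding_at(self, byte_offset):
        """Return the encoding that was active at `byte_offset`."""
        if byte_offset < 0:
            raise ValueError("byte offset must be non-negative")
        index = bisect.bisect_right(self._offsets, byte_offset) - 1
        return self._encodings[index]


history = EncodingHistory('latin1')
history.record(27, 'utf-16_le')   # illustrative offset of a set_encoding() call
print(history.encoding_at(21))    # encoding needed to seek back before the switch
```

Even with such a table, the position cookie would still have to be mapped
back to a byte offset, and the decoder state restored for the right codec,
which is where I expect the real complexity to be.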

In summary: using set_encoding() with seekable streams breaks seeking; using it
with non-seekable streams requires it to be called right after open(); and the
only reported case where one cannot simply change the open() call instead is
sys.std*.

Given all that, do we really want to add a new public method to the 
TextIOWrapper class that can only reasonably be used with three specific 
streams?


Personally, I think it would make much more sense to instead introduce three 
new functions in the sys module: sys.change_std{out,err,in}_encoding(). That 
solves the reported use-case just as well without polluting the namespace of 
all text streams.
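
For comparison, the effect can already be approximated in pure Python by
re-wrapping the underlying binary buffer. The following is only a sketch of
that idea, not the proposed sys functions (which do not exist); the helper
name rewrap_with_encoding is made up for illustration.

```python
import io


def rewrap_with_encoding(stream, encoding, errors=None):
    """Return a new TextIOWrapper over `stream`'s buffer that uses `encoding`.

    `stream` must be a TextIOWrapper; it is unusable after this call.
    """
    if errors is None:
        errors = stream.errors          # keep the old error handler
    line_buffering = stream.line_buffering
    stream.flush()                      # push any pending text to the buffer
    raw = stream.detach()               # take the binary buffer away from the old wrapper
    return io.TextIOWrapper(raw, encoding=encoding, errors=errors,
                            line_buffering=line_buffering)


# Demonstration on an in-memory stream instead of sys.stdout.
buf = io.BytesIO()
out = io.TextIOWrapper(buf, encoding='latin-1')
out.write('\u00e9')                     # encoded as latin-1
out = rewrap_with_encoding(out, 'utf-8')
out.write('\u00e9')                     # encoded as UTF-8
out.flush()
print(buf.getvalue())
```

For the standard streams this would be used as
sys.stdout = rewrap_with_encoding(sys.stdout, 'utf-8'). For sys.stdin it is
only safe before anything has been read, since text that the old wrapper has
already decoded and buffered is lost.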



That said, I am happy to complete the implementation of set_encoding() in C.
However, I'd like a core developer to first reconfirm that this is really the
better solution.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue15216>
_______________________________________