Nick Coghlan added the comment:

Reviewing Inada-san's latest version of the patch, we seem to be in a somewhat 
hybrid state where:

1. The restriction that the API can only be used with seekable() streams when 
there is unread data in the read buffer is still in place

2. We don't actually call seek() anywhere to set the stream back to the 
beginning of the file. Instead, we try to shuffle data out of the old decoder 
and into the new one.

I'm starting to wonder if the best option here might be to attempt to make the 
API work for arbitrary codecs and non-seekable streams, and then simply accept 
that it may take a few maintenance releases before that's actually true. If we 
decide to go down that path, then I'd suggest the following stress test:

- make a longish test string out of repeated copies of "ℙƴ☂ℌøἤ"
- pick a few pairs of multibyte non-universal/universal encodings for use with 
surrogateescape and strict as their respective error handlers (e.g. ascii/utf8, 
ascii/utf16le, ascii/utf32, ascii/shift_jis, ascii/iso2022_jp, ascii/gb18030, 
gbk/gb18030)
- for each pair, make the test data by encoding from str to bytes with the 
relevant universal encoding
- switch the encoding multiple times on the same stream at different points
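
As a rough illustration of the shape such a test could take, here is a minimal 
sketch that only approximates the steps above. It does not use the stream API 
under discussion; instead it stands in for a stream by driving the incremental 
decoders from the codecs module directly, and it switches only once per run 
(at many different character offsets) rather than several times on one stream. 
The names SAMPLE, PAIRS, decode_with_switch and run_stress_test are 
hypothetical, and only a subset of the suggested encoding pairs is used.

```python
import codecs

# Longish test string built from repeated copies of the sample text.
SAMPLE = "ℙƴ☂ℌøἤ" * 50

# (non-universal, universal) pairs; a subset of the pairs suggested above.
PAIRS = [("ascii", "utf-8"), ("ascii", "utf-16-le"), ("ascii", "utf-32-le")]


def decode_with_switch(data, cut, first, second):
    """Decode data[:cut] with `first` (surrogateescape) and the remainder
    with `second` (strict), simulating one encoding switch at byte `cut`."""
    head_decoder = codecs.getincrementaldecoder(first)(errors="surrogateescape")
    tail_decoder = codecs.getincrementaldecoder(second)(errors="strict")
    head = head_decoder.decode(data[:cut], final=True)
    tail = tail_decoder.decode(data[cut:], final=True)
    return head, tail


def run_stress_test():
    """Return the number of (pair, switch point) combinations checked."""
    checked = 0
    for first, second in PAIRS:
        # Test data is produced with the universal encoding of the pair.
        data = SAMPLE.encode(second)
        for n_chars in range(0, len(SAMPLE) + 1, 7):
            # Byte offset of a character boundary in the encoded data.
            cut = len(SAMPLE[:n_chars].encode(second))
            head, tail = decode_with_switch(data, cut, first, second)
            # surrogateescape must preserve the raw bytes losslessly.
            assert head.encode(first, "surrogateescape") == data[:cut]
            # The strict universal decoder must recover the rest exactly.
            assert tail == SAMPLE[n_chars:]
            checked += 1
    return checked
```

A real test would go through the text stream itself and would also need to 
cover switch points that fall in the middle of a multibyte sequence, which 
this sketch deliberately avoids.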

Optionally:

- extract "codecs._switch_decoder" and "codecs._switch_encoder" helper 
functions to make this all a bit easier to test and debug (with a Python 
version in the codecs module and the C version accessible via the _codecs 
module)
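
To make the idea of such a helper a little more concrete, here is one possible 
pure-Python shape for the decoder side. This is purely an assumption on my 
part: the name switch_decoder_sketch, its signature and its return value are 
hypothetical and not taken from the patch. It relies only on the documented 
getstate() method of incremental decoders, which reports any bytes that have 
been buffered but not yet decoded.

```python
import codecs


def switch_decoder_sketch(old_decoder, encoding, errors="strict"):
    """Hypothetical helper: create a decoder for `encoding` and hand it the
    bytes that `old_decoder` had buffered but not yet decoded.

    Returns (new_decoder, text), where `text` is whatever the new decoder
    could already produce from those pending bytes.
    """
    # getstate() returns (buffered_input, additional_state_info).
    pending, _flags = old_decoder.getstate()
    new_decoder = codecs.getincrementaldecoder(encoding)(errors=errors)
    return new_decoder, new_decoder.decode(pending)


# Usage: a UTF-8 decoder is left holding an incomplete 3-byte sequence,
# which is then re-interpreted by an ASCII/surrogateescape decoder.
old = codecs.getincrementaldecoder("utf-8")()
first_text = old.decode(b"a\xe2\x84")
new, carried_text = switch_decoder_sketch(old, "ascii", "surrogateescape")
```

This only moves undecoded bytes across. It does nothing about text the old 
decoder has already produced, which is the harder part of the problem for 
stateful codecs.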

That way, confidence in the reliability of the feature (including across Python 
implementations) can be based on the strength of the test cases covering it.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue15216>
_______________________________________