Nick Coghlan added the comment:

Reviewing Inada-san's latest version of the patch, we seem to be in a somewhat hybrid state where:

1. The restriction to only being used with seekable() streams if there is currently unread data in the read buffer is in place
2. We don't actually call seek() anywhere to set the stream back to the beginning of the file. Instead, we try to shuffle data out of the old decoder and into the new one.

I'm starting to wonder if the best option here might be to attempt to make the API work for arbitrary codecs and non-seekable streams, and then simply accept that it may take a few maintenance releases before that's actually true.

If we decide to go down that path, then I'd suggest the following stress test:

- make a longish test string out of repeated copies of "ℙƴ☂ℌøἤ"
- pick a few pairs of multibyte non-universal/universal encodings for use with surrogateescape and strict as their respective error handlers (e.g. ascii/utf8, ascii/utf16le, ascii/utf32, ascii/shift_jis, ascii/iso2022_jp, ascii/gb18030, gbk/gb18030)
- for each pair, make the test data by encoding from str to bytes with the relevant universal encoding
- switch the encoding multiple times on the same stream at different points

Optionally:

- extract "codecs._switch_decoder" and "codecs._switch_encoder" helper functions to make this all a bit easier to test and debug (with a Python version in the codecs module and the C version accessible via the _codecs module)

That way, confidence in the reliability of the feature (including across Python implementations) can be based on the strength of the test cases covering it.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue15216>
_______________________________________
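
The stress test proposed in the comment above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the patch under review: `switch_decoder`, `char_boundaries` and `run_stress` are hypothetical names, the sketch works on incremental decoders from the codecs module rather than on a real text stream, and it limits itself to BOM-less encodings that can represent the test string (utf-8, utf-16-le, utf-32-le, gb18030) with ascii as the only non-universal partner. It also assumes Python 3.8 or later, where the CJK incremental decoders expose their pending bytes through getstate().

```python
import codecs
import random

# Test string from the comment, repeated to make it "longish".
TEXT = "ℙƴ☂ℌøἤ" * 20

# Assumption: BOM-less universal encodings that can represent TEXT, so that a
# fresh decoder can take over in the middle of the stream.
UNIVERSAL = ["utf-8", "utf-16-le", "utf-32-le", "gb18030"]
LEGACY = ("ascii", "surrogateescape")


def switch_decoder(old, encoding, errors):
    """Hypothetical helper (not an existing codecs API).

    Creates a new incremental decoder and hands it the bytes that the old
    decoder had buffered but not yet decoded. Returns the new decoder and
    any text produced from those carried-over bytes.
    """
    pending = old.getstate()[0]
    new = codecs.getincrementaldecoder(encoding)(errors)
    return new, new.decode(pending)


def char_boundaries(text, encoding):
    """Byte offsets at which each character of `text` starts or ends."""
    offsets = [0]
    for ch in text:
        offsets.append(offsets[-1] + len(ch.encode(encoding)))
    return offsets


def run_stress(encoding, seed=0):
    """Switch decoders repeatedly over one byte stream; check nothing is lost."""
    rng = random.Random(seed)
    data = TEXT.encode(encoding)
    boundaries = char_boundaries(TEXT, encoding)
    cycle = [LEGACY, (encoding, "strict")]

    which = 0
    codec = cycle[which]
    decoder = codecs.getincrementaldecoder(codec[0])(codec[1])
    carried = ""
    segments = []  # ((encoding, errors), decoded text) in stream order
    pos = 0
    while pos < len(data):
        if which == 0:
            # The strict decoder takes over next, so stop on a character
            # boundary of the universal encoding.
            end = rng.choice([b for b in boundaries if b > pos][:8])
        else:
            # ascii/surrogateescape accepts any byte, so the switch may
            # happen in the middle of a multibyte character.
            end = min(len(data), pos + rng.randint(1, 16))
        segments.append((codec, carried + decoder.decode(data[pos:end])))
        pos = end
        which ^= 1
        codec = cycle[which]
        decoder, carried = switch_decoder(decoder, *codec)
    segments.append((codec, carried + decoder.decode(b"", final=True)))

    # Re-encoding each segment with the codec that decoded it must give back
    # the original bytes if no data was dropped or duplicated at a switch.
    rebuilt = b"".join(t.encode(enc, errs) for (enc, errs), t in segments)
    return rebuilt == data


for enc in UNIVERSAL:
    assert run_stress(enc), enc
```

The check relies on surrogateescape being lossless: bytes that the ascii decoder cannot interpret come back unchanged when the decoded text is re-encoded, so any byte lost or duplicated at a switch point shows up as a mismatch against the original data.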