[issue20132] Many incremental codecs don’t handle fragmented data

Martin Panter Thu, 15 Jan 2015 16:28:44 -0800

Martin Panter added the comment:

My “master plan” is basically to make most of the bytes-to-bytes codecs work as 
documented in the incremental (stateful) modes. I’m less interested in fixing 
the text codecs, and the quopri and uu codecs might be too hard, so I was going 
to propose some documentation warnings for those.
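To make “fragmented data” concrete, here is a minimal sketch of the property an incremental decoder is documented to have: feeding the input in arbitrary pieces should give the same result as decoding it in one go. It uses zlib_codec, which as far as I can tell already behaves this way. The sample payload and the 7-byte fragment size are arbitrary choices for illustration.

```python
import codecs
import zlib

payload = b"fragmented data " * 50
compressed = zlib.compress(payload)

# Incremental (stateful) decoder for the bytes-to-bytes zlib codec.
decoder = codecs.getincrementaldecoder("zlib_codec")()

# Feed the compressed stream in small fragments (7 bytes is arbitrary).
pieces = []
for i in range(0, len(compressed), 7):
    pieces.append(decoder.decode(compressed[i:i + 7]))
# Signal the end of the input so any buffered output is flushed.
pieces.append(decoder.decode(b"", final=True))

result = b"".join(pieces)
print(result == payload)
```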


If you have a suggestion on how to go about this better, let me know.

With my doc change to StreamReader, I wanted to document the different modes 
that I saw in the base codecs.StreamReader.read() implementation:

* read() or read(-1) reads everything
* read(size) returns an arbitrary amount of data
* read(size, chars) returns exactly *chars* characters of decoded data (unless 
EOF or similar)
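
A small sketch of the first and third modes, assuming the UTF-8 codec's StreamReader over an in-memory io.BytesIO (the sample text is arbitrary). It also shows that *size* is only a per-read hint: more than *size* bytes are pulled from the underlying stream when that is needed to produce *chars* characters.

```python
import codecs
import io

# Five two-byte characters: 10 bytes of UTF-8 in total.
raw = io.BytesIO("αβγδε".encode("utf-8"))
reader = codecs.getreader("utf-8")(raw)

# read(size, chars): read from the stream 1 byte at a time, but keep
# going until 3 characters have been decoded.
first = reader.read(1, 3)
consumed = raw.tell()      # bytes taken from the underlying stream so far

# read() with no arguments: read and decode everything that is left.
rest = reader.read()

print(repr(first), consumed, repr(rest))
```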

Previously the case of read(-1, chars) was ambiguous. Also I did not find the 
description “an approximate maximum number of decoded bytes” very helpful, 
considering more than this maximum was read if necessary to get enough *chars*.

Regarding the end-of-stream behaviour, I made an assumption but I now realize 
it was wrong. Experimenting with the UTF-8 codec shows that its 
StreamReader.read() keeps returning "" when the underlying stream returns no 
data. But if it was in the middle of a multi-byte sequence, no “end of data” 
error is raised, and the multi-byte sequence can be completed if the underlying 
stream later returns more data. I think the lack of end-of-data checking should 
be documented.
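
The experiment can be reproduced with something like the following sketch. It assumes an io.BytesIO as the underlying stream and simulates “more data arriving later” by appending to it and restoring the read position; that setup is mine, for illustration only.

```python
import codecs
import io

# Only the first two bytes of the three-byte sequence for "€" (E2 82 AC).
raw = io.BytesIO(b"\xe2\x82")
reader = codecs.getreader("utf-8")(raw)

first = reader.read()    # stream runs dry in the middle of the sequence
second = reader.read()   # still no data, and no "end of data" error

# Later the underlying stream gets the missing byte. Append it, then
# restore the read position so the reader sees only the new byte.
pos = raw.tell()
raw.write(b"\xac")
raw.seek(pos)

third = reader.read()    # the buffered partial sequence is completed

print(repr(first), repr(second), repr(third))
```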

The different cases of ValueError and UnicodeError that you describe make 
sense. I think the various references to ValueError and UnicodeError should be 
updated (or replaced with pointers) to match.

----------

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue20132>
_______________________________________