Nadeem Vawda added the comment:

After some consideration, I've come to agree with Serhiy that it would be better to keep a private internal buffer, rather than having the user manage unconsumed input data. I'm also in favor of having a flag to indicate whether the decompressor needs more input to produce more decompressed data. (I'd prefer to call it 'needs_input' or similar, though - 'data_ready' feels too vague to me.)
In msg176883 and msg177228, Serhiy raises the possibility that the decompressor might be unable to produce decompressed output from a given piece of (non-empty) input, but will still leave the input unconsumed. I do not think that this can actually happen (based on the libraries' documentation), but this API will work even if that situation can occur.

So, to summarize, the API will look like this:

    class LZMADecompressor:

        ...

        def decompress(self, data, max_length=-1):
            """Decompresses *data*, returning uncompressed data as bytes.

            If *max_length* is nonnegative, returns at most *max_length*
            bytes of decompressed data. If this limit is reached and further
            output can be produced, *self.needs_input* will be set to False.
            In this case, the next call to *decompress()* should provide
            *data* as b'' to obtain more of the output.

            If all of the input data was decompressed and returned (either
            because this was less than *max_length* bytes, or because
            *max_length* was negative), *self.needs_input* will be set to
            True.
            """
            ...

Data not consumed due to the use of 'max_length' should be saved in an internal buffer (that is not exposed to Python code at all), which is then prepended to any data provided in the next call to decompress() before providing the data to the underlying compression library. The cases where either the internal buffer or the new data are empty should be optimized to avoid unnecessary allocations or copies, since these will be the most common cases.

Note that this API does not need a Python-level 'unconsumed_tail' attribute - its role is served by the internal buffer (which is private to the C module implementation). This is not to be confused with the already-existing 'unused_data' attribute that stores data found after the end of the compressed stream. 'unused_data' should continue to work as before, regardless of whether decompress() is called with a max_length argument or not.
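To make the intended calling pattern concrete, here is a minimal sketch of a caller driving this API. It assumes the interface exactly as summarized above ('max_length' and 'needs_input', plus the existing 'eof' attribute), which is what the lzma module in Python 3.5 and later provides. The helper name decompress_stream and its parameters are hypothetical, chosen only for illustration.

```python
import lzma


def decompress_stream(blocks, max_length):
    """Yield decompressed pieces of at most max_length bytes each.

    `blocks` is any iterable of compressed bytes objects. This helper is
    illustrative only; it is not part of the lzma module.
    """
    decomp = lzma.LZMADecompressor()
    blocks = iter(blocks)
    while not decomp.eof:
        if decomp.needs_input:
            # Everything buffered so far has been consumed; fetch more input.
            data = next(blocks, None)
            if data is None:
                raise EOFError("compressed stream ended prematurely")
        else:
            # More output is pending inside the decompressor, so pass b''
            # as described in the docstring above.
            data = b""
        piece = decomp.decompress(data, max_length)
        if piece:
            yield piece


if __name__ == "__main__":
    original = b"spam and eggs " * 500
    compressed = lzma.compress(original)
    # Feed the compressed stream in small blocks to exercise both branches.
    blocks = [compressed[i:i + 7] for i in range(0, len(compressed), 7)]
    result = b"".join(decompress_stream(blocks, max_length=64))
    print(result == original)
```

The caller never has to keep track of unconsumed input: it only checks 'needs_input' to decide whether to supply a new block or b''.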
As a starting point I would suggest writing a patch for LZMADecompressor first, since its implementation is a bit simpler than BZ2Decompressor. Once this patch and an analogous one for BZ2Decompressor have been committed, we can then convert GzipFile, BZ2File and LZMAFile to use this feature. If you have any questions while you're working on this issue, feel free to send them my way.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue15955>
_______________________________________