Nadeem Vawda added the comment:

I've tried reimplementing LZMAFile in terms of the decompress_into()
method, and it has ended up not being any faster than the existing
implementation. (It was _slightly_ faster for readinto() with a large
buffer size, but in all other cases it was either equally fast or
significantly slower.)

In addition, decompress_into() is more complicated to work with than I
had expected, so I withdraw my objection to the approach based on
max_length/unconsumed_tail.


> unconsumed_tail should be private hidden attribute, which automatically 
> prepends any consumed data.

I don't think this is a good idea. In order to have predictable memory
usage, the caller will need to ensure that the current input is fully
decompressed before passing in the next block of compressed data. This
can be done more simply with the interface used by zlib. Compare:

    # Using a private unconsumed_tail that is prepended automatically
    while not d.eof:
        output = d.decompress(b'', 8192)
        if not output:
            compressed = f.read(8192)
            if not compressed:
                raise ValueError('End-of-stream marker not found')
            output = d.decompress(compressed, 8192)
        # <process output>

with:

    # Using zlib's interface
    while not d.eof:
        compressed = d.unconsumed_tail or f.read(8192)
        if not compressed:
            raise ValueError('End-of-stream marker not found')
        output = d.decompress(compressed, 8192)
        # <process output>
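
For reference, the second loop maps directly onto what zlib already
provides. A minimal runnable sketch follows; the helper name
decompress_stream, the chunk_size parameter and the sample payload are
made up for illustration, and it assumes a single zlib-format stream:

```python
import io
import zlib

def decompress_stream(f, chunk_size=8192):
    """Yield decompressed chunks of at most chunk_size bytes.

    Hypothetical helper for illustration; f is any binary file object
    containing a single zlib-format stream.
    """
    d = zlib.decompressobj()
    while not d.eof:
        # Finish the leftover input before reading more from the file,
        # so at most one block of compressed data is held at a time.
        compressed = d.unconsumed_tail or f.read(chunk_size)
        if not compressed:
            raise ValueError('End-of-stream marker not found')
        output = d.decompress(compressed, chunk_size)
        if output:
            yield output

# Sample data, chosen only so that the output spans many chunks.
payload = b'spam and eggs ' * 100000
f = io.BytesIO(zlib.compress(payload))
chunks = list(decompress_stream(f))
result = b''.join(chunks)
```

Each yielded chunk is bounded by chunk_size, so memory usage stays
predictable no matter how well the input compresses.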


A related but orthogonal proposal: we might want to make unconsumed_tail
a memoryview (provided the input data is known to be immutable), to avoid
creating an unnecessary copy of the data.
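
To illustrate why a memoryview would avoid the copy, here is a small
sketch of the general idea. It is not how zlib implements
unconsumed_tail today; the split point of 10 bytes and the sample
payload are arbitrary assumptions:

```python
import zlib

payload = b'spam and eggs ' * 1000
compressed = zlib.compress(payload)

view = memoryview(compressed)
tail = view[10:]  # slicing a memoryview does not copy the bytes

# The slice is still backed by the original bytes object, and because
# bytes is immutable the view is read-only.
backed_by_original = tail.obj is compressed
is_readonly = tail.readonly

# decompress() accepts any bytes-like object, including a memoryview.
d = zlib.decompressobj()
result = d.decompress(view[:10]) + d.decompress(tail)
```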

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue15955>
_______________________________________