Martin Panter added the comment:

This is similar to, but different from, the other bug. The other bug was only 
about output limits for incrementally decompressed data. Klamann’s bug is about 
the actual size of the input (and possibly also output) buffers.

The gzip.compress() implementation uses zlib.compressobj.compress(), which does 
not accept 2 GiB (Python 2) or 4 GiB (Python 3) of input either.
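
For concreteness, here is the failure mode as a sketch; it needs a machine 
with well over 4 GiB of free memory, and the exact error text is from memory:

    import gzip
    import zlib

    data = b"\x00" * (2**32)  # 4 GiB: one byte more than an unsigned int holds

    try:
        zlib.compress(data)
    except OverflowError as err:
        print("zlib.compress:", err)  # e.g. "Size does not fit in an unsigned int"

    try:
        gzip.compress(data)  # fails the same way, via compressobj().compress()
    except OverflowError as err:
        print("gzip.compress:", err)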

The underlying zlib library uses “unsigned int” for the size of each input and 
output chunk, so it has to be called multiple times to handle 4 GiB or more. In 
both Python 2 and 3, the one-shot compress() function only makes a single call 
into zlib, which explains why Python 3 cannot take 4 GiB.
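
To illustrate the multiple-call requirement, below is a rough pure-Python 
equivalent of what the one-shot function would have to do; the helper name and 
the 1 GiB chunk size are my own choices, not from any patch:

    import zlib

    def compress_chunked(data, level=zlib.Z_DEFAULT_COMPRESSION, chunk=1 << 30):
        """Compress arbitrarily large input, feeding zlib 1 GiB per call."""
        compressor = zlib.compressobj(level)
        view = memoryview(data)
        parts = [compressor.compress(view[i:i + chunk])
                 for i in range(0, len(view), chunk)]
        parts.append(compressor.flush())
        return b"".join(parts)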

Python 2 uses a signed “int” for the input buffer size, hence the lower 2 GiB 
limit there.

I tend to think of these cases as bugs that could be fixed in 3.5 and 2.7. 
Sometimes others also treat adding 64-bit support as a bug fix, e.g. 
file.read() on Python 2 (Issue 21932). But at other times it is handled as a 
new feature for the next Python version: os.read() was fixed in 3.5 but not 
2.7 (Issue 21932), and random.getrandbits() is proposed for 3.6 only 
(Issue 27072).

This kind of bug has apparently already been fixed for crc32() and adler32() 
in Python 2 and 3; see Issue 10276.
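
If that fix is in place, a 4 GiB checksum should succeed where compression 
fails (another memory-hungry check I have not run myself):

    import zlib

    # crc32() iterates over the buffer internally since Issue 10276,
    # so a 4 GiB input should be accepted:
    print(zlib.crc32(b"\x00" * (2**32)))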

This line from zlib.compress() also worries me:

zst.avail_out = length + length/1000 + 12 + 1; /* unsigned ints */

I suspect it may overflow, but I don’t have enough memory to verify. You would 
need to compress just under 4 GiB of data that compresses to 5 MB or more 
(i.e. not all the same bytes, or maybe try level=0).
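
A sketch of the probe I have in mind, for anyone with about 8 GiB to spare:

    import zlib

    # 2**32 - 1 input bytes make "length + length/1000 + 12 + 1" wrap
    # around to roughly 4.3 MB. At level 0 zlib stores the data
    # uncompressed, so the true output is about 4 GiB, far past the
    # wrapped buffer size; a correct build must grow the buffer or
    # fail cleanly rather than write past the end.
    data = b"\x00" * (2**32 - 1)
    result = zlib.compress(data, 0)
    print(len(result))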

Also, the logic for expanding the output buffer in each of zlib.decompress(), 
compressobj.compress(), decompressobj.decompress(), compressobj.flush(), and 
decompressobj.flush() looks faulty when it hits UINT_MAX. I suspect it may 
overwrite unallocated memory or do other funny stuff, but again I don’t have 
enough memory to verify. What happens when you decompress more than 4 GiB when 
the compressed input is less than 4 GiB?
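
Again as a sketch, this is the experiment I would run:

    import zlib

    # Build a compressed stream of 8 GiB of zeros without tripping the
    # input limit, by feeding a compression object 1 GiB at a time.
    compressor = zlib.compressobj()
    parts = [compressor.compress(b"\x00" * (1 << 30)) for _ in range(8)]
    parts.append(compressor.flush())
    compressed = b"".join(parts)  # only a few MiB

    # One-shot decompression now has to grow its output buffer past
    # UINT_MAX; this is where I suspect the expansion logic goes wrong.
    restored = zlib.decompress(compressed)
    assert len(restored) == 8 * (1 << 30)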

Code fixes that I think could be made:

1. Avoid the output buffer size overflow in the zlib.compress() function

2. Rewrite zlib.compress() to call deflate() in a loop, one iteration for each 
4 GiB input or output chunk

3. Allow the zlib.decompress() function to expand the output buffer beyond 4 GiB

4. Rewrite zlib.decompress() to pass 4 GiB input chunks to inflate()

5. Allow the compressobj.compress() method to expand the output buffer beyond 4 
GiB

6. Rewrite compressobj.compress() to pass 4 GiB input chunks to deflate()

7. Allow the decompressobj.decompress() method to expand the output buffer 
beyond 4 GiB

8. Rewrite decompressobj.decompress() to pass 4 GiB input chunks to inflate(), 
and to let decompressobj.unconsumed_tail and unused_data hold 4 GiB or more 
(the sketch after this list shows the bookkeeping involved)

9. Change the two flush() methods to abort if they allocate UINT_MAX bytes, 
rather than pointing into unallocated memory (I don’t think this could happen 
in real usage, but the code shares the same problem as above.)
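
For item 8, this is the bookkeeping (shown here at small sizes, where it works 
today) that has to keep behaving once the buffers pass 4 GiB:

    import zlib

    compressed = zlib.compress(b"x" * 1000) + b"trailing garbage"
    d = zlib.decompressobj()
    out = d.decompress(compressed, 64)      # cap the output at 64 bytes
    assert len(out) == 64
    assert d.unconsumed_tail                # compressed input not yet used
    out += d.decompress(d.unconsumed_tail)  # finish the stream
    assert out == b"x" * 1000
    assert d.unused_data == b"trailing garbage"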

----------
components: +Extension Modules -Library (Lib)
stage:  -> needs patch
versions: +Python 3.6 -Python 3.4

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue27130>
_______________________________________