Bugs item #1074261, was opened at 2004-11-27 12:29
Message generated for change (Comment added) made by akuchling
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1074261&group_id=5470
Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Python Library
Group: Python 2.4
>Status: Closed
>Resolution: Fixed
Priority: 5
Submitted By: Mark Eichin (eichin)
Assigned to: A.M. Kuchling (akuchling)
Summary: gzip dies on gz files with many appended headers

Initial Comment:
One of the virtues of the gzip format is that one can reopen a file for append and the file is, as a whole, still valid. This is accomplished by adding a new header on each reopen. gzip.py (as tested on 2.1, 2.3, and a freshly built 2.4rc1) doesn't deal well with more than a certain number of appended headers.

The included test case generates such a file (using gzip.py), runs gzip -tv on it to show that it is valid, and then tries to read it with gzip.py, which blows out: with "OverflowError: long int too large to convert to int" in earlier releases, and with MemoryError in 2.4rc1.

What's going on is that gzip.GzipFile.read keeps doubling readsize and calling _read again; _read does call _read_gzip_header, but it consumes only *one* header per call. So the doubling of readsize means that older Pythons blow out by not autopromoting past 2**32, and 2.4 blows out trying to call file.read with a huge value. Basically, with more than 30 or so headers, it fails.

The test case is based on a real-world queueing case that generates over 200 appended headers, and that count isn't bounded in any useful way. I'll think about ways to make GzipFile more clever, but I don't have a patch yet.

----------------------------------------------------------------------

>Comment By: A.M. Kuchling (akuchling)
Date: 2005-06-09 10:23

Message:
Logged In: YES
user_id=11375

Patch applied to both HEAD and 2.4-maint branches; thanks!

----------------------------------------------------------------------

Comment By: Mark Eichin (eichin)
Date: 2004-11-27 18:28

Message:
Logged In: YES
user_id=79734

Patch sent to patch-tracker as 1074381.
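To make the append behaviour from the initial report concrete, here is a minimal sketch in Python 3. It is not the 2.4-era gzip.py discussed in this thread, nor the reporter's test case (which is not included in this message). It reopens one file in append mode many times, so that each reopen writes a new gzip header (a new member), then counts the members with zlib and reads the whole file back. The helper names write_appended_members, count_members and build_and_read, and the file name queue.gz, are hypothetical. The member count relies on zlib.decompressobj with wbits set to MAX_WBITS | 16, so that zlib expects a gzip header, and on unused_data to find where the next member starts.

```python
import gzip
import os
import tempfile
import zlib


def write_appended_members(path, chunks):
    """Append each chunk to path as a separate gzip member (hypothetical helper)."""
    for chunk in chunks:
        # Opening in "ab" mode writes a fresh gzip header, so every
        # reopen starts a new member, as described in the report.
        with gzip.open(path, "ab") as f:
            f.write(chunk)


def count_members(raw):
    """Count the gzip members in raw by decompressing them one at a time."""
    count = 0
    while raw:
        # MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
        d = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)
        d.decompress(raw)
        count += 1
        # Bytes after the end of this member belong to the next one.
        raw = d.unused_data
    return count


def build_and_read(n_members):
    """Write n_members appended members, then read the file back."""
    chunks = [f"record {i}\n".encode() for i in range(n_members)]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "queue.gz")  # hypothetical file name
        write_appended_members(path, chunks)
        with open(path, "rb") as f:
            raw = f.read()
        # gzip.open reads across all members and concatenates their data.
        with gzip.open(path, "rb") as f:
            data = f.read()
    return count_members(raw), data, b"".join(chunks)


if __name__ == "__main__":
    members, data, expected = build_and_read(200)
    print(members, data == expected)
```

On a current Python 3 the 200-member file reads back in one call; the point of the sketch is only to show what "many appended headers" looks like on disk, not to reproduce the old failure.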
----------------------------------------------------------------------

Comment By: Mark Eichin (eichin)
Date: 2004-11-27 12:48

Message:
Logged In: YES
user_id=79734

Oh, this is actually easy to fix: just clamp readsize. After all, you don't *actually* want to try to read gigabyte chunks most of the time. (The supplied patch allows one to override gzip.GzipFile.max_read_chunk if one really does.) Tested on 2.4rc1, and a version backported to 2.1 works there too.

----------------------------------------------------------------------

_______________________________________________
Python-bugs-list mailing list
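The fix described in the 12:48 comment, clamping readsize, can be illustrated with a small model of the read loop rather than the real gzip.py code. In this sketch, readsize_schedule and first_call_over are hypothetical helpers, the starting size of 1024 bytes is an assumption about the old loop, and the 10 MiB cap is only an illustrative value for max_read_chunk. The model treats every _read call as consuming exactly one header and then doubling the next request, so without a cap the requested size grows exponentially with the number of members; with min(max_read_chunk, readsize * 2) it levels off at the cap.

```python
INT_MAX = 2**31 - 1  # largest value a 32-bit C int can hold


def readsize_schedule(n_calls, start=1024, max_read_chunk=None):
    """Return the readsize requested by each of n_calls successive _read calls.

    Simplified model: max_read_chunk=None stands for the unpatched loop
    (pure doubling); otherwise each step uses min(max_read_chunk, readsize * 2).
    """
    sizes = []
    readsize = start  # assumed starting size, not taken from gzip.py
    for _ in range(n_calls):
        sizes.append(readsize)
        readsize = readsize * 2
        if max_read_chunk is not None:
            # The clamp: never ask for more than max_read_chunk bytes.
            readsize = min(max_read_chunk, readsize)
    return sizes


def first_call_over(sizes, limit=INT_MAX):
    """Index of the first requested size above limit, or None."""
    for i, size in enumerate(sizes):
        if size > limit:
            return i
    return None


if __name__ == "__main__":
    n = 200  # about the member count from the real-world queue in the report
    unclamped = readsize_schedule(n)
    clamped = readsize_schedule(n, max_read_chunk=10 * 1024 * 1024)
    print("unclamped: first call over INT_MAX:", first_call_over(unclamped))
    print("clamped: largest request:", max(clamped))
```

Under these assumptions, clamping keeps each request bounded no matter how many members the file has; a file with more members simply takes more _read calls at the capped size.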