Larry Bates <[EMAIL PROTECTED]> bristled:

> Are you serious? A zipfile with a comment > 4Kbytes. I've never encountered
> such a beast.

If I hadn't run into one I would never have had a clue that Python's
zipfile module had this silly bug.

> As with any open source product it is much better to roll up your sleeves
> and pitch in to fix a problem than to rail about "how it is stupidly
> broken". You are welcome to submit a patch or at the very least a good
> description of the problem and possible solutions. If you have gotten a
> lot of value out of Python, you might consider this "giving back". You
> haven't paid anything for the value it has provided.

Ah yes, the old "well, if you found it you should fix it" meme - another
reason I found it pretty easy to stop reading this group. It's as stupid
a position as it ever was (and FWIW I don't believe I've ever seen any of
the real Python developers mouth this crap).

Now, I have learned somewhat more than I knew (or ever wanted to know)
about zipfiles since I smacked headfirst into this bug, and I've changed
the subject line to reflect my current understanding. :-/

Back then it had already occurred to me that *just* changing the size of
the step back seemed an incomplete fix: after all, that leaves you
scanning through random binary glop looking for the signature. With the
signature being four bytes, okay, it will *nearly* always work (just as
the existing 4K scan does), but... well, from what I've read in the format
specs that's about as good as it gets.

The alternative, some sort of backwards scan, would avoid the binary glop
but has much the same problem, in principle, with finding the signature
embedded in the archive comment. Even worse, arguably, since that comment
is apparently entirely up to the archive creator, so if there's a way to
use a fake central directory for nefarious purposes, that would make it
trivial to do.

Which is the point where I decided that the file format itself is
broken...
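
For readers who want to see the scan in concrete terms, here is a minimal sketch. It is not the zipfile module's actual code. The helper names (`find_eocd`, `comment_length_matches`, `make_zip`) are made up for illustration, and the comment-length check is an extra assumption, not something proposed in this post. The two constants come from the format itself: the End Of Central Directory record has a 22-byte fixed part, and its comment length is a 16-bit field.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"   # End Of Central Directory signature
EOCD_SIZE = 22              # fixed part of the record, excluding the comment
MAX_COMMENT = 0xFFFF        # the comment length is stored in 16 bits


def find_eocd(data, window):
    """Offset of the last EOCD signature in the final `window` bytes, or -1."""
    start = max(0, len(data) - window)
    return data.rfind(EOCD_SIG, start)


def comment_length_matches(data, pos):
    """True if the candidate record at `pos` claims a comment that runs
    exactly to the end of the data (a hypothetical sanity check)."""
    if pos < 0 or pos + EOCD_SIZE > len(data):
        return False
    (comment_len,) = struct.unpack("<H", data[pos + 20:pos + 22])
    return pos + EOCD_SIZE + comment_len == len(data)


def make_zip(comment):
    """Build a small in-memory archive carrying the given comment bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("hello.txt", "hello")
        zf.comment = comment
    return buf.getvalue()


# A 5000-byte comment pushes the real record out of a 4K window.
big = make_zip(b"x" * 5000)
print(find_eocd(big, 4 * 1024))
print(find_eocd(big, MAX_COMMENT + EOCD_SIZE))

# A comment ending in a fake, self-consistent record wins the backwards scan.
fake = make_zip(b"A" * 10 + EOCD_SIG + bytes(18))
pos = find_eocd(fake, MAX_COMMENT + EOCD_SIZE)
print(pos == len(fake) - EOCD_SIZE, comment_length_matches(fake, pos))
```

In the second archive both the real record and the fake one pass the length check, which is the ambiguity described above: nothing in the record itself says which candidate is genuine.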

(oh, and then I came across something from the info-zip crew that said
much the same thing, though they didn't mention this particular design,
uhm, shortcoming.)

So I guess that perhaps the stupidly obvious fix:

-        END_BLOCK = min(filesize, 1024 * 4)
+        END_BLOCK = min(filesize, 1024 * 64 + 22)

is after all about the best that can be done. (the lack of the
size-of-End-Of-Central-Directory-record in the existing code isn't a
separate bug, but if we're going to pretend we accommodate all valid
zipfiles it wouldn't do to overlook it)

So now you may imagine that your rudeness has had the result you intended
after all, and I guess it has, though at a cost - well, you probably never
cared what I thought about you anyway.

BTW, thanks for the pointer someone else gave to the proper place for
posting bugs. I'd had the silly idea that I would be able to find that
easily at www.python.org, but if I had then I'd not have posted here and
had so much fun.

--
The most effective way to get information from usenet is not to ask
a question; it is to post incorrect information.  -- Aahz's Law

Apparently denigrating the bug reporter can sometimes result in a patch,
too, but I don't think that's in the same spirit.