Paul Eggert wrote:
>> http://gregroelofs.com/test/testCompressThenConcat.txt.gz
> Thanks, I've verified that the new code works with that example.
> It's a bit much to turn that into a test case. Perhaps if I
> find time I'll write a smaller one.

I just attached to https://issues.apache.org/jira/browse/MAPREDUCE-1927
a small zipfile containing a script and some binary bits to generate
all 32 non-encrypted gzip header variants. You can either grab that
or find the results here:

    http://gregroelofs.com/test/all-gz-header-types-20100812.zip

The script is dependent on this CRC checker:

    http://gregroelofs.com/code/check-latest.tgz

(There's probably some standard util available these days, but I
didn't look.)

Oh, and the gzip file above (3409 bytes) was slightly broken; I had
overlooked the part about extra-field subfields, so it was missing the
two ID bytes and two subfield-length bytes. I just replaced it with a
3413-byte corrected version. (gzip doesn't look inside it anyway, but
if you want a correct test case, here you go. All 16 of the new ones
share the same fix, btw.)

I've also tested your 20100703 header-CRC patch (backported to 1.4),
and it works great--thanks again.

Related fixes/suggestions:

 - the "header16 != crc16" error message in gzip.c would look nicer
   with a pair of "%04x" instead of "%x". (A CPAN Perl module has a
   zero-the-upper-byte bug, and the mismatched one-byte vs. two-byte
   error messages looked weird.)

 - algorithm.doc needs updating:

    - "bit 1 set: continuation of multi-part gzip file" ->
      "bit 1 set: header CRC-16 present"

    - new "? bytes optional 16-bit header CRC" line immediately after
      "? bytes optional file comment, zero terminated"

    - should mention spec at http://www.ietf.org/rfc/rfc1952.txt

Also, are you aware of _any_ gzip utilities that support encryption?
The full, drop-in crypt.c and crypt.h have been available for a decade,
but it appears that the gzip code is explicitly not wired to support
it. Does anything out there do so?
If not, you might want to add another note to algorithm.doc mentioning
that crypto's not official in either the code sense or the spec sense.
(Or did an older version of gzip support it in gzip files?)

Thanks,
  Greg
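
For anyone who wants to reproduce the header variants without the
zipfile, here is a rough Python sketch of the idea. It is not the script
attached to the JIRA issue. It assumes the 32 variants correspond to the
five RFC 1952 flag bits (FTEXT, FHCRC, FEXTRA, FNAME, FCOMMENT), and the
names build_member, extra_subfield and format_mismatch, the "Zz" subfield
ID, and the sample name/comment strings are all made up for illustration.
The header CRC-16 is taken as the low 16 bits of the CRC-32 over the
header bytes that precede it, as RFC 1952 describes.

```python
import struct
import zlib

# Flag bits from RFC 1952, section 2.3.1.
FTEXT, FHCRC, FEXTRA, FNAME, FCOMMENT = 0x01, 0x02, 0x04, 0x08, 0x10


def extra_subfield(ident: bytes, data: bytes) -> bytes:
    """One extra-field subfield: two ID bytes, two length bytes, then data."""
    assert len(ident) == 2
    return ident + struct.pack("<H", len(data)) + data


def build_member(payload: bytes, flags: int,
                 extra: bytes = extra_subfield(b"Zz", b"demo"),  # hypothetical ID
                 name: bytes = b"demo.txt",
                 comment: bytes = b"demo comment") -> bytes:
    """Build a single gzip member whose header carries the requested flags."""
    # ID1, ID2, CM=8 (deflate), FLG, MTIME=0, XFL=0, OS=255 (unknown)
    header = struct.pack("<BBBBIBB", 0x1F, 0x8B, 8, flags, 0, 0, 255)
    if flags & FEXTRA:
        header += struct.pack("<H", len(extra)) + extra
    if flags & FNAME:
        header += name + b"\0"
    if flags & FCOMMENT:
        header += comment + b"\0"
    if flags & FHCRC:
        # Low 16 bits of the CRC-32 of everything in the header so far.
        header += struct.pack("<H", zlib.crc32(header) & 0xFFFF)
    deflater = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)  # raw deflate
    body = deflater.compress(payload) + deflater.flush()
    trailer = struct.pack("<II", zlib.crc32(payload) & 0xFFFFFFFF,
                          len(payload) & 0xFFFFFFFF)
    return header + body + trailer


def format_mismatch(stored: int, computed: int) -> str:
    """Zero-padded message in the spirit of the "%04x" suggestion above."""
    return "header16 (%04x) != crc16 (%04x)" % (stored, computed)


payload = b"hello, gzip\n"
variants = {flags: build_member(payload, flags) for flags in range(32)}

# zlib in gzip mode (wbits = 16 + MAX_WBITS) parses each header variant.
for flags, member in variants.items():
    assert zlib.decompress(member, 16 + zlib.MAX_WBITS) == payload
```

An empty subfield built this way is exactly four bytes long, which
matches the 3409 -> 3413 size change mentioned above. With "%04x", a
stored value of 0xab is shown as "00ab" rather than "ab", so both sides
of the message have the same width.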