Hi, it appears that `unpack` support is broken in gzip 1.8. Here are steps to reproduce:

$ printf '\x1f\x1e\x00\x00\x00\x06\x03\x01\x01\x00\x61\x6e\x62\x16\xc8' > test.z
$ sha1sum test.z
077b34196a50fa1c93f58892de828c2d4a558e40  test.z

Here's what `gzip` currently says:

$ gzip -cd test.z
b
gzip: test.z: invalid compressed data--length error

Here's what I expected to see:

$ gzip -cd test.z
banana

This was broken by git commit 16977ae7, whose relevant parts are:

-       if (peek == eob && len == max_len) break; /* end of file? */
+       if (peek == eob)
+         break; /* End of file. */
+       if (eob < peek)
+         gzip_error ("invalid compressed data--code out of range");

Notice that the `eob` check no longer verifies that the current code
length equals the maximum length. This is a problem because it ignores
leading zeroes in the eob code. In the example pack file, the Huffman
codes are:

a   = 1
n   = 01
b   = 000
eob = 001

Since the codes for `a` and `eob` both have the integer value 1,
`peek == eob` is already true when the first `a` is decoded, and
decoding terminates prematurely.

One possible fix is below. It adds back the `len == max_len` check and
verifies that the Huffman tree is complete. A complete Huffman tree
cannot contain unused prefixes, so this makes the "code out of range"
check unnecessary, and the patch removes it.

Regards,
Vidar Holen

diff --git a/unpack.c b/unpack.c
index c1a3684..8ebbc19 100644
--- a/unpack.c
+++ b/unpack.c
@@ -173,7 +173,7 @@ local void build_tree()
     int len;            /* current bit length */
     uch *prefixp;       /* pointer in prefix_len */
 
-    for (len = max_len; len >= 1; len--) {
+    for (len = max_len; len >= 0; len--) {
         /* The number of parent nodes at this level is half the total
          * number of nodes at parent level:
          */
@@ -186,6 +186,10 @@ local void build_tree()
         /* Restore nodes to be parents+leaves: */
         nodes += leaves[len];
     }
+
+    if (parents[0] != 1) {
+        gzip_error ("too few leaves in Huffman tree");
+    }
     /* Construct the prefix table, from shortest leaves to longest ones.
      * The shortest code is all ones, so we start at the end of the table.
      */
@@ -250,10 +254,8 @@ int unpack(in, out)
             }
         }
         /* At this point, peek is the next complete code, of len bits */
-        if (peek == eob)
+        if (peek == eob && len == max_len)
             break; /* End of file. */
-        if (eob < peek)
-          gzip_error ("invalid compressed data--code out of range");
         put_ubyte(literal[peek+lit_base[len]]);
         Tracev((stderr,"%02d %04x %c\n", len, peek,
                 literal[peek+lit_base[len]]));
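

P.S. For anyone who wants to see the ambiguity in isolation, here is a minimal standalone sketch. It is not gzip code: the table, `lookup`, `decode`, and `demo` are hypothetical names made up for illustration, and the decoder walks the stream bit by bit instead of using the prefix table in unpack.c. It decodes the two payload bytes of test.z (0x16 0xc8) with the four codes listed in the report, once comparing only the code value against eob and once also requiring the maximum length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical code table for the example file: (length, value) pairs. */
struct code { int len; unsigned value; char sym; };

static const struct code codes[] = {
    {1, 1, 'a'},   /* 1   */
    {2, 1, 'n'},   /* 01  */
    {3, 0, 'b'},   /* 000 */
};
enum { MAX_LEN = 3, EOB = 1 };   /* eob = 001: value 1 at length 3 */

/* Return the literal for a (len, value) pair, or -1 if there is none. */
static int lookup(int len, unsigned value)
{
    size_t i;
    for (i = 0; i < sizeof codes / sizeof codes[0]; i++)
        if (codes[i].len == len && codes[i].value == value)
            return codes[i].sym;
    return -1;
}

/* Decode data MSB-first into out (NUL-terminated).
 * check_len == 0 tests end-of-block by value only (the regression);
 * check_len != 0 also requires len == MAX_LEN (the proposed fix). */
static void decode(const unsigned char *data, size_t nbytes, int check_len,
                   char *out, size_t outsz)
{
    size_t nbits = nbytes * 8, bit = 0, o = 0;
    unsigned peek = 0;
    int len = 0;

    while (bit < nbits && o + 1 < outsz) {
        int sym;

        peek = (peek << 1) | ((data[bit / 8] >> (7 - bit % 8)) & 1u);
        bit++;
        len++;

        sym = lookup(len, peek);
        if (sym < 0 && !(len == MAX_LEN && peek == EOB)) {
            if (len >= MAX_LEN)
                break;           /* unused prefix: incomplete tree */
            continue;            /* code not complete yet, read another bit */
        }
        /* peek is now a complete code of len bits */
        if (peek == EOB && (!check_len || len == MAX_LEN))
            break;               /* end of file */
        out[o++] = (char)sym;
        peek = 0;
        len = 0;
    }
    out[o] = '\0';
}

/* Decode the example payload both ways and check the results. */
static void demo(void)
{
    static const unsigned char payload[] = { 0x16, 0xc8 };
    char out[16];

    decode(payload, sizeof payload, 1, out, sizeof out);
    assert(strcmp(out, "banana") == 0);   /* length-checked eob */

    decode(payload, sizeof payload, 0, out, sizeof out);
    assert(strcmp(out, "b") == 0);        /* value-only eob stops at first `a` */
}
```

With the value-only comparison, the first `a` (value 1 at length 1) is mistaken for eob right after `b` has been written, which matches the truncated output shown at the top of this report.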