Mark Adler wrote:
> I got a report of a behavior of gzip that is not replicated in pigz. In the
> process of investigating that, I found a bug in gzip (all versions including
> 1.4). Here's the deal.
>
> The behavior is that if you use --force and --stdout with --decompress, gzip
> will behave like cat if it doesn't recognize any compressed data magic
> headers. This is so that zcat can act as a replacement for cat,
> automatically detecting and decompressing compressed data. (pigz doesn't
> currently do that, which I need to fix.) Another behavior of gzip is that it
> will decompress concatenated gzip streams. Combining those two behaviors,
> gzip -cfd on a gzip stream followed by non-gzip data should give you the
> decompressed data from the stream followed by the non-gzip data copied.
>
> gzip doesn't do that, at least not correctly.
>
> What it does for a small example is write the decompressed data, write the
> initial gzip stream without decompressing it (!), and then write the non-gzip
> data. The stuff in the middle is the result of this code in gzip.c:
>
>         } else if (force && to_stdout && !list) { /* pass input unchanged */
>             method = STORED;
>             work = copy;
>             inptr = 0;
>             last_member = 1;
>         }
>
> (By the way, the tabs should be removed from all of the gzip source code.)
>
> The culprit is the "inptr = 0". It resets the input back to the beginning of
> the current input buffer (wherever that happens to be) and copies from there.
> That works fine if you start the input with non-gzip data, but messes up in
> the case of non-gzip data after a gzip stream.
>
> I have not developed a fix, since it is non-trivial. You can't just restore
> a saved inptr, since it is possible for the two-byte magic header to be split
> on a buffer boundary. That is, reading the first byte of the magic header
> empties the input buffer, so that reading the second byte of the magic header
> fills the input buffer, overwriting the first byte.
>
> If you want, I can try to come up with a patch for that, or you could have
> that pleasure.
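To make the buffer-boundary problem concrete, here is a small, self-contained C model. Everything in it (`struct reader`, `refill`, `get_byte`, `pass_through`, and the 4-byte `BUFSZ`) is hypothetical. It is only loosely patterned on gzip's inbuf/inptr/insize, and it is neither gzip's code nor necessarily the fix gzip adopted. The leading `skip` bytes stand in for a gzip member that has already been decompressed. With `rewind` set, the model imitates "inptr = 0". Without it, the model replays the two magic bytes it actually read and then continues from the current position, which is one possible way around the problem described above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy model of a refillable input buffer, loosely patterned
   on gzip's inbuf/inptr/insize; not gzip's code.  BUFSZ is tiny so that
   buffer boundaries are easy to hit. */
#define BUFSZ 4

struct reader {
    const unsigned char *src;   /* the whole input stream */
    size_t srclen, srcpos;      /* srcpos: bytes of src loaded so far */
    unsigned char buf[BUFSZ];   /* current input buffer */
    size_t inptr, insize;       /* next unread byte; valid bytes in buf */
};

/* Load the next chunk of src into buf, overwriting what was there,
   and return its first byte (or EOF when src is exhausted). */
static int refill(struct reader *r)
{
    size_t n = r->srclen - r->srcpos;
    if (n == 0)
        return EOF;
    if (n > BUFSZ)
        n = BUFSZ;
    memcpy(r->buf, r->src + r->srcpos, n);
    r->srcpos += n;
    r->insize = n;
    r->inptr = 0;
    return r->buf[r->inptr++];
}

static int get_byte(struct reader *r)
{
    return r->inptr < r->insize ? r->buf[r->inptr++] : refill(r);
}

/* Consume `skip` bytes (standing in for an already-decompressed gzip
   member), read two "magic" bytes that are assumed not to match, then
   copy the rest of the input into out as a NUL-terminated string.
   rewind != 0 imitates "inptr = 0"; rewind == 0 replays the magic
   bytes that were actually read and continues from the current inptr. */
static void pass_through(const char *src, size_t skip, int rewind,
                         char *out, size_t cap)
{
    struct reader r = { (const unsigned char *) src, strlen(src), 0,
                        {0}, 0, 0 };
    size_t n = 0;
    int magic[2], c;

    while (skip-- > 0)
        (void) get_byte(&r);
    magic[0] = get_byte(&r);
    magic[1] = get_byte(&r);   /* may trigger a refill that overwrites magic[0]'s slot */

    if (rewind) {
        r.inptr = 0;                   /* back to start of the *current* buffer */
    } else {
        for (int i = 0; i < 2; i++)    /* replay the saved magic bytes */
            if (magic[i] != EOF) {
                assert(n + 1 < cap);
                out[n++] = (char) magic[i];
            }
    }
    while ((c = get_byte(&r)) != EOF) {
        assert(n + 1 < cap);
        out[n++] = (char) c;
    }
    out[n] = '\0';
}
```

Consider `src = "Zxyy\n"` with `skip = 1`. The rewinding version re-emits the already-consumed `Z`, which is the analogue of copying the gzip member verbatim, and produces `"Zxyy\n"`. Now consider `src = "ZZZxyy\n"` with `skip = 3`. Here the two magic bytes straddle a refill, so rewinding loses the `x` and yields `"yy\n"`. The replaying version yields `"xyy\n"` in both cases. With `skip = 0` the two versions agree, which matches the observation that leading non-gzip data already works.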
Thanks for the report.
I'm adding a test to exercise that, currently expected to fail:

From 026eb1815d339e73102e3ae5a61543049ae9423a Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyer...@redhat.com>
Date: Tue, 2 Feb 2010 08:19:36 +0100
Subject: [PATCH 1/2] gzip -cdf mishandles some concatenated input streams: test it

* tests/mixed: Exercise "gzip -cdf" bug.
* Makefile.am (XFAIL_TESTS): Add it.
Mark Adler reported the bug.
---
 Makefile.am |    3 +++
 tests/mixed |   52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 55 insertions(+), 0 deletions(-)
 create mode 100644 tests/mixed

diff --git a/Makefile.am b/Makefile.am
index b4e75fc..4263b1d 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -99,6 +99,9 @@ check-local: $(FILES_TO_CHECK) $(bin_PROGRAMS) gzip.doc.gz
 	  done
 	@echo 'Test succeeded.'
 
+XFAIL_TESTS = \
+  tests/mixed
+
 TESTS = \
   tests/helin-segv \
   tests/hufts \
diff --git a/tests/mixed b/tests/mixed
new file mode 100644
index 0000000..0ca8e80
--- /dev/null
+++ b/tests/mixed
@@ -0,0 +1,52 @@
+#!/bin/sh
+# Ensure that gzip -cdf handles mixed compressed/not-compressed data
+# Before gzip-1.5, it would produce invalid output.
+
+# Copyright (C) 2010 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+# limit so don't run it by default.
+
+if test "$VERBOSE" = yes; then
+  set -x
+  zgrep --version
+fi
+
+: ${srcdir=.}
+. "$srcdir/tests/init.sh"
+
+printf 'xxx\nyyy\n' > exp2 || framework_failure
+printf 'aaa\nbbb\nccc\n' > exp3 || framework_failure
+
+fail=0
+
+(echo xxx; echo yyy) > in || fail=1
+gzip -cdf < in > out || fail=1
+compare out exp2 || fail=1
+
+# Uncompressed input, followed by compressed data.
+(echo xxx; echo yyy|gzip) > in || fail=1
+gzip -cdf < in > out || fail=1
+compare out exp2 || fail=1
+
+# Compressed input, followed by regular (not-compressed) data.
+(echo xxx|gzip; echo yyy) > in || fail=1
+gzip -cdf < in > out || fail=1
+compare out exp2 || fail=1
+
+(echo xxx|gzip; echo yyy|gzip) > in || fail=1
+gzip -cdf < in > out || fail=1
+compare out exp2 || fail=1
+
+Exit $fail
-- 
1.7.0.rc1.167.gdb08