It seems like gzip with the -l and --verbose options is sometimes returning the wrong totals for compressed and uncompressed size.
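One way to cross-check the totals that gzip -l prints is to sum the sizes independently. The sketch below does this, under a few assumptions. It takes the compressed size to be the on-disk file size. It reads the uncompressed size from the gzip trailer's ISIZE field, which is the last four bytes of the file, little-endian, holding the input length modulo 2**32. That value is exact only for single-member files smaller than 4 GiB, which should hold for files of 25-100M. Python integers do not overflow, so the sums stay correct even past 50G. The names gzip_sizes and gzip_totals are illustrative only.

```python
import os
import struct
import sys


def gzip_sizes(path):
    """Return (compressed, uncompressed) sizes for one .gz file.

    Assumes a single-member gzip file under 4 GiB, so the ISIZE trailer
    (last 4 bytes, little-endian, length mod 2**32) is the exact size.
    """
    compressed = os.path.getsize(path)  # on-disk size of the .gz file
    with open(path, "rb") as f:
        f.seek(-4, os.SEEK_END)  # ISIZE is the final 4 bytes
        (uncompressed,) = struct.unpack("<I", f.read(4))
    return compressed, uncompressed


def gzip_totals(paths):
    """Sum per-file sizes with arbitrary-precision ints (no wraparound)."""
    total_c = total_u = 0
    for p in paths:
        c, u = gzip_sizes(p)
        total_c += c
        total_u += u
    return total_c, total_u


if __name__ == "__main__":
    # Usage: python gztotals.py *.gz
    c, u = gzip_totals(sys.argv[1:])
    print(f"{c:>20} {u:>20} (totals)")
```

If these totals match the sum of gzip -l's per-file lines but not its printed totals, the problem is confined to how the totals are accumulated.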
My data set is 50G of files, each around 25-100M, and I was not immediately able to reproduce the problem on a smaller set. For this dataset, gzip -l --verbose reports compressed and uncompressed totals that are roughly (but not exactly) half of the real totals. The per-file numbers appear to be correct; only the totals are wrong.

Here's my version info:

gzip 1.3.5
(2002-09-30)
Copyright 2002 Free Software Foundation
Copyright 1992-1993 Jean-loup Gailly
This program comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of this program
under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING.
Compilation options:
DIRENT UTIME STDC_HEADERS HAVE_UNISTD_H HAVE_MEMORY_H HAVE_STRING_H HAVE_LSTAT
Written by Jean-loup Gailly.

Linux version 2.6.18-194.8.1.el5 (mockbu...@builder10.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)) #1 SMP Thu Jul 1 19:04:48 EDT 2010

This is a CentOS x86 64-bit system.

thanks,
Curtis