The most frequently reported bug for GNU gzip is that gzip -l reports sizes modulo 2**32 instead of full sizes. This is because the gzip format specifies a 4-byte (32-bit) size field.
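To make the wraparound concrete, here is a minimal Python sketch of what the 4-byte size field (ISIZE in RFC 1952) can and cannot hold. The helper names are made up for illustration and are not part of gzip itself.

```python
import struct

def isize_field(uncompressed_size: int) -> bytes:
    """Build the 4-byte little-endian size field; only size mod 2**32 fits."""
    return struct.pack("<I", uncompressed_size % 2**32)

def size_as_reported(isize_bytes: bytes) -> int:
    """What a lister can recover from the 4-byte field alone."""
    return struct.unpack("<I", isize_bytes)[0]

five_gib = 5 * 2**30
print(size_as_reported(isize_field(five_gib)))  # 1073741824, i.e. 1 GiB
```

A 5 GiB file is therefore listed as 1 GiB, since the high-order bits have nowhere to go.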
A similar problem in the gzip format is that it supports only nonzero 32-bit time stamps, which limits it to the range from 1970-01-01 00:00:01 through 2106-02-07 06:28:15 UTC. OK, so this is not as pressing a bug, but it wouldn't hurt to fix this while we're at it.

I am thinking that we should fix both problems by putting full sizes and time stamps into the header, as follows:

* If the file size is 2**32 or larger, gzip should emit an extra field that records the size divided by 2**32 (discarding fractions). gzip -l should read this field when reporting the size.

* We want to do this in a way that is compatible with all the other gzip implementations out there, including old versions of GNU gzip. So, we use the already-existing mechanism for extra fields, namely FLG.FEXTRA as per RFC 1952. We use SI1='H', SI2='S' (this is short for High-order bits of the Size). LEN is the length of the high-order bits field, and the field's value contains the high-order bits, represented as usual in little-endian order. A missing HS field is treated as zero.

* Similarly, we use SI1='H', SI2='M' (High-order Modification time) for the high-order bits of the modification time, when a time stamp is less than 1 or greater than 2**32 - 1. There are a few extra goodies here, though. If the leading bit of the high-order time field is 1, then the entire time stamp (including the low-order bits) is treated as a negative number, using two's complement. Also, if the high-order bits are present but are all zero, the time stamp is considered to be zero rather than missing.

* This approach will allow us to represent sizes up to 2**65568, which should be enough for quite some time. Similarly, representable times would range from 2**65567 seconds before 1970 to 2**65567 seconds after 1970, which would handle all file-system formats that I know of.

* This approach is backward-compatible with older versions of gzip, with any decompressor that conforms to Internet RFC 1952, and with all implementations of gzip decompressors that I know of.

* This approach does not address the issue of sub-second time stamp resolution, as I thought that would make the proposal too complicated.

Comments are welcome; please CC: to <bug-gzip@gnu.org>.
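To illustrate the HS and HM fields described above, here is a rough Python sketch of how the split and recombination might work. It is only a sketch under stated assumptions, not gzip code: all function names are hypothetical, the encoder picks the shortest high-order field that works (the proposal does not require that), and "leading bit" is taken to mean the top bit of the last (most significant) byte of the little-endian field.

```python
import struct

MASK32 = 2**32 - 1

def pack_subfield(subfield_id: bytes, data: bytes) -> bytes:
    """RFC 1952 extra subfield: SI1, SI2, 2-byte little-endian LEN, data."""
    return struct.pack("<2sH", subfield_id, len(data)) + data

def split_size(size: int):
    """Return (low 32 bits, HS data or None) for an uncompressed size."""
    high = size >> 32
    if high == 0:
        return size, None  # small file: no HS field needed
    return size & MASK32, high.to_bytes((high.bit_length() + 7) // 8, "little")

def join_size(isize: int, hs_data) -> int:
    """Recombine; a missing HS field counts as zero high-order bits."""
    high = int.from_bytes(hs_data, "little") if hs_data else 0
    return (high << 32) | isize

def split_mtime(t: int):
    """Return (32-bit MTIME value, HM data or None) for a time stamp."""
    if 1 <= t <= MASK32:
        return t, None  # fits in the existing 32-bit field
    n = 1  # bytes in the HM field; grow until t fits as a signed value
    while not (-(1 << (31 + 8 * n)) <= t < (1 << (31 + 8 * n))):
        n += 1
    raw = t & ((1 << (32 + 8 * n)) - 1)  # two's complement over all bits
    return raw & MASK32, (raw >> 32).to_bytes(n, "little")

def join_mtime(mtime: int, hm_data):
    """Recombine; returns None when the time stamp is missing."""
    if hm_data is None:
        return mtime if mtime != 0 else None  # MTIME 0 alone means "missing"
    raw = (int.from_bytes(hm_data, "little") << 32) | mtime
    if hm_data and hm_data[-1] & 0x80:  # leading bit set: negative time
        raw -= 1 << (32 + 8 * len(hm_data))
    return raw

low, hs = split_size(5 * 2**30)
print(low, hs, pack_subfield(b"HS", hs))  # 1073741824 b'\x01' b'HS\x01\x00\x01'
print(split_mtime(-1))                     # (4294967295, b'\xff')
print(split_mtime(0))                      # (0, b'\x00')
```

An old decompressor would skip both subfields, so it would still see the low 32 bits of the size and whatever the 32-bit time field holds.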