Hello guys,

I have a large amount of confidential gzip-compressed binary data. These files _all_ share one very specific property: they give different output depending on whether or not I use the "-f" flag of zcat (or gzip -d -c). One additional line appears when I use the -f flag.
- I don't have the uncompressed versions of these files, nor the actual tool used to compress them - I am trying to create a reproducible example but have not yet succeeded Here is what it looks like, with null bytes replaced by dots for readability: (sorry for gmail's automatic line wrap, there are of course only two lines per output) $ file p.gz: gzip compressed data, was "20181218.TXT", last modified: Wed Dec 19 08:59:07 2018, from NTFS filesystem (NT) $ wc -c p.gz 9099264 p.gz $ zcat p.gz | wc -c 48085600 $ zcat -f p.gz | wc -c 48085955 $ gzip -d -c p.gz | tail -2 | sed -n 'l 0' | sed 's/\\000/./g' 20010101AAAAAAAA 010120010101Q AA....00A0000000AA0AA0AAA 0A AA 0101 0012001010101:01T2001012001:0101:01AAAAAAD/S\r$ T000378625..................... ...............................................\r$ $ gzip -d -c -f p.gz | tail -2 | sed -n 'l 0' | sed 's/\\000/./g' T000378625..................... ...............................................\r$ ...................................................................................................................................................................................................................................................................................................................................................................$ That additional line containing only null bytes is not supposed to appear, is that some kind of padding that was not handled correctly by gzip? If this is not yet an identified bug, here are my questions: Do you know what could be happening? Do you know how I could try to reproduce the problem on non-confidential data for you to be able to debug? 
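
One way to test the padding hypothesis on non-confidential data might be the sketch below. It assumes, without confirmation, that the extra bytes come from zero padding appended after the end of the gzip stream. The file names sample.gz and padded.gz are made up for illustration, and 355 is simply the size difference seen above (48085955 - 48085600).

```shell
# Sketch only: build a small gzip file, append null bytes after the
# compressed stream, then compare decompression with and without -f.
# sample.gz and padded.gz are hypothetical names, not the real files.
workdir=$(mktemp -d)

printf 'hello\n' | gzip -c > "$workdir/sample.gz"

# Assumption: the confidential files carry trailing zero padding.
cp "$workdir/sample.gz" "$workdir/padded.gz"
head -c 355 /dev/zero >> "$workdir/padded.gz"

# Count decompressed bytes in both modes.
plain=$(gzip -d -c "$workdir/padded.gz" | wc -c)
forced=$(gzip -d -c -f "$workdir/padded.gz" | wc -c)

echo "without -f: $plain bytes"
echo "with -f:    $forced bytes"
echo "difference: $((forced - plain)) bytes"
```

If the two counts differ by exactly the number of appended bytes, that would suggest the same effect as with p.gz. Looking at the end of a real file with something like `tail -c 400 p.gz | od -c` could then show whether it really ends in a run of null bytes.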

(I already tried re-compressing both versions of the decompressed files with this binary from 2007: http://gnuwin32.sourceforge.net/packages/gzip.htm but the problem does not happen.)

I can contact the guys who created the files and ask them anything, but I'd like to be sure of what to ask, because contacting them repeatedly would be considered very rude. What should I ask them?

Thank you very much in advance for any reply that could help me understand what is happening :)

Minato

PS: I am not subscribed to the mailing list yet