Hello, Paul:

I hope you don't mind being involved in this. I would appreciate any input that you are able to provide. Context is Debian bug #897653 (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=897653), but all the important bits are already quoted below.

The general idea of pristine-tar is the following: as a distribution maintainer using git, I want to be able to recreate the tarball released by the upstream maintainers without duplicating the bits that I already have in git (i.e. the actual source code). So when I import a new release tarball, pristine-tar extracts it into my git repository, then stores a (hopefully) small delta file that can be used later to reconstruct the tarball I just imported from the contents of the git repository.

This is no longer working for some (but not all) tarballs, because the archives created by tar 1.30 and tar 1.29 are different for the same data (see below for one example). This is not a problem for newly imported tarballs, but it is a problem for tarballs that were imported with tar < 1.30.

On Thu, May 03, 2018 at 05:14:54PM -0300, Antonio Terceiro wrote:
> Package: tar
> Version: 1.30+dfsg-1
> Severity: grave
> Justification: causes non-serious data loss
>
> Hi,
>
> After tar 1.30 arrived in unstable, pristine-tar can no longer
> reproduce tarballs that it previously could. For example I just hit this
> with bundler:
>
> $ debcheckout -a bundler
> declared git repository at g...@salsa.debian.org:ruby-team/bundler.git
> git clone https://salsa.debian.org/ruby-team/bundler.git bundler ...
> Cloning into 'bundler'...
> remote: Counting objects: 3981, done.
> remote: Compressing objects: 100% (1289/1289), done.
> remote: Total 3981 (delta 2570), reused 3950 (delta 2551)
> Receiving objects: 100% (3981/3981), 3.55 MiB | 24.00 KiB/s, done.
> Resolving deltas: 100% (2570/2570), done.
> git remote set-url --push origin g...@salsa.debian.org:ruby-team/bundler.git
> ...
> $ cd bundler/
> $ pristine-tar checkout /tmp/bundler_1.16.1.orig.tar.gz
> xdelta3: target window checksum mismatch: XD3_INVALID_INPUT
> xdelta3: normally this indicates that the source file is incorrect
> xdelta3: please verify the source file with sha1sum or equivalent
> xdelta3: target window checksum mismatch: XD3_INVALID_INPUT
> xdelta3: normally this indicates that the source file is incorrect
> xdelta3: please verify the source file with sha1sum or equivalent
> xdelta3: target window checksum mismatch: XD3_INVALID_INPUT
> xdelta3: normally this indicates that the source file is incorrect
> xdelta3: please verify the source file with sha1sum or equivalent
> xdelta3: target window checksum mismatch: XD3_INVALID_INPUT
> xdelta3: normally this indicates that the source file is incorrect
> xdelta3: please verify the source file with sha1sum or equivalent
> pristine-tar: Failed to reproduce original tarball. Please file a bug report.
> pristine-tar: failed to generate tarball
>
> See also #897249 and #897421 for similar pristine-tar user reports.
> Downgrading tar to 1.29b-2 makes it work again.

The messages above are from xdelta3, and they mean that we are trying to apply a delta to a corrupted version of the original file from which that delta was taken. Indeed, by dumping the tarballs created by tar 1.29 and 1.30, I was able to find the difference:

$ diffoscope ./old/0/recreatetarball ./new/0/recreatetarball
|##########################################################################################################################| 100% Time: 0:00:00
--- ./old/0/recreatetarball
+++ ./new/0/recreatetarball
│┄ No file format specific differences found inside, yet data differs (POSIX tar archive (GNU))
@@ -58403,24 +58403,24 @@
 000e4220: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 000e4230: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 000e4240: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 000e4250: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 000e4260: 0000 0000 3030 3030 3030 3000 3030 3030  ....0000000.0000
 000e4270: 3030 3000 3030 3030 3030 3000 3030 3030  000.0000000.0000
 000e4280: 3030 3030 3134 3600 3030 3030 3030 3030  0000146.00000000
-000e4290: 3030 3000 3031 3135 3636 0020 4c00 0000  000.011566. L...
+000e4290: 3030 3000 3030 3737 3536 0020 4c00 0000  000.007756. L...
 000e42a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 000e42b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 000e42c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 000e42d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 000e42e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 000e42f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
-000e4300: 0075 7374 6172 2020 0072 6f6f 7400 0000  .ustar  .root...
+000e4300: 0075 7374 6172 2020 0000 0000 0000 0000  .ustar  ........
 000e4310: 0000 0000 0000 0000 0000 0000 0000 0000  ................
-000e4320: 0000 0000 0000 0000 0072 6f6f 7400 0000  .........root...
+000e4320: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 000e4330: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 000e4340: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 000e4350: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 000e4360: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 000e4370: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 000e4380: 0000 0000 0000 0000 0000 0000 0000 0000  ................
 000e4390: 0000 0000 0000 0000 0000 0000 0000 0000  ................

I cloned the tar git repository and ran a bisection.
The commit that broke things is the following:

commit da8d0659a6fe8faf76b3a3275cf1f403e78edb1f
Author: Paul Eggert <egg...@cs.ucla.edu>
Date:   Thu Apr 6 18:16:51 2017 -0700

    --numeric-owner now affects private headers too

    Problem reported by Daniel Peebles in:
    http://lists.gnu.org/archive/html/bug-tar/2017-04/msg00004.html
    * NEWS: Document this.
    * src/create.c (write_gnu_long_link): If --numeric-owner, leave the
    user and group empty in a private header. Cache the names for 0.

Reverting the changes in this commit "fixes" it, but of course, given this is a patch that is supposed to *improve* reproducibility, just reverting it is probably not what we want. I still need to study the code a bit further to try to come up with a better suggestion.
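
As a sanity check that the two header variants in the dump differ only in the user and group names, the following Python sketch rebuilds the 512-byte header and recomputes its checksum both ways. It is only an illustration and not part of tar or pristine-tar: the function name long_name_header is made up, and the name field "././@LongLink" is an assumption, since it sits just before the range shown in the dump. All other field values are read from the dump.

```python
def long_name_header(uname: bytes, gname: bytes) -> bytes:
    """Rebuild the 512-byte 'L' header from the dump, checksum included."""
    fields = [
        (b"././@LongLink", 100),  # name: assumed, not visible in the dump
        (b"0000000", 8),          # mode
        (b"0000000", 8),          # uid
        (b"0000000", 8),          # gid
        (b"00000000146", 12),     # size of the long name that follows
        (b"00000000000", 12),     # mtime
        (b" " * 8, 8),            # chksum, counted as spaces while summing
        (b"L", 1),                # typeflag seen at 0xe429c
        (b"", 100),               # linkname
        (b"ustar  ", 8),          # magic and version as in the dump
        (uname, 32),              # user name
        (gname, 32),              # group name
    ]
    header = b"".join(value.ljust(width, b"\0") for value, width in fields)
    header = header.ljust(512, b"\0")
    # The checksum is the plain sum of all header bytes, stored as
    # six octal digits followed by NUL and a space.
    chksum = format(sum(header), "06o").encode("ascii") + b"\0 "
    return header[:148] + chksum + header[156:]


old = long_name_header(b"root", b"root")  # names present, as written by tar 1.29
new = long_name_header(b"", b"")          # names empty, as written by tar 1.30
print(old[148:154].decode(), new[148:154].decode())
```

The two printed values can be compared with the checksum fields at offset 000e4290 in the dump. If they match, the checksum change is fully explained by the two "root" strings being dropped, and no other header byte changed.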