> In the meantime, I have stopped regenerating
> http://us.metamath.org/downloads/metamath-program.zip each site build.
Thank you. This is very much appreciated.
However, I feel bad that this ended up simply creating more work for you. If
there is any way I can help automate things, please let me know. In a previous
life, I was a sysops engineer.
> For the record, here is what I observed: ...
This is, indeed, surprising. On my machine, these same steps produce
identically-hashing contents. The discrepancy bothered me enough that I spent
some time sleuthing out the root cause.
The quick 'n dirty version is that gzip 1.6 produces non-deterministic output
by default. I was able to reproduce the behaviour you demonstrate by using gzip
1.6. The non-determinism is fixed in gzip version 1.10 (released 2018-12-29).
The medium-length explanation is that gzip includes a timestamp in its header
by default. Up until version 1.9, when the input file is stdin this timestamp
becomes the current system time. Version 1.10 chooses to simply elide the
timestamp altogether in this case. However, when input is a normal file, all
versions, including 1.6, behave similarly by setting the timestamp to the
original file's modification time (mtime).
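For anyone who wants to look at the field in question, here is a minimal
sketch. It relies on the gzip file format placing a 4-byte MTIME field at
offset 4 of the header, and on the '-n' (--no-name) option writing zeros there.
The helper name gzip_mtime_bytes is my own invention, not part of any tool:

```shell
#!/bin/sh
# Hypothetical helper: print the 4-byte MTIME field (header offsets 4..7)
# of a gzip stream read from stdin, as 8 hex digits.
gzip_mtime_bytes() {
    od -An -tx1 -j4 -N4 | tr -d ' \n'
}

# With -n the field is zeroed, whatever the gzip version.
printf 'hello\n' | gzip -n | gzip_mtime_bytes; echo

# Without -n the result depends on the gzip version: older releases store the
# current time when reading stdin, while 1.10 and later should store zeros.
printf 'hello\n' | gzip | gzip_mtime_bytes; echo
```

Running the second pipeline twice, a second or more apart, is a quick way to
tell which behaviour a given gzip has.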
It turns out that you can manually tell gzip to elide the timestamp in either
case by providing the non-obviously named '-n' (--no-name) option. This means
that, with version 1.6, you should be able to get reproducible tar.gz archives
by manually chaining tar and gzip together:
$ tar -cf - path/to/metamath | gzip -n >metamath.tar.gz
or telling tar to chain for us with the -I switch:
$ tar -I 'gzip -n' -cf metamath.tar.gz path/to/metamath
As it turns out, either of those invocations produces archives that hash
identically between versions 1.6 and 1.10.
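If it helps with automating the site build, a check along these lines could
confirm that the archive step is stable on a given machine. This is only a
sketch: the scratch directory, the file names, and the build function are
placeholders of mine, and it builds a throwaway tree rather than the real
metamath directory:

```shell
#!/bin/sh
# Build the same tree twice, a second apart, and compare the two archives.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

mkdir "$work/src"
printf 'placeholder\n' > "$work/src/file.txt"

# Hypothetical helper: $1 is the archive to write.
build() {
    tar -cf - -C "$work" src | gzip -n > "$1"
}

build "$work/a.tar.gz"
sleep 1   # let the clock advance so an embedded "current time" would differ
build "$work/b.tar.gz"

if cmp -s "$work/a.tar.gz" "$work/b.tar.gz"; then
    result=identical
else
    result=different
fi
echo "$result"
```

Dropping the -n from the build function should make the same script report a
difference on gzip versions older than 1.10.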
And just for over-the-top kicks, if anyone is interested in checking out the
commit that introduced the above change to gzip, here it is:
url: git://git.savannah.gnu.org/gzip.git
commit: bce795d0a38ae10f13b3297f1253acdeb4defc21
Cheers.
Norman Megill <[email protected]> wrote:
> On Sunday, February 2, 2020 at 8:50:35 PM UTC-5, heiphohmia wrote:
> >
> > > I tested it, and tar.bz2 seems stable but tar.gz changes its md5sum each
> > > time I run tar -czf.
> >
> > Hrm. Both Nix and Guix rely on the fact that tar.bz2, tar.gz, and tar.xz
> > can be made deterministic. However, this is starting to go down the
> > reproducible builds rabbit hole, which is deep.
> >
>
> For the record, here is what I observed:
>
> $ ls -ld metamath [ <- the standard metamath subdirectory]
> drwxr-xr-x 2 norm adm 4096 Nov 1 23:30 metamath
> $ tar -czf a.tgz metamath
> $ mv a.tgz b.tgz
> $ tar -czf a.tgz metamath
> $ md5sum a.tgz b.tgz
> 48ab49f2e3309508809662d2e28327b8 a.tgz
> f14761124cfdf55426b8d7bbcf8cff40 b.tgz
> $ tar -cjf a.tbz2 metamath
> $ mv a.tbz2 b.tbz2
> $ tar -cjf a.tbz2 metamath
> $ md5sum a.tbz2 b.tbz2
> 1c6419ac596b2342635469bfcd6fcbd6 a.tbz2
> 1c6419ac596b2342635469bfcd6fcbd6 b.tbz2
>
> $ tar --version
> tar (GNU tar) 1.27.1 ....
> $ gzip --version
> gzip 1.6 ....
> $ bzip2 --version
> bzip2, a block-sorting file compressor. Version 1.0.6, 6-Sept-2010. ....
>
>
>
> > Here is a good quick read on creating stable, deterministic archives:
> > https://reproducible-builds.org/docs/archives/
> >
> > In particular, it suggests the following tar command might work for our
> > purposes:
> >
> > $ tar --sort=name \
> >       --mtime="1970-01-01 00:00Z" \
> >       --owner=0 --group=0 --numeric-owner \
> >       --pax-option=exthdr.name=%d/PaxHeaders/%f,delete=atime,delete=ctime \
> >       -cjf product.tar.bz2 path/to/source
> >
> > where the date given to --mtime can be anything that seems appropriate.
> > Another option suggested is to post-process the non-determinism out of an
> > archive. Apparently, this can be done with zips as well:
> >
> > https://packages.debian.org/sid/strip-nondeterminism
> >
> > Without knowing more about your particular setup, I am afraid that's about
> > all I can suggest.
> >
>
> Thanks for your suggestions. I will consider them when coming up with a
> permanent solution.
>
> In the meantime, I have stopped regenerating
> http://us.metamath.org/downloads/metamath-program.zip each site build.
> Until I implement an automated solution, I will create it by hand whenever
> I update the program. So you should be able to depend on it being stable
> from now on.
>
> BTW I took out the Windows executable metamath.exe from
> metamath-program.zip since it's not relevant to Linux.
>
> Norm