> I tested it, and tar.bz2 seems stable but tar.gz changes its md5sum each 
> time I run tar -czf.

Hrm. Both Nix and Guix rely on the fact that tar.bz2, tar.gz, and tar.xz can be
made deterministic. However, this is starting to go down the reproducible
builds rabbit hole, which is deep.

Here is a good quick read on creating stable, deterministic archives:
https://reproducible-builds.org/docs/archives/

In particular, it suggests the following tar command might work for our
purposes:

    $ tar --sort=name \
          --mtime="1970-01-01 00:00Z" \
          --owner=0 --group=0 --numeric-owner \
          --pax-option=exthdr.name=%d/PaxHeaders/%f,delete=atime,delete=ctime \
          -cjf product.tar.bz2 path/to/source

where the date given to --mtime can be anything that seems appropriate. Another
option suggested there is to post-process the non-determinism out of an existing
archive with a tool such as strip-nondeterminism. Apparently, this can be done
with zips as well:

https://packages.debian.org/sid/strip-nondeterminism

Without knowing more about your particular setup, I am afraid that's about all
I can suggest.
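
In case it helps, here is a minimal sketch for checking this on your end,
assuming GNU tar (1.28 or later, for --sort) and gzip. It builds the same
directory into a tar.gz twice and compares the md5sums. The gzip format has a
timestamp field in its header, and gzip -n leaves it out, so my guess is that
this field is what changes between your runs of tar -czf, though I have not
confirmed that on your setup. The make_archive helper and the scratch file
names are made up for illustration.

```shell
#!/bin/sh
# Sketch: build the same archive twice and compare checksums.
# Assumes GNU tar (for --sort and --pax-option), gzip, and md5sum.
set -eu

work=$(mktemp -d)                 # scratch directory, hypothetical layout
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/src/sub"
printf 'hello\n' > "$work/src/a.txt"
printf 'world\n' > "$work/src/sub/b.txt"

# Hypothetical helper: write a .tar.gz of directory $2 to file $1,
# pinning member order, mtime, and ownership, and piping through
# gzip -n so the gzip header carries no name or timestamp.
make_archive() {
    tar --sort=name \
        --mtime="1970-01-01 00:00Z" \
        --owner=0 --group=0 --numeric-owner \
        --pax-option=exthdr.name=%d/PaxHeaders/%f,delete=atime,delete=ctime \
        -C "$2" -cf - . | gzip -n > "$1"
}

make_archive "$work/one.tar.gz" "$work/src"
sleep 1                           # let the clock advance between runs
touch "$work/src/a.txt"           # new mtime, same contents
make_archive "$work/two.tar.gz" "$work/src"

sum1=$(md5sum < "$work/one.tar.gz")
sum2=$(md5sum < "$work/two.tar.gz")
if [ "$sum1" = "$sum2" ]; then
    echo "identical"
else
    echo "different"
fi
```

If the two sums still differ, swapping "gzip -n" for plain "gzip" or "bzip2"
in the helper should show which step introduces the difference.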

Norman Megill <[email protected]> wrote:
> On Saturday, February 1, 2020 at 11:49:53 PM UTC-5, heiphohmia wrote:
> >
> > > Could .gz or .bz2 be used instead?  I think those are stable.
> 
> 
> I tested it, and tar.bz2 seems stable but tar.gz changes its md5sum each 
> time I run tar -czf.  That is very surprising because I'm almost positive 
> that tar.gz was (empirically) stable a few years ago.  I wonder whether we 
> can depend on tar.bz2 being stable in the long run.
> 
> Norm
>  
> 
> >
> > Either of those would be great. Typically, bzip2 wins on compression and
> > gzip on speed. In this case, bzip2 is probably a good choice.
> >
> > In the off chance it's helpful, here are the standard command line
> > invocations on unix for creating bzip2 and gzip archives of some directory:
> >
> >     $ tar -cjf new-bzip2-archive.tar.bz2 path/to/contents 
> >     $ tar -czf new-gzip-archive.tar.gz path/to/contents 
> >
> 
> >
> > The -c flag "creates" an archive, the -j and -z flags compress with bzip2
> > and gzip respectively. The -f flag specifies the archive path.
> >
> > And just in case the use of tar seems mysterious, the reason we need it
> > here is because bzip2 and gzip are simply compression formats, meaning they
> > only work on single files. So we use tar to first "archive" a collection of
> > paths into a single file and compress the result. This is a common enough
> > operation that tar simply provides convenience flags that do the wrapping
> > for us.
> >

