On 12/13/21 03:30, Sergey Poznyakoff wrote:
> Reproducible tarballs in PAX format are
> easily made with the following option:
>
>    --pax-option=exthdr.name=%d/PaxHeaders/%f,atime:=0,ctime:=0

I tried this, and unfortunately it introduced unnecessary extended pax records into the output. For example:

opts='--pax-option=exthdr.name=%d/PaxHeaders/%f,atime:=0,ctime:=0'
touch -d 2021-12-13T00:00:00Z foo
tar -cf t1 -H ustar foo
tar -cf t2 -H pax $opts foo

Although t1 and t2 should be identical, t2 contains an unnecessary pax extended header saying "ctime=0" and "atime=0".
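
One way to check this locally is sketched below. It assumes GNU tar, GNU touch and GNU grep are installed under their usual names; the scratch directory and the "verdict" variable are illustrative only, and results may vary with the tar version.

```shell
# Sketch only: assumes GNU tar, GNU touch and GNU grep.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

opts='--pax-option=exthdr.name=%d/PaxHeaders/%f,atime:=0,ctime:=0'
touch -d 2021-12-13T00:00:00Z foo
tar -cf t1 -H ustar foo
tar -cf t2 -H pax $opts foo

# cmp -s prints nothing and reports only through its exit status.
if cmp -s t1 t2; then verdict=identical; else verdict=different; fi
echo "t1 vs t2: $verdict"

# Show any atime/ctime records stored in t2's extended headers.
# -a treats the archive as text, -o prints only the matching part.
grep -a -o '[ac]time=[0-9.]*' t2 || echo "no atime/ctime records in t2"
```

The grep step only looks for the keyword text, so it is a quick check rather than a real pax header parser.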

I installed the attached patch to suggest --pax-option 'delete=[ac]time' instead; this should work around the issue.
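
A minimal sketch of that workaround, again assuming GNU tar and GNU touch; the archive name t3 and the scratch directory are made up for illustration. The pattern is quoted so that the shell does not try to expand the brackets as a glob.

```shell
# Sketch only: assumes GNU tar and GNU touch.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

touch -d 2021-12-13T00:00:00Z foo
tar -cf t1 -H ustar foo

# Ask tar to omit atime and ctime from the extended header records
# instead of forcing them to zero.
tar -cf t3 -H pax --pax-option 'delete=[ac]time' foo

# Compare against the ustar archive of the same file.
if cmp -s t1 t3; then verdict=identical; else verdict=different; fi
echo "t1 vs t3: $verdict"
```

If the two archives still differ, running cmp without -s shows the offset of the first differing byte.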

Perhaps 'tar' should be changed to not output unnecessary pax extended headers in cases like these; this would take some hacking, though, and I'm not sure it's worth it.

For tzdb I've suppressed mtime records in extended pax headers by touching all the files to integer-second-resolution timestamps as shown in the above example. I do this because I want the tarball timestamps to exactly match what's in Git.
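
When the exact timestamp does not matter, a different way to get whole-second mtimes is to truncate each file's existing mtime. This is not what tzdb does (it sets timestamps to match Git); it is only a sketch that assumes GNU stat and GNU touch, and truncate_mtime is a hypothetical helper name.

```shell
# Sketch only: assumes GNU coreutils (stat -c, touch -d "@SECONDS").
# truncate_mtime is a hypothetical helper, not part of tar or tzdb.
truncate_mtime() {
  for f in "$@"; do
    # %Y is the mtime in whole seconds since the Epoch.
    secs=$(stat -c %Y -- "$f") || return 1
    # -m changes only the mtime; "@N" means N seconds since the Epoch.
    touch -m -d "@$secs" -- "$f" || return 1
  done
}

workdir=$(mktemp -d) || exit 1
touch "$workdir/foo"
truncate_mtime "$workdir/foo"

# %y prints the mtime with its fractional part, which should now be zero.
stat -c %y "$workdir/foo"
```

With whole-second mtimes, tar has no subsecond value to record in an extended header for the file.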

Perhaps tar should offer a more convenient way to generate pax-format tarballs without atime and ctime (and maybe without extended mtime records too): for software distro tarballs, subsecond-resolution timestamps often add little value and can even be a hassle.

Or the simpler advice might just be "use ustar format for software distros"...

From 205ed228925b1d2c1052821b604703b1b7931089 Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Mon, 13 Dec 2021 12:42:11 -0800
Subject: [PATCH] More reproducible tarball doc

* doc/tar.texi (PAX keywords): Improve discussion of how
to generate reproducible tarballs.
---
 doc/tar.texi | 41 ++++++++++++++++++++++++-----------------
 1 file changed, 24 insertions(+), 17 deletions(-)

diff --git a/doc/tar.texi b/doc/tar.texi
index 389a3448..64efcebd 100644
--- a/doc/tar.texi
+++ b/doc/tar.texi
@@ -10423,6 +10423,9 @@ the following forms:
 When used with one of archive-creation commands,
 this option instructs @command{tar} to omit from extended header records
 that it produces any keywords matching the string @var{pattern}.
+If the pattern contains shell metacharacters like @samp{*}, it should
+be quoted to prevent the shell from expanding the pattern before
+@command{tar} sees it.
 
 When used in extract or list mode, this option instructs tar
 to ignore any keywords matching the given @var{pattern} in the extended
@@ -10431,7 +10434,7 @@ matching notation described in @acronym{POSIX 1003.2}, 3.13
 (@pxref{wildcards}).  For example:
 
 @smallexample
---pax-option delete=security.*
+--pax-option 'delete=security.*'
 @end smallexample
 
 would suppress security-related information.
@@ -10560,11 +10563,9 @@ For example, to set all modification times to the current date, you
 use the following option:
 
 @smallexample
---pax-option='mtime:=@{now@}'
+--pax-option 'mtime:=@{now@}'
 @end smallexample
 
-Note quoting of the option's argument.
-
 @cindex archives, binary equivalent
 @cindex binary equivalent archives, creating
 As another example, here is the option that ensures that any two
@@ -10572,7 +10573,7 @@ archives created using it, will be binary equivalent if they have the
 same contents:
 
 @smallexample
---pax-option=atime:=0
+--pax-option delete=atime
 @end smallexample
 
 @noindent
@@ -10581,27 +10582,33 @@ from them, you will also need to eliminate changes due to ctime, as
 shown in examples below:
 
 @smallexample
---pax-option=atime:=0,ctime:=0
+--pax-option 'delete=[ac]time'
 @end smallexample
 
 @noindent
-or
+Normally @command{tar} saves an mtime value with subsecond resolution
+in an extended header for any file with a timestamp that is not on a
+one-second boundary.  This is in addition to the traditional mtime
+timestamp in the header block, which can represent integer timestamps
+in the 1970-01-01 00:00:00 through 2242-03-16 12:56:31 @sc{utc}.  If
+this traditional timestamp suffices and you do not want subsecond
+timestamp resolution, you can use:
 
 @smallexample
---pax-option=atime:=0,delete=ctime
+--pax-option 'delete=[acm]time'
 @end smallexample
 
-Notice, that if you create an archive in POSIX format (@pxref{posix})
-and the environment variable @env{POSIXLY_CORRECT} is set, then the
-two archives created using the same options on the same set of files
-will not be byte-to-byte equivalent even with the above option.  This
-is because the posix default for extended header names includes the
-PID of the tar process, which is different at each run. To produce
-byte-to-byte equivalent archives in this case, either unset
-@env{POSIXLY_CORRECT}, or use the following option:
+If the environment variable @env{POSIXLY_CORRECT} is set, two POSIX
+archives created using the same options on the same set of files might
+not be byte-to-byte equivalent even with the above options.  This is
+because the POSIX default for extended header names includes
+@command{tar}'s process @acronym{ID}, which typically differs at each
+run.  To produce byte-to-byte equivalent archives in this case, either
+unset @env{POSIXLY_CORRECT}, or use the following option, which can be
+combined with the above options:
 
 @smallexample
----pax-option=exthdr.name=%d/PaxHeaders/%f,atime:=0,ctime:=0
+--pax-option exthdr.name=%d/PaxHeaders/%f
 @end smallexample
 
 @node Checksumming
-- 
2.32.0
