On 6/3/22 14:47, Piotr P. Stefaniak wrote:

And in principle it would have been the desired behavior if I got to say
what it means for a file to change. In my particular case, it's safe and
preferable to ignore situations where ctime changes (permission update,
hardlink count change, etc.) but mtime does not. But I do want to learn
when mtime changes because that's a bug in the database and I ought to
report it.

On further thought, I think the flexibility you're suggesting isn't needed because pretty much everybody is in your situation, in that users don't care about ctime and so don't mind if it changes. Users care whether changes to the file mean that the tarball doesn't correspond to any state that the file ever had. For this, ctime is a false alarm for the reasons you mentioned (adding a hard link).

With that in mind I installed the attached patch into GNU tar's master branch on Savannah. I think this addresses the issue you mentioned, because the patched GNU tar won't warn at all in that situation. This also fixes some other issues that I noticed while in the neighborhood (see the NEWS file and the commit message).

Comments welcome.
From c1027eb5aee7b2a942c601ef6cdb1b132da83225 Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Thu, 9 Jun 2022 15:50:06 -0700
Subject: [PATCH] =?UTF-8?q?Warn=20=E2=80=9Cfile=20changed=20as=20we=20read?=
 =?UTF-8?q?=20it=E2=80=9D=20less=20often?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* src/create.c (dump_file0): Remove an fstatat call that is
unnecessary because the file wasn’t read so we can treat the first
fstatat as atomic.  Warn “file changed” when the file’s size,
mtime, user ID, group ID, or mode changes, instead of when the
file’s size or ctime changes.  Also, when such a change happens,
do not change exit status if --ignore-failed-read.  Finally, don’t
attempt to change atime back if it didn’t change.
---
 NEWS         | 17 +++++++++++++++++
 doc/tar.texi | 10 ++++++----
 src/create.c | 54 ++++++++++++++++++++++++++++++++++++----------------
 3 files changed, 61 insertions(+), 20 deletions(-)

diff --git a/NEWS b/NEWS
index d9de3aa9..6700c04f 100644
--- a/NEWS
+++ b/NEWS
@@ -11,6 +11,23 @@ version 1.34.90 (git)
 
 * Bug fixes
 
+** Warn "file changed as we read it" less often.
+   Formerly, tar warned if the file's size or ctime changed.
+   However, this generated a false positive if tar read a file
+   while another process hard-linked to it, changing its ctime.
+   Now, tar warns if the file's size, mtime, user ID, group ID,
+   or mode changes.  Although neither heuristic is perfect,
+   the new one should work better in practice.
+
+** Fix --ignore-failed-read to ignore file-changed read errors
+   as far as exit status is concerned.  You can now suppress file-changed
+   issues entirely with --ignore-failed-read --warning=no-file-changed.
+
+** Fix --remove-files to not remove a file that changed while we read it.
+
+** Fix --atime-preserve=replace to not fail if there was no need to replace,
+   either because we did not read the file, or the atime did not change.
+
 ** Fix handling of prefix keywords not followed by "." in pax headers.
 
 ** Fix handling of out-of-range sparse entries in pax headers.
diff --git a/doc/tar.texi b/doc/tar.texi
index 8cb150bf..95de6328 100644
--- a/doc/tar.texi
+++ b/doc/tar.texi
@@ -2859,7 +2859,7 @@ Ignore exit codes of subprocesses. @xref{Writing to an External Program}.
 @opsummary{ignore-failed-read}
 @item --ignore-failed-read
 
-Do not exit unsuccessfully merely because an unreadable file was encountered.
+Do not exit unsuccessfully merely because reading failed.
 @xref{Ignore Failed Read}.
 
 @opsummary{ignore-zeros}
@@ -4641,7 +4641,8 @@ Disable all warning messages.
 @item file-changed
 @samp{%s: file changed as we read it}
 @item failed-read
-Suppresses warnings about unreadable files or directories. This
+Suppresses warnings about read failures, which can occur if files
+or directories are unreadable, or if they change while being read.  This
 keyword applies only if used together with the @option{--ignore-failed-read}
 option. @xref{Ignore Failed Read}.
 @end table
@@ -5764,11 +5765,12 @@ Disable SELinux context support.
 @table @option
 @item --ignore-failed-read
 @opindex ignore-failed-read
-Do not exit with nonzero on unreadable files or directories.
+Do not exit with nonzero if there are mild problems while reading.
 @end table
 
 This option has effect only during creation.  It instructs tar to
-treat as mild conditions any missing or unreadable files (directories).
+treat as mild conditions any missing or unreadable files (directories),
+or files that change while reading.
 Such failures don't affect the program exit code, and the
 corresponding diagnostic messages are marked as warnings, not errors.
 These warnings can be suppressed using the
diff --git a/src/create.c b/src/create.c
index 30db2b5c..ebca3a93 100644
--- a/src/create.c
+++ b/src/create.c
@@ -1639,8 +1639,6 @@ dump_file0 (struct tar_stat_info *st, char const *name, char const *p)
 {
   union block *header;
   char type;
-  off_t original_size;
-  struct timespec original_ctime;
   off_t block_ordinal = -1;
   int fd = 0;
   bool is_dir;
@@ -1683,10 +1681,11 @@ dump_file0 (struct tar_stat_info *st, char const *name, char const *p)
       return;
     }
 
-  st->archive_file_size = original_size = st->stat.st_size;
+  struct stat st1 = st->stat;
+  st->archive_file_size = st->stat.st_size;
   st->atime = get_stat_atime (&st->stat);
   st->mtime = get_stat_mtime (&st->stat);
-  st->ctime = original_ctime = get_stat_ctime (&st->stat);
+  st->ctime = get_stat_ctime (&st->stat);
 
 #ifdef S_ISHIDDEN
   if (S_ISHIDDEN (st->stat.st_mode))
@@ -1736,7 +1735,7 @@ dump_file0 (struct tar_stat_info *st, char const *name, char const *p)
   if (is_dir || S_ISREG (st->stat.st_mode) || S_ISCTG (st->stat.st_mode))
     {
       bool ok;
-      struct stat final_stat;
+      struct stat st2;
 
       xattrs_acls_get (parentfd, name, st, 0, !is_dir);
       xattrs_selinux_get (parentfd, name, st, fd);
@@ -1804,31 +1803,54 @@ dump_file0 (struct tar_stat_info *st, char const *name, char const *p)
 		  errno = - parentfd;
 		  ok = false;
 		}
-	      else
-		ok = fstatat (parentfd, name, &final_stat, fstatat_flags) == 0;
 	    }
 	  else
-	    ok = fstat (fd, &final_stat) == 0;
+	    ok = fstat (fd, &st2) == 0;
 
 	  if (! ok)
 	    file_removed_diag (p, top_level, stat_diag);
 	}
 
-      if (ok)
+      if (ok && fd)
 	{
-	  if ((timespec_cmp (get_stat_ctime (&final_stat), original_ctime) != 0
-	       /* Original ctime will change if the file is a directory and
-		  --remove-files is given */
-	       && !(remove_files_option && is_dir))
-	      || original_size < final_stat.st_size)
+	  /* Heuristically check whether the file is the same in all
+	     attributes that tar cares about and can easily check.
+	     Although the check is not perfect since it does not
+	     consult file contents, it is typically good enough.
+	     Do not check atime which is saved only to replace it later.
+	     Do not check ctime where changes might be benign (e.g.,
+	     another process creates a hard link to the file).  */
+
+	  /* If the file's user ID, group ID or mode changed, tar may
+	     have output the wrong info for the file.  */
+	  ok &= st1.st_uid == st2.st_uid;
+	  ok &= st1.st_gid == st2.st_gid;
+	  ok &= st1.st_mode == st2.st_mode;
+
+	  /* Likewise for the file's mtime, but skip this check if it
+	     is a directory possibly updated by --remove-files.  */
+	  if (! (is_dir && remove_files_option))
+	    ok &= ! timespec_cmp (get_stat_mtime (&st1),
+				  get_stat_mtime (&st2));
+
+	  /* Likewise for the file's size, but skip this check if it
+	     is a directory as tar does not output directory sizes.
+	     Although dump_regular_file caught regular file shrinkage,
+	     it shouldn't hurt to check for shrinkage again now;
+	     plus, the file may have grown.  */
+	  if (!is_dir)
+	    ok &= st1.st_size == st2.st_size;
+
+	  if (!ok)
 	    {
 	      WARNOPT (WARN_FILE_CHANGED,
 		       (0, 0, _("%s: file changed as we read it"),
 			quotearg_colon (p)));
-	      set_exit_status (TAREXIT_DIFFERS);
+	      if (! ignore_failed_read_option)
+		set_exit_status (TAREXIT_DIFFERS);
 	    }
 	  else if (atime_preserve_option == replace_atime_preserve
-		   && fd && (is_dir || original_size != 0)
+		   && timespec_cmp (st->atime, get_stat_atime (&st2)) != 0
 		   && set_file_atime (fd, parentfd, name, st->atime) != 0)
 	    utime_error (p);
 	}
-- 
2.36.1

Reply via email to