On Mon, Oct 3, 2022 at 9:07 AM Thomas Munro <thomas.mu...@gmail.com> wrote:
> On Tue, Sep 20, 2022 at 1:31 PM Justin Pryzby <pry...@telsasoft.com> wrote:
> > I suspect that rmtree() was looping in pgunlink(), and got ENOENT, so
> > didn't warn about the file itself, but then failed one moment later in
> > rmdir.
>
> Yeah, I think this is my fault.  In commit f357233c the new lstat()
> call might return ENOENT for STATUS_DELETE_PENDING, and then we don't
> enter pgunlink()'s 10 second sleep-retry loop.  Let me think about how
> best to fix that, and how to write a regression test program that
> would exercise stuff like this.  Might take a couple of days as I am
> away from computers until mid-week.

I think something like the attached should do the right thing for
STATUS_DELETE_PENDING (sort of "ENOENT-in-progress").  unlink() goes
back to being blocking (sleep+retry until eventually we reach ENOENT
or we time out and give up with EACCES), but we still distinguish it
from true ENOENT so we have a fast exit in that case.  This is passing
CI, but not tested yet.

One ugly thing in this patch is that it has to deal with our
historical mistake (?) of including Windows headers in this file in
Cygwin builds for no reason and thus getting WIN32 defined on a
non-WIN32 build, as I've complained about before[1] but not yet tidied
up.

Remembering why we do any of this weird looking stuff that we don't
need on Unix, the general idea is that things that scan directories to
unlink everything before unlinking the parent directory need to block
for a while on STATUS_DELETE_PENDING to increase their probability of
success, while things that do anything else probably just want to skip
such zombie files completely.  To recap, we have:

 * readdir() sees files that are ENOENT-in-progress (so recursive
unlinks can see them)
 * unlink() waits for ENOENT-in-progress to reach ENOENT (what broke here)
 * stat() and lstat() report ENOENT-in-progress as ENOENT (done to fix
eg pg_basebackup, which used to fail at random on Windows)
 * open() reports ENOENT-in-progress as either ENOENT or EEXIST
depending on O_CREAT (because used by stat())

Clearly this set of kludges isn't perfect and other kludge-sets would
be possible too.  One thought is that we could hide ENOENT-in-progress
from readdir(), and add a new rmdir() wrapper instead.  If it gets a
directory-not-empty error from the kernel, it could at that point wait
for zombie files to go away (perhaps registering for file system
events with some local equivalent of KQ_FILTER_VNODE if there is one,
to be less sloppy that the current sleep() nonsense, but sleep would
work too).

When I'm back at my real keyboard I'll try to come up with tests for
this stuff, but I'm not sure how solid we can really make a test for
this particular case -- I think you'd need to have another thread open
the file and then close it after different periods of time, to
demonstrate that the retry loop works but also gives up, and that's
exactly the sort of timing-dependent stuff we try to avoid.  But I
think I'll try that anyway, because it's essential infrastructure to
allow Unix-only hackers to work only this stuff.  Once we have that,
we might be able to make some more progress with the various
FILE_DISPOSITION_POSIX_SEMANTICS proposals, if it helps, because we'll
have reproducible evidence for what it really does.

[1] https://commitfest.postgresql.org/39/3781/
From 0fa290b4d3597c843804d3ee35559074864424aa Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Mon, 3 Oct 2022 09:15:02 +1300
Subject: [PATCH] Fix STATUS_DELETE_PENDING handling in pgunlink().

Commit c5cb8f3b didn't handle STATUS_DELETE_PENDING correctly, and would
report ENOENT immediately without waiting.  In order for simple recursive
unlink algorithms to have a chance of succeeding while other backends
might not have closed them yet, we rely on the 10-seconds-of-retries
behavior present in pgunlink().

Diagnosed-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/20220920013122.GA31833%40telsasoft.com
---
 src/port/dirmod.c | 38 ++++++++++++++++++++++++++++++++++++--
 1 file changed, 36 insertions(+), 2 deletions(-)

diff --git a/src/port/dirmod.c b/src/port/dirmod.c
index ae6301dd6c..aae81228ad 100644
--- a/src/port/dirmod.c
+++ b/src/port/dirmod.c
@@ -39,6 +39,10 @@
 #endif
 #endif
 
+#if defined(WIN32) && !defined(__CYGWIN__)
+#include "port/win32ntdll.h"
+#endif
+
 #if defined(WIN32) || defined(__CYGWIN__)
 
 /*
@@ -91,6 +95,22 @@ pgrename(const char *from, const char *to)
 	return 0;
 }
 
+/*
+ * Check if _pglstat64()'s reason for failure was STATUS_DELETE_PENDING.
+ * This doesn't apply to Cygwin, which has its own lstat() that would report
+ * the case as EACCES.
+*/
+static bool
+lstat_error_was_status_delete_pending(void)
+{
+	if (errno != ENOENT)
+		return false;
+#if defined(WIN32) && !defined(__CYGWIN__)
+	if (pg_RtlGetLastNtStatus() == STATUS_DELETE_PENDING)
+		return true;
+#endif
+	return false;
+}
 
 /*
  *	pgunlink
@@ -98,6 +118,7 @@ pgrename(const char *from, const char *to)
 int
 pgunlink(const char *path)
 {
+	bool		is_lnk;
 	int			loops = 0;
 	struct stat st;
 
@@ -122,9 +143,22 @@ pgunlink(const char *path)
 	 * due to sharing violations, but that seems unlikely.  We could perhaps
 	 * prevent that by holding a file handle ourselves across the lstat() and
 	 * the retry loop, but that seems like over-engineering for now.
+	 *
+	 * In the special case of a STATUS_DELETE_PENDING error (file already
+	 * unlinked, but someone still has it open), we don't want to report ENOENT
+	 * to the caller immediately, because rmdir(parent) would probably fail.
+	 * We want to wait until the file truly goes away so that simple recursive
+	 * directory unlink algorithms work.
 	 */
 	if (lstat(path, &st) < 0)
-		return -1;
+	{
+		if (lstat_error_was_status_delete_pending())
+			is_lnk = false;
+		else
+			return -1;
+	}
+	else
+		is_lnk = S_ISLNK(st.st_mode);
 
 	/*
 	 * We need to loop because even though PostgreSQL uses flags that allow
@@ -133,7 +167,7 @@ pgunlink(const char *path)
 	 * someone else to close the file, as the caller might be holding locks
 	 * and blocking other backends.
 	 */
-	while ((S_ISLNK(st.st_mode) ? rmdir(path) : unlink(path)) < 0)
+	while ((is_lnk ? rmdir(path) : unlink(path)) < 0)
 	{
 		if (errno != EACCES)
 			return -1;
-- 
2.30.2

Reply via email to