On Mon, Oct 3, 2022 at 9:07 AM Thomas Munro <thomas.mu...@gmail.com> wrote: > On Tue, Sep 20, 2022 at 1:31 PM Justin Pryzby <pry...@telsasoft.com> wrote: > > I suspect that rmtree() was looping in pgunlink(), and got ENOENT, so > > didn't warn about the file itself, but then failed one moment later in > > rmdir. > > Yeah, I think this is my fault. In commit f357233c the new lstat() > call might return ENOENT for STATUS_DELETE_PENDING, and then we don't > enter pgunlink()'s 10 second sleep-retry loop. Let me think about how > best to fix that, and how to write a regression test program that > would exercise stuff like this. Might take a couple of days as I am > away from computers until mid-week.
I think something like the attached should do the right thing for STATUS_DELETE_PENDING (sort of "ENOENT-in-progress"). unlink() goes back to being blocking (sleep+retry until eventually we reach ENOENT or we time out and give up with EACCES), but we still distinguish it from true ENOENT so we have a fast exit in that case. This is passing CI, but not tested yet. One ugly thing in this patch is that it has to deal with our historical mistake (?) of including Windows headers in this file in Cygwin builds for no reason and thus getting WIN32 defined on a non-WIN32 build, as I've complained about before[1] but not yet tidied up. Remembering why we do any of this weird looking stuff that we don't need on Unix, the general idea is that things that scan directories to unlink everything before unlinking the parent directory need to block for a while on STATUS_DELETE_PENDING to increase their probability of success, while things that do anything else probably just want to skip such zombie files completely. To recap, we have: * readdir() sees files that are ENOENT-in-progress (so recursive unlinks can see them) * unlink() waits for ENOENT-in-progress to reach ENOENT (what broke here) * stat() and lstat() report ENOENT-in-progress as ENOENT (done to fix eg pg_basebackup, which used to fail at random on Windows) * open() reports ENOENT-in-progress as either ENOENT or EEXIST depending on O_CREAT (because used by stat()) Clearly this set of kludges isn't perfect and other kludge-sets would be possible too. One thought is that we could hide ENOENT-in-progress from readdir(), and add a new rmdir() wrapper instead. If it gets a directory-not-empty error from the kernel, it could at that point wait for zombie files to go away (perhaps registering for file system events with some local equivalent of KQ_FILTER_VNODE if there is one, to be less sloppy that the current sleep() nonsense, but sleep would work too). When I'm back at my real keyboard I'll try to come up with tests for this stuff, but I'm not sure how solid we can really make a test for this particular case -- I think you'd need to have another thread open the file and then close it after different periods of time, to demonstrate that the retry loop works but also gives up, and that's exactly the sort of timing-dependent stuff we try to avoid. But I think I'll try that anyway, because it's essential infrastructure to allow Unix-only hackers to work only this stuff. Once we have that, we might be able to make some more progress with the various FILE_DISPOSITION_POSIX_SEMANTICS proposals, if it helps, because we'll have reproducible evidence for what it really does. [1] https://commitfest.postgresql.org/39/3781/
From 0fa290b4d3597c843804d3ee35559074864424aa Mon Sep 17 00:00:00 2001 From: Thomas Munro <thomas.munro@gmail.com> Date: Mon, 3 Oct 2022 09:15:02 +1300 Subject: [PATCH] Fix STATUS_DELETE_PENDING handling in pgunlink(). Commit c5cb8f3b didn't handle STATUS_DELETE_PENDING correctly, and would report ENOENT immediately without waiting. In order for simple recursive unlink algorithms to have a chance of succeeding while other backends might not have closed them yet, we rely on the 10-seconds-of-retries behavior present in pgunlink(). Diagnosed-by: Justin Pryzby <pryzby@telsasoft.com> Discussion: https://postgr.es/m/20220920013122.GA31833%40telsasoft.com --- src/port/dirmod.c | 38 ++++++++++++++++++++++++++++++++++++-- 1 file changed, 36 insertions(+), 2 deletions(-) diff --git a/src/port/dirmod.c b/src/port/dirmod.c index ae6301dd6c..aae81228ad 100644 --- a/src/port/dirmod.c +++ b/src/port/dirmod.c @@ -39,6 +39,10 @@ #endif #endif +#if defined(WIN32) && !defined(__CYGWIN__) +#include "port/win32ntdll.h" +#endif + #if defined(WIN32) || defined(__CYGWIN__) /* @@ -91,6 +95,22 @@ pgrename(const char *from, const char *to) return 0; } +/* + * Check if _pglstat64()'s reason for failure was STATUS_DELETE_PENDING. + * This doesn't apply to Cygwin, which has its own lstat() that would report + * the case as EACCES. +*/ +static bool +lstat_error_was_status_delete_pending(void) +{ + if (errno != ENOENT) + return false; +#if defined(WIN32) && !defined(__CYGWIN__) + if (pg_RtlGetLastNtStatus() == STATUS_DELETE_PENDING) + return true; +#endif + return false; +} /* * pgunlink @@ -98,6 +118,7 @@ pgrename(const char *from, const char *to) int pgunlink(const char *path) { + bool is_lnk; int loops = 0; struct stat st; @@ -122,9 +143,22 @@ pgunlink(const char *path) * due to sharing violations, but that seems unlikely. We could perhaps * prevent that by holding a file handle ourselves across the lstat() and * the retry loop, but that seems like over-engineering for now. + * + * In the special case of a STATUS_DELETE_PENDING error (file already + * unlinked, but someone still has it open), we don't want to report ENOENT + * to the caller immediately, because rmdir(parent) would probably fail. + * We want to wait until the file truly goes away so that simple recursive + * directory unlink algorithms work. */ if (lstat(path, &st) < 0) - return -1; + { + if (lstat_error_was_status_delete_pending()) + is_lnk = false; + else + return -1; + } + else + is_lnk = S_ISLNK(st.st_mode); /* * We need to loop because even though PostgreSQL uses flags that allow @@ -133,7 +167,7 @@ pgunlink(const char *path) * someone else to close the file, as the caller might be holding locks * and blocking other backends. */ - while ((S_ISLNK(st.st_mode) ? rmdir(path) : unlink(path)) < 0) + while ((is_lnk ? rmdir(path) : unlink(path)) < 0) { if (errno != EACCES) return -1; -- 2.30.2