On Mon, Mar 1, 2021 at 11:07 PM Daniel Gustafsson <dan...@yesql.se> wrote: > I don't know Windows at all so I can't really comment on that portion, but > from > my understanding of procsignalbarriers I think this seems right. No tests > break when forcing the codepath to run on Linux and macOS.
Hey Daniel, Thanks for looking! > Should this be performed in tblspc_redo as well for the similar case? Ah. Yes. Added (not tested yet). > +#if defined(WIN32) || defined(USE_ASSERT_CHECKING) > > Is the USE_ASSERT_CHECKING clause to exercise the code a more frequent than > just on Windows? That could warrant a quick word in the comment if so IMO to > avoid confusion. Note added. > -ProcessBarrierPlaceholder(void) > +ProcessBarrierSmgrRelease(void) > { > - /* > - * XXX. This is just a placeholder until the first real user of this > - * machinery gets committed. Rename PROCSIGNAL_BARRIER_PLACEHOLDER to > - * PROCSIGNAL_BARRIER_SOMETHING_ELSE where SOMETHING_ELSE is something > - * appropriately descriptive. Get rid of this function and instead > have > - * ProcessBarrierSomethingElse. Most likely, that function should > live in > - * the file pertaining to that subsystem, rather than here. > - * > - * The return value should be 'true' if the barrier was successfully > - * absorbed and 'false' if not. Note that returning 'false' can lead > to > - * very frequent retries, so try hard to make that an uncommon case. > - */ > + smgrrelease(); > > Should this instead be in smgr.c to avoid setting a precedent for procsignal.c > to be littered with absorption functions? Done.
From 5d5103716cbd1ac2ce5f62674ea7eb263ffcba71 Mon Sep 17 00:00:00 2001 From: Thomas Munro <tmu...@postgresql.org> Date: Mon, 1 Mar 2021 17:14:11 +1300 Subject: [PATCH v6] Use a global barrier to fix DROP TABLESPACE on Windows. Previously, it was possible for DROP TABLESPACE to fail because a table had been dropped, but other backends still had open file handles. Windows won't allow a directory containing unlinked-but-still-open files to be unlinked. Tackle this problem by forcing all backends to close all fds. No change for Unix systems, which didn't suffer from the problem, but simulate the need to do it in assertion builds just to exercise the code. Reviewed-by: Andres Freund <and...@anarazel.de> Reviewed-by: Daniel Gustafsson <dan...@yesql.se> Discussion: https://postgr.es/m/ca+hukgldemy2gbm80kz20gte6hnvwoere8kwcjk6-u56ost...@mail.gmail.com --- src/backend/commands/tablespace.c | 32 ++++++++++++++++++++++------ src/backend/storage/ipc/procsignal.c | 24 +++------------------ src/backend/storage/smgr/md.c | 6 ++++++ src/backend/storage/smgr/smgr.c | 18 ++++++++++++++++ src/include/storage/md.h | 1 + src/include/storage/procsignal.h | 7 +----- src/include/storage/smgr.h | 1 + 7 files changed, 55 insertions(+), 34 deletions(-) diff --git a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c index 69ea155d50..38894c5123 100644 --- a/src/backend/commands/tablespace.c +++ b/src/backend/commands/tablespace.c @@ -520,15 +520,25 @@ DropTableSpace(DropTableSpaceStmt *stmt) * but we can't tell them apart from important data files that we * mustn't delete. So instead, we force a checkpoint which will clean * out any lingering files, and try again. - * - * XXX On Windows, an unlinked file persists in the directory listing - * until no process retains an open handle for the file. The DDL - * commands that schedule files for unlink send invalidation messages - * directing other PostgreSQL processes to close the files. DROP - * TABLESPACE should not give up on the tablespace becoming empty - * until all relevant invalidation processing is complete. */ RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT); + /* + * On Windows, an unlinked file persists in the directory listing until + * no process retains an open handle for the file. The DDL + * commands that schedule files for unlink send invalidation messages + * directing other PostgreSQL processes to close the files, but nothing + * guarantees they'll be processed in time. So, we'll also use a + * global barrier to ask all backends to close all files, and wait + * until they're finished. + * + * For test coverage, also do this on Unix systems in assert builds. + */ +#if defined(WIN32) || defined(USE_ASSERT_CHECKING) + LWLockRelease(TablespaceCreateLock); + WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_SMGRRELEASE)); + LWLockAcquire(TablespaceCreateLock, LW_EXCLUSIVE); +#endif + /* And now try again. */ if (!destroy_tablespace_directories(tablespaceoid, false)) { /* Still not empty, the files must be important then */ @@ -1550,6 +1560,14 @@ tblspc_redo(XLogReaderState *record) */ if (!destroy_tablespace_directories(xlrec->ts_id, true)) { +#if defined(WIN32) || defined(USE_ASSERT_CHECKING) + /* + * Windows only (and assertion builds for testing purposes): ask + * all backends to close all files (see DropTableSpace). + */ + WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_SMGRRELEASE)); +#endif + ResolveRecoveryConflictWithTablespace(xlrec->ts_id); /* diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c index 8e5ee49fbd..9fcc2b09e1 100644 --- a/src/backend/storage/ipc/procsignal.c +++ b/src/backend/storage/ipc/procsignal.c @@ -28,6 +28,7 @@ #include "storage/latch.h" #include "storage/proc.h" #include "storage/shmem.h" +#include "storage/smgr.h" #include "storage/sinval.h" #include "tcop/tcopprot.h" @@ -100,7 +101,6 @@ static ProcSignalSlot *MyProcSignalSlot = NULL; static bool CheckProcSignal(ProcSignalReason reason); static void CleanupProcSignalState(int status, Datum arg); static void ResetProcSignalBarrierBits(uint32 flags); -static bool ProcessBarrierPlaceholder(void); /* * ProcSignalShmemSize @@ -531,8 +531,8 @@ ProcessProcSignalBarrier(void) type = (ProcSignalBarrierType) pg_rightmost_one_pos32(flags); switch (type) { - case PROCSIGNAL_BARRIER_PLACEHOLDER: - processed = ProcessBarrierPlaceholder(); + case PROCSIGNAL_BARRIER_SMGRRELEASE: + processed = ProcessBarrierSmgrRelease(); break; } @@ -598,24 +598,6 @@ ResetProcSignalBarrierBits(uint32 flags) InterruptPending = true; } -static bool -ProcessBarrierPlaceholder(void) -{ - /* - * XXX. This is just a placeholder until the first real user of this - * machinery gets committed. Rename PROCSIGNAL_BARRIER_PLACEHOLDER to - * PROCSIGNAL_BARRIER_SOMETHING_ELSE where SOMETHING_ELSE is something - * appropriately descriptive. Get rid of this function and instead have - * ProcessBarrierSomethingElse. Most likely, that function should live in - * the file pertaining to that subsystem, rather than here. - * - * The return value should be 'true' if the barrier was successfully - * absorbed and 'false' if not. Note that returning 'false' can lead to - * very frequent retries, so try hard to make that an uncommon case. - */ - return true; -} - /* * CheckProcSignal - check to see if a particular reason has been * signaled, and clear the signal flag. Should be called after receiving diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c index 1e12cfad8e..1b751205b1 100644 --- a/src/backend/storage/smgr/md.c +++ b/src/backend/storage/smgr/md.c @@ -548,6 +548,12 @@ mdclose(SMgrRelation reln, ForkNumber forknum) } } +void +mdrelease(void) +{ + closeAllVfds(); +} + /* * mdprefetch() -- Initiate asynchronous read of the specified block of a relation */ diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c index 4dc24649df..91cf3cef48 100644 --- a/src/backend/storage/smgr/smgr.c +++ b/src/backend/storage/smgr/smgr.c @@ -41,6 +41,7 @@ typedef struct f_smgr { void (*smgr_init) (void); /* may be NULL */ void (*smgr_shutdown) (void); /* may be NULL */ + void (*smgr_release) (void); /* may be NULL */ void (*smgr_open) (SMgrRelation reln); void (*smgr_close) (SMgrRelation reln, ForkNumber forknum); void (*smgr_create) (SMgrRelation reln, ForkNumber forknum, @@ -69,6 +70,7 @@ static const f_smgr smgrsw[] = { { .smgr_init = mdinit, .smgr_shutdown = NULL, + .smgr_release = mdrelease, .smgr_open = mdopen, .smgr_close = mdclose, .smgr_create = mdcreate, @@ -693,3 +695,19 @@ AtEOXact_SMgr(void) smgrclose(rel); } } + +/* + * This routine is called when we are ordered to release all open files by a + * ProcSignalBarrier. + */ +bool +ProcessBarrierSmgrRelease(void) +{ + for (int i = 0; i < NSmgr; i++) + { + if (smgrsw[i].smgr_release) + smgrsw[i].smgr_release(); + } + + return true; +} diff --git a/src/include/storage/md.h b/src/include/storage/md.h index 752b440864..7685367b97 100644 --- a/src/include/storage/md.h +++ b/src/include/storage/md.h @@ -23,6 +23,7 @@ extern void mdinit(void); extern void mdopen(SMgrRelation reln); extern void mdclose(SMgrRelation reln, ForkNumber forknum); +extern void mdrelease(void); extern void mdcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo); extern bool mdexists(SMgrRelation reln, ForkNumber forknum); extern void mdunlink(RelFileNodeBackend rnode, ForkNumber forknum, bool isRedo); diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h index 4ae7dc33b8..aaaa8ed07c 100644 --- a/src/include/storage/procsignal.h +++ b/src/include/storage/procsignal.h @@ -48,12 +48,7 @@ typedef enum typedef enum { - /* - * XXX. PROCSIGNAL_BARRIER_PLACEHOLDER should be replaced when the first - * real user of the ProcSignalBarrier mechanism is added. It's just here - * for now because we can't have an empty enum. - */ - PROCSIGNAL_BARRIER_PLACEHOLDER = 0 + PROCSIGNAL_BARRIER_SMGRRELEASE /* ask smgr to close files */ } ProcSignalBarrierType; /* diff --git a/src/include/storage/smgr.h b/src/include/storage/smgr.h index a6fbf7b6a6..8da5139b03 100644 --- a/src/include/storage/smgr.h +++ b/src/include/storage/smgr.h @@ -104,5 +104,6 @@ extern void smgrtruncate(SMgrRelation reln, ForkNumber *forknum, int nforks, BlockNumber *nblocks); extern void smgrimmedsync(SMgrRelation reln, ForkNumber forknum); extern void AtEOXact_SMgr(void); +extern bool ProcessBarrierSmgrRelease(void); #endif /* SMGR_H */ -- 2.30.1