Re: Fix DROP TABLESPACE on Windows with ProcSignalBarrier?

Thomas Munro Wed, 03 Mar 2021 14:20:56 -0800

On Thu, Mar 4, 2021 at 4:18 AM Daniel Gustafsson <dan...@yesql.se> wrote:
> > On 1 Mar 2021, at 12:54, Thomas Munro <thomas.mu...@gmail.com> wrote:
> Based on my (limited) experience with procsignalbarriers I think this patch is


Help wanted: must have at least 14 years experience with
ProcSignalBarrier!  Yeah, I'm still figuring out the programming rules
here myself...

> correct; the general rule-of-thumb of synchronizing backend state on barrier
> absorption doesn't really apply in this case, literally all we want is to know
> that we've hit one interrupt and performed removals.

I guess the way to think about it is that the desired state is "you
have no files open that have been unlinked".

> >> +#if defined(WIN32) || defined(USE_ASSERT_CHECKING)
> >>
> >> Is the USE_ASSERT_CHECKING clause to exercise the code a more frequent than
> >> just on Windows?  That could warrant a quick word in the comment if so IMO 
> >> to
> >> avoid confusion.
> >
> > Note added.
>
> Since there is no way to get make the first destroy_tablespace_directories 
> call
> fail on purpose in testing, the assertion coverage may have limited use 
> though?

There is: all you have to do is drop a table, and then drop the
tablespace that held it without a checkpoint in between.  That
scenario is exercised by the "tablespace" regression test, and you can
reach it manually like this on a Unix system, with assertions enabled.
On a Windows box, I believe it should be reached even if there was a
checkpoint in between (or maybe you need to have a second session that
has accessed the table, not sure, no actual Windows here I just fling
stuff at CI).  I've added an elog() message to show the handler
running in each process in my cluster, so you can see it (it's also
instructive to put a sleep in there):

My psql session:

  postgres=# create tablespace ts location '/tmp/ts';
  CREATE TABLESPACE
  postgres=# create table t () tablespace ts;
  CREATE TABLE
  postgres=# drop table t;
  DROP TABLE
  postgres=# drop tablespace ts;

At this point the log shows:

  2021-03-04 09:54:33.429 NZDT [239811] LOG:  ProcessBarrierSmgrRelease()
  2021-03-04 09:54:33.429 NZDT [239821] LOG:  ProcessBarrierSmgrRelease()
  2021-03-04 09:54:33.429 NZDT [239821] STATEMENT:  drop tablespace ts;
  2021-03-04 09:54:33.429 NZDT [239814] LOG:  ProcessBarrierSmgrRelease()
  2021-03-04 09:54:33.429 NZDT [239816] LOG:  ProcessBarrierSmgrRelease()
  2021-03-04 09:54:33.429 NZDT [239812] LOG:  ProcessBarrierSmgrRelease()
  2021-03-04 09:54:33.429 NZDT [239813] LOG:  ProcessBarrierSmgrRelease()

Now back to my session:

  DROP TABLESPACE
  postgres=#

> I don't have a Windows env handy right now, but everything works as expected
> when testing this on Linux and macOS by inducing the codepath.  Will try to do
> some testing in Windows as well.

Thanks!

One question on my mind is: since this wait is interruptible (if you
get sick of waiting for a slow-to-respond process you can hit ^C, or
statement_timeout can presumably do it for you), do we leave things in
a sane state on error (catalog changes rolled back, no damage done on
disk)?  There is actually a nasty race there already ("If we crash
before committing..."), and we need to make sure we don't make that
window wider.  One thing I am pretty sure of is that it's never OK to
wait for a ProcSignalBarrier when you're not interruptible; for one
thing, you won't process the request yourself (self deadlock) and for
another, it would be hypocritical of you to expect others to process
interrupts when you can't (interprocess deadlock); perhaps there
should be an assertion about that, but it's pretty obvious if you
screw that up: it hangs.  That's why I release and reacquire that
LWLock.  But does that break some logic?

Andres just pointed me at the following CI failure on the AIO branch,
which seems to be due to a variant of this problem involving DROP
DATABASE.

https://cirrus-ci.com/task/6730034573475840?command=windows_worker_buf#L7

Duh, of course, we need the same thing in that case, and also in its
redo routine.

And... the same problem must also exist for the closely related ALTER
DATABASE ... SET TABLESPACE.  I guess these cases are pretty unlikely
to fail without the AIO branch's funky "io worker" processes that love
hoarding file descriptors, but I suppose it must be possible for the
bgwriter to have a relevant file descriptor open at the wrong time on
master today.

One thing I haven't tried to do yet is improve the "pipelining" by
issuing the request sooner, in the cases where we do this stuff
unconditionally.

From 33a4e7173f4a71d5739a608193175184984e9da7 Mon Sep 17 00:00:00 2001
From: Thomas Munro <tmu...@postgresql.org>
Date: Mon, 1 Mar 2021 17:14:11 +1300
Subject: [PATCH v7] Fix DROP {DATABASE,TABLESPACE} on Windows.

Previously, it was possible for DROP DATABASE and DROP TABLESPACE to
fail because other backends still had open file handles.  Windows won't
allow a directory containing unlinked-but-still-open files to be
unlinked.  Tackle this problem by forcing all backends to close all fds
at appropriate times.  No change for Unix systems, which didn't suffer
from the problem, but simulate the need to do it in assertion builds
just to exercise the code.

Reviewed-by: Andres Freund <and...@anarazel.de>
Reviewed-by: Daniel Gustafsson <dan...@yesql.se>
Discussion: https://postgr.es/m/ca+hukgldemy2gbm80kz20gte6hnvwoere8kwcjk6-u56ost...@mail.gmail.com
---
 src/backend/commands/dbcommands.c    | 19 +++++++++++++++---
 src/backend/commands/tablespace.c    | 29 +++++++++++++++++++++-------
 src/backend/storage/ipc/procsignal.c | 24 +++--------------------
 src/backend/storage/smgr/md.c        |  6 ++++++
 src/backend/storage/smgr/smgr.c      | 18 +++++++++++++++++
 src/include/storage/md.h             |  1 +
 src/include/storage/procsignal.h     |  7 +------
 src/include/storage/smgr.h           |  1 +
 8 files changed, 68 insertions(+), 37 deletions(-)

diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 2b159b60eb..d533558fa3 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -985,12 +985,15 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 
 	/*
 	 * Force a checkpoint to make sure the checkpointer has received the
-	 * message sent by ForgetDatabaseSyncRequests. On Windows, this also
-	 * ensures that background procs don't hold any open files, which would
-	 * cause rmdir() to fail.
+	 * message sent by ForgetDatabaseSyncRequests.
 	 */
 	RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT);
 
+#if defined(WIN32) || defined(USE_ASSERT_CHECKING)
+	/* Close all fds in other Windows processes. */
+	WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_SMGRRELEASE));
+#endif
+
 	/*
 	 * Remove all tablespace subdirs belonging to the database.
 	 */
@@ -1239,6 +1242,11 @@ movedb(const char *dbname, const char *tblspcname)
 	RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT
 					  | CHECKPOINT_FLUSH_ALL);
 
+#if defined(WIN32) || defined(USE_ASSERT_CHECKING)
+	/* Close all fds in other Windows processes. */
+	WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_SMGRRELEASE));
+#endif
+
 	/*
 	 * Now drop all buffers holding data of the target database; they should
 	 * no longer be dirty so DropDatabaseBuffers is safe.
@@ -2251,6 +2259,11 @@ dbase_redo(XLogReaderState *record)
 		/* Clean out the xlog relcache too */
 		XLogDropDatabase(xlrec->db_id);
 
+#if defined(WIN32) || defined(USE_ASSERT_CHECKING)
+		/* Close all fds in other Windows processes. */
+		WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_SMGRRELEASE));
+#endif
+
 		for (i = 0; i < xlrec->ntablespaces; i++)
 		{
 			dst_path = GetDatabasePath(xlrec->db_id, xlrec->tablespace_ids[i]);
diff --git a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c
index 69ea155d50..2ce8966af0 100644
--- a/src/backend/commands/tablespace.c
+++ b/src/backend/commands/tablespace.c
@@ -520,15 +520,25 @@ DropTableSpace(DropTableSpaceStmt *stmt)
 		 * but we can't tell them apart from important data files that we
 		 * mustn't delete.  So instead, we force a checkpoint which will clean
 		 * out any lingering files, and try again.
-		 *
-		 * XXX On Windows, an unlinked file persists in the directory listing
-		 * until no process retains an open handle for the file.  The DDL
-		 * commands that schedule files for unlink send invalidation messages
-		 * directing other PostgreSQL processes to close the files.  DROP
-		 * TABLESPACE should not give up on the tablespace becoming empty
-		 * until all relevant invalidation processing is complete.
 		 */
 		RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT);
+		/*
+		 * On Windows, an unlinked file persists in the directory listing until
+		 * no process retains an open handle for the file.  The DDL
+		 * commands that schedule files for unlink send invalidation messages
+		 * directing other PostgreSQL processes to close the files, but nothing
+		 * guarantees they'll be processed in time.  So, we'll also use a
+		 * global barrier to ask all backends to close all files, and wait
+		 * until they're finished.
+		 *
+		 * For test coverage, also do this on Unix systems in assert builds.
+		 */
+#if defined(WIN32) || defined(USE_ASSERT_CHECKING)
+		LWLockRelease(TablespaceCreateLock);
+		WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_SMGRRELEASE));
+		LWLockAcquire(TablespaceCreateLock, LW_EXCLUSIVE);
+#endif
+		/* And now try again. */
 		if (!destroy_tablespace_directories(tablespaceoid, false))
 		{
 			/* Still not empty, the files must be important then */
@@ -1550,6 +1560,11 @@ tblspc_redo(XLogReaderState *record)
 		 */
 		if (!destroy_tablespace_directories(xlrec->ts_id, true))
 		{
+#if defined(WIN32) || defined(USE_ASSERT_CHECKING)
+			/* Close fds in other Windows processes. */
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_SMGRRELEASE));
+#endif
+
 			ResolveRecoveryConflictWithTablespace(xlrec->ts_id);
 
 			/*
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index c6a8d4611e..dd27f1bb8f 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -28,6 +28,7 @@
 #include "storage/latch.h"
 #include "storage/proc.h"
 #include "storage/shmem.h"
+#include "storage/smgr.h"
 #include "storage/sinval.h"
 #include "tcop/tcopprot.h"
 
@@ -100,7 +101,6 @@ static ProcSignalSlot *MyProcSignalSlot = NULL;
 static bool CheckProcSignal(ProcSignalReason reason);
 static void CleanupProcSignalState(int status, Datum arg);
 static void ResetProcSignalBarrierBits(uint32 flags);
-static bool ProcessBarrierPlaceholder(void);
 
 /*
  * ProcSignalShmemSize
@@ -526,8 +526,8 @@ ProcessProcSignalBarrier(void)
 				type = (ProcSignalBarrierType) pg_rightmost_one_pos32(flags);
 				switch (type)
 				{
-					case PROCSIGNAL_BARRIER_PLACEHOLDER:
-						processed = ProcessBarrierPlaceholder();
+					case PROCSIGNAL_BARRIER_SMGRRELEASE:
+						processed = ProcessBarrierSmgrRelease();
 						break;
 				}
 
@@ -593,24 +593,6 @@ ResetProcSignalBarrierBits(uint32 flags)
 	InterruptPending = true;
 }
 
-static bool
-ProcessBarrierPlaceholder(void)
-{
-	/*
-	 * XXX. This is just a placeholder until the first real user of this
-	 * machinery gets committed. Rename PROCSIGNAL_BARRIER_PLACEHOLDER to
-	 * PROCSIGNAL_BARRIER_SOMETHING_ELSE where SOMETHING_ELSE is something
-	 * appropriately descriptive. Get rid of this function and instead have
-	 * ProcessBarrierSomethingElse. Most likely, that function should live in
-	 * the file pertaining to that subsystem, rather than here.
-	 *
-	 * The return value should be 'true' if the barrier was successfully
-	 * absorbed and 'false' if not. Note that returning 'false' can lead to
-	 * very frequent retries, so try hard to make that an uncommon case.
-	 */
-	return true;
-}
-
 /*
  * CheckProcSignal - check to see if a particular reason has been
  * signaled, and clear the signal flag.  Should be called after receiving
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 1e12cfad8e..1b751205b1 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -548,6 +548,12 @@ mdclose(SMgrRelation reln, ForkNumber forknum)
 	}
 }
 
+void
+mdrelease(void)
+{
+	closeAllVfds();
+}
+
 /*
  *	mdprefetch() -- Initiate asynchronous read of the specified block of a relation
  */
diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
index 4dc24649df..91cf3cef48 100644
--- a/src/backend/storage/smgr/smgr.c
+++ b/src/backend/storage/smgr/smgr.c
@@ -41,6 +41,7 @@ typedef struct f_smgr
 {
 	void		(*smgr_init) (void);	/* may be NULL */
 	void		(*smgr_shutdown) (void);	/* may be NULL */
+	void		(*smgr_release) (void);	/* may be NULL */
 	void		(*smgr_open) (SMgrRelation reln);
 	void		(*smgr_close) (SMgrRelation reln, ForkNumber forknum);
 	void		(*smgr_create) (SMgrRelation reln, ForkNumber forknum,
@@ -69,6 +70,7 @@ static const f_smgr smgrsw[] = {
 	{
 		.smgr_init = mdinit,
 		.smgr_shutdown = NULL,
+		.smgr_release = mdrelease,
 		.smgr_open = mdopen,
 		.smgr_close = mdclose,
 		.smgr_create = mdcreate,
@@ -693,3 +695,19 @@ AtEOXact_SMgr(void)
 		smgrclose(rel);
 	}
 }
+
+/*
+ * This routine is called when we are ordered to release all open files by a
+ * ProcSignalBarrier.
+ */
+bool
+ProcessBarrierSmgrRelease(void)
+{
+	for (int i = 0; i < NSmgr; i++)
+	{
+		if (smgrsw[i].smgr_release)
+			smgrsw[i].smgr_release();
+	}
+
+	return true;
+}
diff --git a/src/include/storage/md.h b/src/include/storage/md.h
index 752b440864..7685367b97 100644
--- a/src/include/storage/md.h
+++ b/src/include/storage/md.h
@@ -23,6 +23,7 @@
 extern void mdinit(void);
 extern void mdopen(SMgrRelation reln);
 extern void mdclose(SMgrRelation reln, ForkNumber forknum);
+extern void mdrelease(void);
 extern void mdcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo);
 extern bool mdexists(SMgrRelation reln, ForkNumber forknum);
 extern void mdunlink(RelFileNodeBackend rnode, ForkNumber forknum, bool isRedo);
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 4ae7dc33b8..aaaa8ed07c 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -48,12 +48,7 @@ typedef enum
 
 typedef enum
 {
-	/*
-	 * XXX. PROCSIGNAL_BARRIER_PLACEHOLDER should be replaced when the first
-	 * real user of the ProcSignalBarrier mechanism is added. It's just here
-	 * for now because we can't have an empty enum.
-	 */
-	PROCSIGNAL_BARRIER_PLACEHOLDER = 0
+	PROCSIGNAL_BARRIER_SMGRRELEASE	/* ask smgr to close files */
 } ProcSignalBarrierType;
 
 /*
diff --git a/src/include/storage/smgr.h b/src/include/storage/smgr.h
index a6fbf7b6a6..8da5139b03 100644
--- a/src/include/storage/smgr.h
+++ b/src/include/storage/smgr.h
@@ -104,5 +104,6 @@ extern void smgrtruncate(SMgrRelation reln, ForkNumber *forknum,
 						 int nforks, BlockNumber *nblocks);
 extern void smgrimmedsync(SMgrRelation reln, ForkNumber forknum);
 extern void AtEOXact_SMgr(void);
+extern bool ProcessBarrierSmgrRelease(void);
 
 #endif							/* SMGR_H */
-- 
2.30.1

Re: Fix DROP TABLESPACE on Windows with ProcSignalBarrier?

Reply via email to