On Wed, Feb 16, 2022 at 01:00:53PM -0800, Nathan Bossart wrote:
> On Wed, Feb 16, 2022 at 11:27:31AM -0800, Andres Freund wrote:
>> That doesn't strike me as great architecturally. E.g. in theory the same
>> problem could exist in single user mode. I think it doesn't today, because
>> RegisterSyncRequest() will effectively "absorb" it immediately, but that kind
>> of feels like an implementation detail?
> 
> Yeah, maybe that is a reason to add an absorb somewhere within
> CreateCheckPoint() instead, like v1 [0] does.  Then the extra absorb would
> be sufficient for single-user mode if the requests were not absorbed
> immediately.

Here is a new patch.  This is essentially the same as v1, but I've spruced
up the comments and the commit message.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From adf7f7c9717897cc72a3c76d90ad0db082a7a432 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathandboss...@gmail.com>
Date: Tue, 15 Feb 2022 22:53:04 -0800
Subject: [PATCH v3 1/1] Fix race condition between DROP TABLESPACE and
 checkpointing.

Commands like ALTER TABLE SET TABLESPACE may leave files for the next
checkpoint to clean up.  If such files are not removed by the time DROP
TABLESPACE is called, we request a checkpoint so that they are deleted.
However, there is presently a window before checkpoint start where new unlink
requests won't be scheduled until the following checkpoint.  This means that
the checkpoint forced by DROP TABLESPACE might not remove the files we expect
it to remove, and the following ERROR will be emitted:

	ERROR:  tablespace "mytblspc" is not empty

To fix, add a call to AbsorbSyncRequests() just before advancing the unlink
cycle counter.  This ensures that any unlink requests forwarded prior to
checkpoint start (i.e., when ckpt_started is incremented) will be processed by
the current checkpoint.  Since AbsorbSyncRequests() performs memory
allocations, it cannot be called within a critical section, so we also need to
move SyncPreCheckpoint() to before CreateCheckPoint()'s critical section.

This is an old bug, so back-patch to all supported versions.

Author: Nathan Bossart
Reported-by: Nathan Bossart
Diagnosed-by: Thomas Munro, Nathan Bossart
Reviewed-by: Andres Freund
Discussion: https://postgr.es/m/20220215235845.GA2665318%40nathanxps13
---
 src/backend/access/transam/xlog.c | 15 ++++++++-------
 src/backend/storage/sync/sync.c   | 14 +++++++++++++-
 2 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index ce78ac413e..174aab5ae4 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6294,6 +6294,14 @@ CreateCheckPoint(int flags)
 	MemSet(&CheckpointStats, 0, sizeof(CheckpointStats));
 	CheckpointStats.ckpt_start_t = GetCurrentTimestamp();
 
+	/*
+	 * Let smgr prepare for checkpoint; this has to happen outside the critical
+	 * section and before we determine the REDO pointer.  Note that smgr must
+	 * not do anything that'd have to be undone if we decide no checkpoint is
+	 * needed.
+	 */
+	SyncPreCheckpoint();
+
 	/*
 	 * Use a critical section to force system panic if we have trouble.
 	 */
@@ -6307,13 +6315,6 @@ CreateCheckPoint(int flags)
 		LWLockRelease(ControlFileLock);
 	}
 
-	/*
-	 * Let smgr prepare for checkpoint; this has to happen before we determine
-	 * the REDO pointer.  Note that smgr must not do anything that'd have to
-	 * be undone if we decide no checkpoint is needed.
-	 */
-	SyncPreCheckpoint();
-
 	/* Begin filling in the checkpoint WAL record */
 	MemSet(&checkPoint, 0, sizeof(checkPoint));
 	checkPoint.time = (pg_time_t) time(NULL);
diff --git a/src/backend/storage/sync/sync.c b/src/backend/storage/sync/sync.c
index e161d57761..f16b25faa1 100644
--- a/src/backend/storage/sync/sync.c
+++ b/src/backend/storage/sync/sync.c
@@ -173,7 +173,9 @@ InitSync(void)
  * counter is incremented here.
  *
  * This must be called *before* the checkpoint REDO point is determined.
- * That ensures that we won't delete files too soon.
+ * That ensures that we won't delete files too soon.  Since this calls
+ * AbsorbSyncRequests(), which performs memory allocations, it cannot be
+ * called within a critical section.
  *
  * Note that we can't do anything here that depends on the assumption
  * that the checkpoint will be completed.
@@ -181,6 +183,16 @@ InitSync(void)
 void
 SyncPreCheckpoint(void)
 {
+	/*
+	 * Operations such as DROP TABLESPACE assume that the next checkpoint will
+	 * process all recently forwarded unlink requests, but if they aren't
+	 * absorbed prior to advancing the cycle counter, they won't be processed
+	 * until a future checkpoint.  The following absorb ensures that any unlink
+	 * requests forwarded before the checkpoint began will be processed in the
+	 * current checkpoint.
+	 */
+	AbsorbSyncRequests();
+
 	/*
 	 * Any unlink requests arriving after this point will be assigned the next
 	 * cycle counter, and won't be unlinked until next checkpoint.
-- 
2.25.1

Reply via email to