On 2021-Apr-07, Andres Freund wrote:

> I'm also confused by the use of ConditionVariableTimedSleep(timeout =
> 10). Why do we need a timed sleep here in the first place? And why with
> such a short sleep?

I was scared of the possibility that a process would not set the CV for
whatever reason, causing checkpointing to become stuck.  Maybe that's
misguided thinking if CVs are reliable enough.

> I also noticed that the code is careful to use CHECK_FOR_INTERRUPTS(); -
> but is aware it's running in checkpointer. I don't think CFI does much
> there? If we are worried about needing to check for interrupts, more
> work is needed.

Hmm .. yeah, doing CFI seems pretty useless.  I think that should just
be removed.  If checkpointer gets USR2 (request for shutdown) it's not
going to affect the behavior of CFI anyway.

I attach a couple of changes to your 0001.  It's all cosmetic; what
looks not so cosmetic is the change of "continue" to "break" in helper
routine; if !s->in_use, we'd loop infinitely.  The other routine
already checks that before calling the helper; since you hold
ReplicationSlotControlLock at that point, it should not be possible to
drop it in between.  Anyway, it's a trivial change to make, so it should
be correct.

I also added a "continue" at the bottom of one block; currently that
doesn't change any behavior, but if we add code at the other block, it
might not be what's intended.

> After this I don't see a reason to have SAB_Inquire - as far as I can
> tell it's practically impossible to use without race conditions? Except
> for raising an error - which is "builtin"...

Hmm, interesting ... If not needed, yeah let's get rid of that.


Are you getting this set pushed, or would you like me to handle it?
(There seems to be some minor conflict in 13)

-- 
Álvaro Herrera       Valdivia, Chile
"Pido que me den el Nobel por razones humanitarias" (Nicanor Parra)
>From 7f31b0ec12e52b6c967047384353895538161840 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvhe...@alvh.no-ip.org>
Date: Thu, 29 Apr 2021 13:19:54 -0400
Subject: [PATCH] Alvaro's edits

---
 src/backend/replication/slot.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index d28330cbd8..cd6f75b3e9 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1172,19 +1172,17 @@ InvalidateObsoleteReplicationSlot(ReplicationSlot *s, XLogRecPtr oldestLSN)
 
 	while (true)
 	{
-		XLogRecPtr	restart_lsn = InvalidXLogRecPtr;
+		XLogRecPtr	restart_lsn;
 		bool		slot_conflicts;
 		NameData	slotname;
 		int			active_pid = 0;
 
 		Assert(LWLockHeldByMeInMode(ReplicationSlotControlLock, LW_SHARED));
 
-		CHECK_FOR_INTERRUPTS();
-
 		slot_conflicts = false;
 
 		if (!s->in_use)
-			continue;
+			break;
 
 		/*
 		 * Check if the slot needs to be invalidated. If it needs to be
@@ -1205,12 +1203,16 @@ InvalidateObsoleteReplicationSlot(ReplicationSlot *s, XLogRecPtr oldestLSN)
 			slotname = s->data.name;
 			active_pid = s->active_pid;
 
-			/* check if we can acquire it */
+			/*
+			 * If the slot can be acquired, do so and mark it invalidated
+			 * immediately.  Otherwise we'll signal the owning process, below,
+			 * and retry.
+			 */
 			if (active_pid == 0)
 			{
 				MyReplicationSlot = s;
 				s->active_pid = MyProcPid;
-				s->data.invalidated_at = s->data.restart_lsn;
+				s->data.invalidated_at = restart_lsn;
 				s->data.restart_lsn = InvalidXLogRecPtr;
 			}
 		}
@@ -1262,6 +1264,7 @@ InvalidateObsoleteReplicationSlot(ReplicationSlot *s, XLogRecPtr oldestLSN)
 
 			/* re-acquire for next loop iteration */
 			LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
+			continue;
 		}
 		else
 		{
@@ -1286,7 +1289,6 @@ InvalidateObsoleteReplicationSlot(ReplicationSlot *s, XLogRecPtr oldestLSN)
 
 			break;
 		}
-
 	}
 
 	Assert(!released_lock == LWLockHeldByMeInMode(ReplicationSlotControlLock, LW_SHARED));
@@ -1313,8 +1315,6 @@ restart:
 	{
 		ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];
 
-		CHECK_FOR_INTERRUPTS();
-
 		if (!s->in_use)
 			continue;
 
-- 
2.20.1

Reply via email to