On 13/12/16 01:33, Andres Freund wrote:
> 
> On 2016-12-12 09:18:48 -0500, Peter Eisentraut wrote:
>> On 12/8/16 4:10 PM, Petr Jelinek wrote:
>>> On 08/12/16 20:16, Peter Eisentraut wrote:
>>>> On 12/6/16 11:58 AM, Peter Eisentraut wrote:
>>>>> On 12/5/16 6:24 PM, Petr Jelinek wrote:
>>>>>> I think that the removal of changes to ReplicationSlotAcquire() that you
>>>>>> did will result in making it impossible to reacquire temporary slot once
>>>>>> you switched to different one in the session as the if (active_pid != 0)
>>>>>> will always be true for temp slot.
>>>>>
>>>>> I see.  I suppose it's difficult to get a test case for this.
>>>>
>>>> I created a test case, saw the error of my ways, and added your code
>>>> back in.  Patch attached.
>>>>
>>>
>>> Hi,
>>>
>>> I am happy with this version, thanks for moving it forward.
>>
>> committed
> 
> Hm.
> 
>  /*
> + * Cleanup all temporary slots created in current session.
> + */
> +void
> +ReplicationSlotCleanup()
> 
> I'd rather see a (void) there. The prototype has it, but still.
> 
> 
> +
> +     /*
> +      * No need for locking as we are only interested in slots active in
> +      * current process and those are not touched by other processes.
> 
> I'm a bit suspicious of this claim.  Without a memory barrier you could
> actually look at outdated versions of active_pid. In practice there's
> enough full memory barriers in the slot creation code that it's
> guaranteed to not be the same pid from before a wraparound though.
> 
> I think that doing iterations of slots without
> ReplicationSlotControlLock makes things more fragile, because suddenly
> assumptions that previously held aren't true anymore.   E.g. factually
>       /*
>        * The slot is definitely gone.  Lock out concurrent scans of the array
>        * long enough to kill it.  It's OK to clear the active flag here without
>        * grabbing the mutex because nobody else can be scanning the array here,
>        * and nobody can be attached to this slot and thus access it without
>        * scanning the array.
>        */
> is now simply not true anymore.  It's probably not harmfully broken, but
> at least you've changed the locking protocol without adapting comments.
> 
> 

Any thoughts on the attached? Yes, it does repeated scans, which can in
theory be slow, but as I explained in the comment, in practice there is not
much need to have many temporary slots active within a single session, so
it should not be a big issue.
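
For anyone who does not want to read the whole diff, the cleanup path in
the attached patch boils down to roughly this (a condensed sketch, with
the comments and asserts trimmed):

    void
    ReplicationSlotCleanup(void)
    {
        ReplicationSlot *slot;

        /* Release the slot this session currently has acquired, if any. */
        if (MyReplicationSlot != NULL)
            ReplicationSlotRelease();

        /*
         * Drop-and-rescan: each iteration scans the array from the start
         * and drops the first temporary slot owned by this session.
         */
        while ((slot = FindMyNextTempSlot()) != NULL)
            ReplicationSlotDropPtr(slot);
    }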

I am not quite convinced that all the locking is strictly necessary from
the perspective of the current logic, TBH, but it should help prevent
mistakes by whoever changes things in slot.c next.
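
Concretely, the defensive part is that the scan now takes
ReplicationSlotControlLock in shared mode and only looks at active_pid
while holding the slot's spinlock, along these lines (a simplified sketch
of what FindMyNextTempSlot() in the patch does):

    LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
    for (i = 0; i < max_replication_slots; i++)
    {
        ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];
        bool        mine;

        /* active_pid is only examined under the slot's spinlock. */
        SpinLockAcquire(&s->mutex);
        mine = (s->active_pid == MyProcPid);
        SpinLockRelease(&s->mutex);

        if (mine)
        {
            /* found one of this session's temporary slots ... */
        }
    }
    LWLockRelease(ReplicationSlotControlLock);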

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
From 3ddeac5eb01da1642a6c7eb9a290a3ded08045dd Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmo...@pjmodos.net>
Date: Thu, 15 Dec 2016 09:43:17 +0100
Subject: [PATCH 2/2] Improve behavior of ReplicationSlotCleanup()

Make sure we have the slot locked properly for modification everywhere, and
also release MyReplicationSlot in ReplicationSlotCleanup() to reduce code
duplication at the call sites.
---
 src/backend/replication/slot.c      | 58 ++++++++++++++++++++++++++++---------
 src/backend/replication/walsender.c |  3 --
 src/backend/storage/lmgr/proc.c     |  6 +---
 src/backend/tcop/postgres.c         |  4 ---
 4 files changed, 45 insertions(+), 26 deletions(-)

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index d8ed005..ed50e49 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -412,30 +412,60 @@ ReplicationSlotRelease(void)
 }
 
 /*
- * Cleanup all temporary slots created in current session.
+ * Search the replication slot list for a temporary slot owned by the
+ * current session and return it.  Returns NULL if none is found.
  */
-void
-ReplicationSlotCleanup()
+static ReplicationSlot *
+FindMyNextTempSlot(void)
 {
-	int			i;
-
-	Assert(MyReplicationSlot == NULL);
+	int					i;
+	ReplicationSlot	   *slot;
 
-	/*
-	 * No need for locking as we are only interested in slots active in
-	 * current process and those are not touched by other processes.
-	 */
+	LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
 	for (i = 0; i < max_replication_slots; i++)
 	{
-		ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];
+		slot = &ReplicationSlotCtl->replication_slots[i];
 
-		if (s->active_pid == MyProcPid)
+		SpinLockAcquire(&slot->mutex);
+		if (slot->active_pid == MyProcPid)
 		{
-			Assert(s->in_use && s->data.persistency == RS_TEMPORARY);
+			Assert(slot->in_use && slot->data.persistency == RS_TEMPORARY);
 
-			ReplicationSlotDropPtr(s);
+			SpinLockRelease(&slot->mutex);
+			LWLockRelease(ReplicationSlotControlLock);
+			return slot;
 		}
+		else
+			SpinLockRelease(&slot->mutex);
 	}
+	LWLockRelease(ReplicationSlotControlLock);
+
+	return NULL;
+}
+
+/*
+ * Clean up any slot state we might have.  This includes releasing any
+ * active replication slot in the current session and dropping all
+ * temporary slots created in the current session.
+ */
+void
+ReplicationSlotCleanup(void)
+{
+	ReplicationSlot	   *slot;
+
+	if (MyReplicationSlot != NULL)
+		ReplicationSlotRelease();
+
+	/*
+	 * Find all temp slots and drop them.  This does repeated scans of the
+	 * array, so it is a theoretically quadratic algorithm, but in practice
+	 * a single session has no reason to create many temporary slots, so
+	 * the negative performance effects should be minimal.  If this turns
+	 * out to be problematic in terms of performance, we'll need to rethink
+	 * the locking guarantees around the replication slots.
+	 */
+	while ((slot = FindMyNextTempSlot()) != NULL)
+		ReplicationSlotDropPtr(slot);
 }
 
 /*
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index b14d821..c90ca92 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -263,9 +263,6 @@ WalSndErrorCleanup(void)
 		sendFile = -1;
 	}
 
-	if (MyReplicationSlot != NULL)
-		ReplicationSlotRelease();
-
 	ReplicationSlotCleanup();
 
 	replication_active = false;
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index e9555f2..d84a4b4 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -806,11 +806,7 @@ ProcKill(int code, Datum arg)
 	/* Cancel any pending condition variable sleep, too */
 	ConditionVariableCancelSleep();
 
-	/* Make sure active replication slots are released */
-	if (MyReplicationSlot != NULL)
-		ReplicationSlotRelease();
-
-	/* Also cleanup all the temporary slots. */
+	/* Release replication slots held by current session. */
 	ReplicationSlotCleanup();
 
 	/*
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index b179231..7213ffd 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3875,10 +3875,6 @@ PostgresMain(int argc, char *argv[],
 		 * so releasing here is fine. There's another cleanup in ProcKill()
 		 * ensuring we'll correctly cleanup on FATAL errors as well.
 		 */
-		if (MyReplicationSlot != NULL)
-			ReplicationSlotRelease();
-
-		/* We also want to cleanup temporary slots on error. */
 		ReplicationSlotCleanup();
 
 		/*
-- 
2.7.4

