On 13/12/16 01:33, Andres Freund wrote: > > On 2016-12-12 09:18:48 -0500, Peter Eisentraut wrote: >> On 12/8/16 4:10 PM, Petr Jelinek wrote: >>> On 08/12/16 20:16, Peter Eisentraut wrote: >>>> On 12/6/16 11:58 AM, Peter Eisentraut wrote: >>>>> On 12/5/16 6:24 PM, Petr Jelinek wrote: >>>>>> I think that the removal of changes to ReplicationSlotAcquire() that you >>>>>> did will result in making it impossible to reacquire temporary slot once >>>>>> you switched to different one in the session as the if (active_pid != 0) >>>>>> will always be true for temp slot. >>>>> >>>>> I see. I suppose it's difficult to get a test case for this. >>>> >>>> I created a test case, saw the error of my ways, and added your code >>>> back in. Patch attached. >>>> >>> >>> Hi, >>> >>> I am happy with this version, thanks for moving it forward. >> >> committed > > Hm. > > /* > + * Cleanup all temporary slots created in current session. > + */ > +void > +ReplicationSlotCleanup() > > I'd rather see a (void) there. The prototype has it, but still. > > > + > + /* > + * No need for locking as we are only interested in slots active in > + * current process and those are not touched by other processes. > > I'm a bit suspicious of this claim. Without a memory barrier you could > actually look at outdated versions of active_pid. In practice there's > enough full memory barriers in the slot creation code that it's > guaranteed to not be the same pid from before a wraparound though. > > I think that doing iterations of slots without > ReplicationSlotControlLock makes things more fragile, because suddenly > assumptions that previously held aren't true anymore. E.g. factually > /* > * The slot is definitely gone. Lock out concurrent scans of the array > * long enough to kill it. It's OK to clear the active flag here > without > * grabbing the mutex because nobody else can be scanning the array > here, > * and nobody can be attached to this slot and thus access it without > * scanning the array. > */ > is now simply not true anymore. It's probably not harmfully broken, but > at least you've changed the locking protocol without adapting comments. > >
Any thoughts on attached? Yes it does repeated scans which can in theory be slow but as I explained in the comment, in practice there is not much need to have many temporary slots active within single session so it should not be big issue. I am not quite convinced that all the locking is necessary from the current logic perspective TBH but it should help prevent mistakes by whoever changes things in slot.c next. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
From 3ddeac5eb01da1642a6c7eb9a290a3ded08045dd Mon Sep 17 00:00:00 2001 From: Petr Jelinek <pjmo...@pjmodos.net> Date: Thu, 15 Dec 2016 09:43:17 +0100 Subject: [PATCH 2/2] Improve behavior of ReplicationSlotCleanup() Make sure we have slot locked properly for modification everywhere and also cleanup MyReplicationSlot so to reduce code duplication. --- src/backend/replication/slot.c | 58 ++++++++++++++++++++++++++++--------- src/backend/replication/walsender.c | 3 -- src/backend/storage/lmgr/proc.c | 6 +--- src/backend/tcop/postgres.c | 4 --- 4 files changed, 45 insertions(+), 26 deletions(-) diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c index d8ed005..ed50e49 100644 --- a/src/backend/replication/slot.c +++ b/src/backend/replication/slot.c @@ -412,30 +412,60 @@ ReplicationSlotRelease(void) } /* - * Cleanup all temporary slots created in current session. + * Search the replication slot list for temporary slot owned by current + * session and return it. Returns NULL if not found. */ -void -ReplicationSlotCleanup() +static ReplicationSlot * +FindMyNextTempSlot(void) { - int i; - - Assert(MyReplicationSlot == NULL); + int i; + ReplicationSlot *slot; - /* - * No need for locking as we are only interested in slots active in - * current process and those are not touched by other processes. - */ + LWLockAcquire(ReplicationSlotControlLock, LW_SHARED); for (i = 0; i < max_replication_slots; i++) { - ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i]; + slot = &ReplicationSlotCtl->replication_slots[i]; - if (s->active_pid == MyProcPid) + SpinLockAcquire(&slot->mutex); + if (slot->active_pid == MyProcPid) { - Assert(s->in_use && s->data.persistency == RS_TEMPORARY); + Assert(slot->in_use && slot->data.persistency == RS_TEMPORARY); - ReplicationSlotDropPtr(s); + SpinLockRelease(&slot->mutex); + LWLockRelease(ReplicationSlotControlLock); + return slot; } + else + SpinLockRelease(&slot->mutex); } + LWLockRelease(ReplicationSlotControlLock); + + return NULL; +} + +/* + * Cleanup all any slot state we might have. This includes releasing any + * active replication slot in current session and dropping all temporary + * slots created in current session. + */ +void +ReplicationSlotCleanup(void) +{ + ReplicationSlot *slot; + + if (MyReplicationSlot != NULL) + ReplicationSlotRelease(); + + /* + * Find all temp slots and drop them. This does repeated scans of the + * array so it's theoretically quadratic algorithm, but in practice + * single session does not have reason to create many temporary slots + * so the negative performance effects should be minimal. + * If this turns out to be problematic in terms of performance we'll + * need to rethink locking guarantees around the replication slots. + */ + while ((slot = FindMyNextTempSlot()) != NULL) + ReplicationSlotDropPtr(slot); } /* diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c index b14d821..c90ca92 100644 --- a/src/backend/replication/walsender.c +++ b/src/backend/replication/walsender.c @@ -263,9 +263,6 @@ WalSndErrorCleanup(void) sendFile = -1; } - if (MyReplicationSlot != NULL) - ReplicationSlotRelease(); - ReplicationSlotCleanup(); replication_active = false; diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c index e9555f2..d84a4b4 100644 --- a/src/backend/storage/lmgr/proc.c +++ b/src/backend/storage/lmgr/proc.c @@ -806,11 +806,7 @@ ProcKill(int code, Datum arg) /* Cancel any pending condition variable sleep, too */ ConditionVariableCancelSleep(); - /* Make sure active replication slots are released */ - if (MyReplicationSlot != NULL) - ReplicationSlotRelease(); - - /* Also cleanup all the temporary slots. */ + /* Release replication slots held by current session. */ ReplicationSlotCleanup(); /* diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c index b179231..7213ffd 100644 --- a/src/backend/tcop/postgres.c +++ b/src/backend/tcop/postgres.c @@ -3875,10 +3875,6 @@ PostgresMain(int argc, char *argv[], * so releasing here is fine. There's another cleanup in ProcKill() * ensuring we'll correctly cleanup on FATAL errors as well. */ - if (MyReplicationSlot != NULL) - ReplicationSlotRelease(); - - /* We also want to cleanup temporary slots on error. */ ReplicationSlotCleanup(); /* -- 2.7.4
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers