Hi,

On 2017-03-27 16:03:48 +0800, Craig Ringer wrote:
> On 27 March 2017 at 14:08, Craig Ringer <cr...@2ndquadrant.com> wrote:
>
> > So this patch makes ReplicationSlotAcquire check that the slot
> > database matches the current database and refuse to acquire the slot
> > if it does not.
>
> New patch attached that drops above requirement, so slots can still be
> dropped from any DB.
>
> This introduces a narrow race window where DROP DATABASE may ERROR if
> somebody connects to a different database and runs a
> pg_drop_replication_slot(...) for one of the slots being dropped by
> DROP DATABASE after we check for active slots but before we've dropped
> the slot. But it's hard to hit and it's pretty harmless; the worst
> possible result is dropping one or more of the slots before we ERROR
> out of the DROP. But you clearly didn't want them anyway, since you
> were dropping the DB and dropping some slots at the same time.
>
> I think this one's ready to go.
>
> --
>  Craig Ringer                   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
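Stripped down to its control flow, the claim-or-bail loop at the heart of the patch can be modeled as a standalone sketch. Everything here (MockSlot, drop_db_slots(), the field names) is a simplified stand-in, not PostgreSQL's actual types; there is no spinlock or shared memory, only the logic: claim a slot by setting active_pid when it is zero, bail out with an error when another pid already holds it, and restart the scan from the top after every drop. It also illustrates the "harmless worst case" above: an earlier slot can already be gone by the time the scan hits the busy one.

```c
#include <stdio.h>

/*
 * Hypothetical stand-in for a replication slot; NOT PostgreSQL's
 * ReplicationSlot struct. Just enough fields to model the scan.
 */
#define MAX_SLOTS 4

typedef struct MockSlot
{
	int			in_use;
	int			is_logical;
	unsigned	database;		/* owning database oid */
	int			active_pid;		/* 0 = unclaimed */
	char		name[64];
} MockSlot;

static MockSlot slots[MAX_SLOTS];
static int	my_pid = 1234;

/*
 * Model of the patch's loop: returns the number of slots dropped, or
 * -1 when a slot turned out to be busy (the elog(ERROR, ...) path).
 */
static int
drop_db_slots(unsigned dboid)
{
	int			ndropped = 0;

restart:
	for (int i = 0; i < MAX_SLOTS; i++)
	{
		MockSlot   *s = &slots[i];

		/* skip free slots, physical slots, and other databases */
		if (!s->in_use || !s->is_logical || s->database != dboid)
			continue;

		if (s->active_pid != 0)
		{
			/* stands in for the elog(ERROR) on a concurrently held slot */
			fprintf(stderr, "slot %s is in use by pid %d\n",
					s->name, s->active_pid);
			return -1;
		}

		s->active_pid = my_pid;	/* claim it, as if acquiring */
		s->in_use = 0;			/* stands in for the actual drop */
		ndropped++;
		goto restart;			/* rescan from scratch after each drop */
	}
	return ndropped;
}
```

A busy slot makes the function bail out partway, leaving already-dropped slots gone, which matches the "you were dropping the DB anyway" argument for why that partial outcome is acceptable.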
> From 99d5313d3a265bcc57ca6845230b9ec49d188710 Mon Sep 17 00:00:00 2001
> From: Craig Ringer <cr...@2ndquadrant.com>
> Date: Wed, 22 Mar 2017 13:21:09 +0800
> Subject: [PATCH] Make DROP DATABASE drop logical slots for the DB
>
> Automatically drop all logical replication slots associated with a
> database when the database is dropped.
> ---
>  doc/src/sgml/func.sgml                             |  3 +-
>  doc/src/sgml/protocol.sgml                         |  2 +
>  src/backend/commands/dbcommands.c                  | 32 +++++---
>  src/backend/replication/slot.c                     | 88 ++++++++++++++++++++++
>  src/include/replication/slot.h                     |  1 +
>  src/test/recovery/t/006_logical_decoding.pl        | 40 +++++++++-
>  .../recovery/t/010_logical_decoding_timelines.pl   | 30 +++++++-
>  7 files changed, 182 insertions(+), 14 deletions(-)
>
> diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
> index ba6f8dd..78508d7 100644
> --- a/doc/src/sgml/func.sgml
> +++ b/doc/src/sgml/func.sgml
> @@ -18876,7 +18876,8 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
>         <entry>
>          Drops the physical or logical replication slot
>          named <parameter>slot_name</parameter>. Same as replication protocol
> -        command <literal>DROP_REPLICATION_SLOT</>.
> +        command <literal>DROP_REPLICATION_SLOT</>. For logical slots, this must
> +        be called when connected to the same database the slot was created on.
>         </entry>
>        </row>
>
> diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
> index b3a5026..5f97141 100644
> --- a/doc/src/sgml/protocol.sgml
> +++ b/doc/src/sgml/protocol.sgml
> @@ -2034,6 +2034,8 @@ The commands accepted in walsender mode are:
>       <para>
>        Drops a replication slot, freeing any reserved server-side resources. If
>        the slot is currently in use by an active connection, this command fails.
> +      If the slot is a logical slot that was created in a database other than
> +      the database the walsender is connected to, this command fails.
>       </para>
>       <variablelist>
>        <varlistentry>

Shouldn't the docs in the drop database section mention this?


> +void
> +ReplicationSlotsDropDBSlots(Oid dboid)
> +{
> +	int			i;
> +
> +	if (max_replication_slots <= 0)
> +		return;
> +
> +restart:
> +	LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
> +	for (i = 0; i < max_replication_slots; i++)
> +	{
> +		ReplicationSlot *s;
> +		NameData	slotname;
> +		int			active_pid;
> +
> +		s = &ReplicationSlotCtl->replication_slots[i];
> +
> +		/* cannot change while ReplicationSlotCtlLock is held */
> +		if (!s->in_use)
> +			continue;
> +
> +		/* only logical slots are database specific, skip */
> +		if (!SlotIsLogical(s))
> +			continue;
> +
> +		/* not our database, skip */
> +		if (s->data.database != dboid)
> +			continue;
> +
> +		/* Claim the slot, as if ReplicationSlotAcquire()ing. */
> +		SpinLockAcquire(&s->mutex);
> +		strncpy(NameStr(slotname), NameStr(s->data.name), NAMEDATALEN);
> +		NameStr(slotname)[NAMEDATALEN-1] = '\0';
> +		active_pid = s->active_pid;
> +		if (active_pid == 0)
> +		{
> +			MyReplicationSlot = s;
> +			s->active_pid = MyProcPid;
> +		}
> +		SpinLockRelease(&s->mutex);
> +
> +		/*
> +		 * We might fail here if the slot was active. Even though we hold an
> +		 * exclusive lock on the database object a logical slot for that DB can
> +		 * still be active if it's being dropped by a backend connected to
> +		 * another DB or is otherwise acquired.
> +		 *
> +		 * It's an unlikely race that'll only arise from concurrent user action,
> +		 * so we'll just bail out.
> +		 */
> +		if (active_pid)
> +			elog(ERROR, "replication slot %s is in use by pid %d",
> +				 NameStr(slotname), active_pid);
> +
> +		/*
> +		 * To avoid largely duplicating ReplicationSlotDropAcquired() or
> +		 * complicating it with already_locked flags for ProcArrayLock,
> +		 * ReplicationSlotControlLock and ReplicationSlotAllocationLock, we
> +		 * just release our ReplicationSlotControlLock to drop the slot.
> +		 *
> +		 * For safety we'll restart our scan from the beginning each
> +		 * time we release the lock.
> +		 */
> +		LWLockRelease(ReplicationSlotControlLock);
> +		ReplicationSlotDropAcquired();
> +		goto restart;
> +	}
> +	LWLockRelease(ReplicationSlotControlLock);
> +
> +	/* recompute limits once after all slots are dropped */
> +	ReplicationSlotsComputeRequiredXmin(false);
> +	ReplicationSlotsComputeRequiredLSN();

I was concerned for a second that we'd skip doing
ReplicationSlotsComputeRequired* if we ERROR out above - but
ReplicationSlotDropAcquired already calls these as necessary. I.e. the
explicit recomputation calls are redundant and should be dropped from
here.

- Andres


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers