Hi,

On 2017-03-27 16:03:48 +0800, Craig Ringer wrote:
> On 27 March 2017 at 14:08, Craig Ringer <cr...@2ndquadrant.com> wrote:
> 
> > So this patch makes ReplicationSlotAcquire check that the slot
> > database matches the current database and refuse to acquire the slot
> > if it does not.
> 
> New patch attached that drops above requirement, so slots can still be
> dropped from any DB.
> 
> This introduces a narrow race window where DROP DATABASE may ERROR if
> somebody connects to a different database and runs a
> pg_drop_replication_slot(...) for one of the slots being dropped by
> DROP DATABASE after we check for active slots but before we've dropped
> the slot. But it's hard to hit and it's pretty harmless; the worst
> possible result is dropping one or more of the slots before we ERROR
> out of the DROP. But you clearly didn't want them anyway, since you
> were dropping the DB and dropping some slots at the same time.
> 
> I think this one's ready to go.
> 
> -- 
>  Craig Ringer                   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services

> From 99d5313d3a265bcc57ca6845230b9ec49d188710 Mon Sep 17 00:00:00 2001
> From: Craig Ringer <cr...@2ndquadrant.com>
> Date: Wed, 22 Mar 2017 13:21:09 +0800
> Subject: [PATCH] Make DROP DATABASE drop logical slots for the DB
> 
> Automatically drop all logical replication slots associated with a
> database when the database is dropped.
> ---
>  doc/src/sgml/func.sgml                             |  3 +-
>  doc/src/sgml/protocol.sgml                         |  2 +
>  src/backend/commands/dbcommands.c                  | 32 +++++---
>  src/backend/replication/slot.c                     | 88 ++++++++++++++++++++++
>  src/include/replication/slot.h                     |  1 +
>  src/test/recovery/t/006_logical_decoding.pl        | 40 +++++++++-
>  .../recovery/t/010_logical_decoding_timelines.pl   | 30 +++++++-
>  7 files changed, 182 insertions(+), 14 deletions(-)
> 
> diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
> index ba6f8dd..78508d7 100644
> --- a/doc/src/sgml/func.sgml
> +++ b/doc/src/sgml/func.sgml
> @@ -18876,7 +18876,8 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
>         <entry>
>          Drops the physical or logical replication slot
>          named <parameter>slot_name</parameter>. Same as replication protocol
> -        command <literal>DROP_REPLICATION_SLOT</>.
> +        command <literal>DROP_REPLICATION_SLOT</>. For logical slots, this must
> +        be called when connected to the same database the slot was created on.
>         </entry>
>        </row>
>  
> diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
> index b3a5026..5f97141 100644
> --- a/doc/src/sgml/protocol.sgml
> +++ b/doc/src/sgml/protocol.sgml
> @@ -2034,6 +2034,8 @@ The commands accepted in walsender mode are:
>       <para>
>       Drops a replication slot, freeing any reserved server-side resources. If
>       the slot is currently in use by an active connection, this command fails.
> +      If the slot is a logical slot that was created in a database other than
> +      the database the walsender is connected to, this command fails.
>       </para>
>       <variablelist>
>        <varlistentry>

Shouldn't the docs in the DROP DATABASE section mention this?


> +void
> +ReplicationSlotsDropDBSlots(Oid dboid)
> +{
> +     int                     i;
> +
> +     if (max_replication_slots <= 0)
> +             return;
> +
> +restart:
> +     LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
> +     for (i = 0; i < max_replication_slots; i++)
> +     {
> +             ReplicationSlot *s;
> +             NameData slotname;
> +             int active_pid;
> +
> +             s = &ReplicationSlotCtl->replication_slots[i];
> +
> +             /* cannot change while ReplicationSlotControlLock is held */
> +             if (!s->in_use)
> +                     continue;
> +
> +             /* only logical slots are database specific, skip */
> +             if (!SlotIsLogical(s))
> +                     continue;
> +
> +             /* not our database, skip */
> +             if (s->data.database != dboid)
> +                     continue;
> +
> +             /* Claim the slot, as if ReplicationSlotAcquire()ing. */
> +             SpinLockAcquire(&s->mutex);
> +             strncpy(NameStr(slotname), NameStr(s->data.name), NAMEDATALEN);
> +             NameStr(slotname)[NAMEDATALEN-1] = '\0';
> +             active_pid = s->active_pid;
> +             if (active_pid == 0)
> +             {
> +                     MyReplicationSlot = s;
> +                     s->active_pid = MyProcPid;
> +             }
> +             SpinLockRelease(&s->mutex);
> +
> +             /*
> +              * We might fail here if the slot was active. Even though we hold an
> +              * exclusive lock on the database object a logical slot for that DB can
> +              * still be active if it's being dropped by a backend connected to
> +              * another DB or is otherwise acquired.
> +              *
> +              * It's an unlikely race that'll only arise from concurrent user action,
> +              * so we'll just bail out.
> +              */
> +             if (active_pid)
> +                     elog(ERROR, "replication slot %s is in use by pid %d",
> +                              NameStr(slotname), active_pid);
> +
> +             /*
> +              * To avoid largely duplicating ReplicationSlotDropAcquired() or
> +              * complicating it with already_locked flags for ProcArrayLock,
> +              * ReplicationSlotControlLock and ReplicationSlotAllocationLock, we
> +              * just release our ReplicationSlotControlLock to drop the slot.
> +              *
> +              * For safety we'll restart our scan from the beginning each
> +              * time we release the lock.
> +              */
> +             LWLockRelease(ReplicationSlotControlLock);
> +             ReplicationSlotDropAcquired();
> +             goto restart;
> +     }
> +     LWLockRelease(ReplicationSlotControlLock);
> +
> +     /* recompute limits once after all slots are dropped */
> +     ReplicationSlotsComputeRequiredXmin(false);
> +     ReplicationSlotsComputeRequiredLSN();

I was concerned for a second that we'd skip doing
ReplicationSlotsComputeRequired* if we ERROR out above - but
ReplicationSlotDropAcquired already calls these as necessary. I.e. these
calls should be removed from here.


- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
