On Sat, Jun 12, 2021 at 9:57 PM Amit Kapila <amit.kapil...@gmail.com> wrote: > > On Sat, Jun 12, 2021 at 1:13 PM Michael Paquier <mich...@paquier.xyz> wrote: > > > > wrasse has just failed with what looks like a timing error with a > > replication slot drop: > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2021-06-12%2006%3A16%3A30 > > > > Here is the error: > > error running SQL: 'psql:<stdin>:1: ERROR: could not drop replication > > slot "tap_sub" on publisher: ERROR: replication slot "tap_sub" is > > active for PID 1641' > > > > It seems to me that this just lacks a poll_query_until() doing some > > slot monitoring? > > > > I think it is showing a race condition issue in the code. In > DropSubscription, we first stop the worker that is receiving the WAL, > and then in a separate connection with the publisher, it tries to drop > the slot which leads to this error. The reason is that walsender is > still active as we just wait for wal receiver (or apply worker) to > stop. Normally, as soon as the apply worker is stopped the walsender > detects it and exits but in this case, it took some time to exit, and > in the meantime, we tried to drop the slot which is still in use by > walsender.
There might be possible. That's weird since DROP SUBSCRIPTION executes DROP_REPLICATION_SLOT command with WAIT option. I found a bug that is possibly an oversight of commit 1632ea4368. The commit changed the code around the error as follows: if (active_pid != MyProcPid) { - if (behavior == SAB_Error) + if (!nowait) ereport(ERROR, (errcode(ERRCODE_OBJECT_IN_USE), errmsg("replication slot \"%s\" is active for PID %d", NameStr(s->data.name), active_pid))); - else if (behavior == SAB_Inquire) - return active_pid; /* Wait here until we get signaled, and then restart */ ConditionVariableSleep(&s->active_cv, The condition should be the opposite; we should raise the error when 'nowait' is true. I think this is the cause of the test failure. Even if DROP SUBSCRIPTION tries to drop the slot with the WAIT option, we don't wait but raise the error. Attached a small patch fixes it. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
fix_slot_drop.patch
Description: Binary data