Hello hackers, On Wed, Jul 14, 2021 at 11:30 AM Sergey Fedchenko <seregay...@bk.ru> wrote: > > Hi all! I still can reproduce it on 14beta1 version. I adapted a patch I > found in this thread > https://github.com/seregayoga/postgres/commit/338bc33f2cf77edde7c45bfdfb9f39a92ec57eb8 > . It solved this bug for me (tested with simple Go program using > https://github.com/lib/pq). It would be nice to have this bug fixed. I’m not > so familiar with postgres code base, but would glad to help with testing.
In our company in one of our projects we ran into this bug too. I attached the patch which fixes it in a different way. It calls SignalBackends() in AtCommit_Notify(). It is possible to call SignalBackends() outside of ProcessCompletedNotifies() after the commit 51004c7172b5c35afac050f4d5849839a06e8d3b, which removes necessity of checking of SignalBackends()'s result. Moving SignalBackends() to AtCommit_Notify() makes it possible to signal backends by "non-normal" backends including logical replication workers. It seems it is safe to call SignalBackends() in AtCommit_Notify() since SignalBackends() doesn't raise any error except OOM. Regarding Andres concern: > what I am wondering is what happens if there's a background worker (like > the replication worker, but it could be other things too) that queues > notifications, but no normal backends are actually listening. As far as > I can tell, in that case we'd continue to queue stuff into the slru, but > wouldn't actually clean things up until a normal session gets around to > it? Which might be a while, on e.g. a logical replica. A backend which raises notifications calls asyncQueueAdvanceTail() sometimes, which truncates pages up to new tail, which is equal to head if there are no listening backends. But there will be a problem if there is a backend which is listening but it doesn't process incoming notifications and doesn't update its queue position. In that case asyncQueueAdvanceTail() is able to advance tail only up to that backend's queue position. -- Artur Zakirov PostgreSQL Developer at Adjust
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c index 4b16fb5682..9093af42bd 100644 --- a/src/backend/commands/async.c +++ b/src/backend/commands/async.c @@ -69,16 +69,18 @@ * * After commit we are called another time (AtCommit_Notify()). Here we * make the actual updates to the effective listen state (listenChannels). + * Also we check if we need to signal listening backends. In + * SignalBackends() we scan the list of listening backends and send a + * PROCSIG_NOTIFY_INTERRUPT signal to every listening backend (we don't know + * which backend is listening on which channel so we must signal them all). + * We can exclude backends that are already up to date, though, and we can + * also exclude backends that are in other databases (unless they are way + * behind and should be kicked to make them advance their pointers). + * We don't bother with a self-signal either. * * Finally, after we are out of the transaction altogether, we check if - * we need to signal listening backends. In SignalBackends() we scan the - * list of listening backends and send a PROCSIG_NOTIFY_INTERRUPT signal - * to every listening backend (we don't know which backend is listening on - * which channel so we must signal them all). We can exclude backends that - * are already up to date, though, and we can also exclude backends that - * are in other databases (unless they are way behind and should be kicked - * to make them advance their pointers). We don't bother with a - * self-signal either, but just process the queue directly. + * we need to send out self-notifies. In ProcessCompletedNotifies() we just + * process the queue directly. * * 5. Upon receipt of a PROCSIG_NOTIFY_INTERRUPT signal, the signal handler * sets the process's latch, which triggers the event to be processed @@ -984,7 +986,9 @@ PreCommit_Notify(void) * * This is called at transaction commit, after committing to clog. * - * Update listenChannels and clear transaction-local state. + * Update listenChannels and clear transaction-local state. If we issued + * any notifications in the transaction, send signals to other backends to + * process them. */ void AtCommit_Notify(void) @@ -1027,6 +1031,13 @@ AtCommit_Notify(void) if (amRegisteredListener && listenChannels == NIL) asyncQueueUnregister(); + /* + * Send signals to other backends. We call SignalBackends() only if there + * are pending notifies which are added to the queue by PreCommit_Notify(). + */ + if (pendingNotifies != NULL) + SignalBackends(); + /* And clean up */ ClearPendingActionsAndNotifies(); } @@ -1200,14 +1211,13 @@ Exec_UnlistenAllCommit(void) } /* - * ProcessCompletedNotifies --- send out signals and self-notifies + * ProcessCompletedNotifies --- send out self-notifies * * This is called from postgres.c just before going idle at the completion * of a transaction. If we issued any notifications in the just-completed - * transaction, send signals to other backends to process them, and also - * process the queue ourselves to send messages to our own frontend. - * Also, if we filled enough queue pages with new notifies, try to advance - * the queue tail pointer. + * transaction, process the queue ourselves to send messages to our own + * frontend. Also, if we filled enough queue pages with new notifies, try to + * advance the queue tail pointer. * * The reason that this is not done in AtCommit_Notify is that there is * a nonzero chance of errors here (for example, encoding conversion errors @@ -1216,9 +1226,8 @@ Exec_UnlistenAllCommit(void) * to ensure that a transaction's self-notifies are delivered to the frontend * before it gets the terminating ReadyForQuery message. * - * Note that we send signals and process the queue even if the transaction - * eventually aborted. This is because we need to clean out whatever got - * added to the queue. + * Note that we process the queue even if the transaction eventually aborted. + * This is because we need to clean out whatever got added to the queue. * * NOTE: we are outside of any transaction here. */ @@ -1253,9 +1262,6 @@ ProcessCompletedNotifies(void) */ StartTransactionCommand(); - /* Send signals to other backends */ - SignalBackends(); - if (listenChannels != NIL) { /* Read the queue ourselves, and send relevant stuff to the frontend */ @@ -1658,7 +1664,8 @@ asyncQueueFillWarning(void) /* * Send signals to listening backends. * - * We never signal our own process; that should be handled by our caller. + * We never signal our own process; that should be handled by + * ProcessCompletedNotifies(). * * Normally we signal only backends in our own database, since only those * backends could be interested in notifies we send. However, if there's @@ -2268,6 +2275,7 @@ ProcessIncomingNotify(void) /* We *must* reset the flag */ notifyInterruptPending = false; + /* Do nothing else if we aren't actively listening */ if (listenChannels == NIL) return;