Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Heikki Linnakangas Thu, 06 Nov 2025 04:06:25 -0800

On 06/11/2025 13:20, Arseniy Mukhin wrote:

On Thu, Nov 6, 2025 at 1:05 PM Heikki Linnakangas <[email protected]> wrote:


On 05/11/2025 21:02, Matheus Alcantara wrote:

On Wed Nov 5, 2025 at 9:59 AM -03, Heikki Linnakangas wrote:

In case of an error on TransactionIdDidCommit() I think that we will
have the same behavior as we advance the "current" position before of
DidCommit call the PG_FINALLY block will set the backend position past
the failing notification entry.


With my patch, if TransactionIdDidCommit() fails, we will lose all the
notifications that we have buffered in the local buffer but haven't
passed to NotifyMyFrontEnd() yet. On 'master', you only lose a single
notification, the one where TransactionIdDidCommit() failed.

Yeah, that's true.

How bad would be to store the notification entries within a list and
store the next position with the notification entry and then wrap the
NotifyMyFrontEnd() call within a PG_TRY and update the "current" to the
saved "next position" on PG_CATCH? Something like this:


[ ...]

That addresses the failure on NotifyMyFrontEnd, but not on
TransactionIdDidCommit.

IMHO we should just make these errors FATAL. TransactionIdDidCommit()
should not really fail, after fixing the bug we're discussing.
NotifyMyFrontEnd() could fail on OOM, but that seems pretty unlikely on
an otherwise idle connection. Or it could fail if the client connection
is lost, but then the backend is about to die anyway. And arguably
closing the connection is better than losing even a single notification,
anyway.

My only concern with making these errors FATAL is that if a notification
entry causes a different, recoverable error, all subsequent messages
will be lost. This is because if backend die and the user open a new
connection and execute LISTEN on the channel it will not see these
notifications past the one that caused the error. I'm not sure if we are
completely safe from this case of a recoverable error, what do you
think?


I did some more testing of the current behavior, using the encoding
conversion to cause an error:

In backend A:

SET client_encoding='latin1';
LISTEN foo;

Backend b:

NOTIFY foo, 'ハ';

Observations:

- If the connection is idle when the notification is received, the ERROR
is turned into FATAL anyway:

postgres=# SET client_encoding='latin1';
SET
postgres=# LISTEN foo;
LISTEN
postgres=# select 1; -- do the NOTIFY in another connection before this
ERROR:  character with byte sequence 0xe3 0x83 0x8f in encoding "UTF8"
has no equivalent in encoding "LATIN1"
FATAL:  terminating connection because protocol synchronization was lost
server closed the connection unexpectedly
         This probably means the server terminated abnormally
         before or while processing the request.

- If there are multiple notifications pending, we stop processing the
subsequent notifications on the first error. The subsequent
notifications will only be processed when another notify interrupt is
received, i.e. when a backend sends yet another notification.

I'm getting more and more convinced that escalating all ERRORs to FATALs
during notify processing is the right way to go. Attached is a new patch
set that does that.


One point about removing try/catch: when we execute LISTEN for the
very first time, we read the queue from the min(listener's pos) up to
the head. listenChannels is empty at the moment, so we never call
NotifyMyFrontEnd. But we do call TransactionIdDidCommit. So if we have
some problematic entry in the queue that we need to process during
initial LISTEN, it could block creation of new listeners. Is it
something we need to worry about?

Hmm, I don't see how it could block the creation of new listeners.Exec_ListenPreCommit() starts from "max(listener's pos)" over existinglisteners, not min(). Am I missing something?

That said, Exec_ListenPreCommit() could start from 'min' among alllisteners, if there are no listeners for the same backend. Per attachedpatch. I don't think it makes much difference though.

Another small change we could make is to check for listenChannels == NILbefore calling TransactionIdDidCommit.


- Heikki

diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 4bd37d5beb5..586d5609c51 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -1041,6 +1041,8 @@ static void
 Exec_ListenPreCommit(void)
 {
        QueuePosition head;
+       QueuePosition startPos;
+       QueuePosition min;
        QueuePosition max;
        ProcNumber      prevListener;
 
@@ -1085,18 +1087,20 @@ Exec_ListenPreCommit(void)
         * and manipulate the list links.
         */
        LWLockAcquire(NotifyQueueLock, LW_EXCLUSIVE);
-       head = QUEUE_HEAD;
+       head = min = QUEUE_HEAD;
        max = QUEUE_TAIL;
        prevListener = INVALID_PROC_NUMBER;
        for (ProcNumber i = QUEUE_FIRST_LISTENER; i != INVALID_PROC_NUMBER; i = 
QUEUE_NEXT_LISTENER(i))
        {
+               min = QUEUE_POS_MIN(min, QUEUE_BACKEND_POS(i));
                if (QUEUE_BACKEND_DBOID(i) == MyDatabaseId)
                        max = QUEUE_POS_MAX(max, QUEUE_BACKEND_POS(i));
                /* Also find last listening backend before this one */
                if (i < MyProcNumber)
                        prevListener = i;
        }
-       QUEUE_BACKEND_POS(MyProcNumber) = max;
+       startPos = QUEUE_POS_MAX(min, max);
+       QUEUE_BACKEND_POS(MyProcNumber) = startPos;
        QUEUE_BACKEND_PID(MyProcNumber) = MyProcPid;
        QUEUE_BACKEND_DBOID(MyProcNumber) = MyDatabaseId;
        /* Insert backend into list of listeners at correct position */
@@ -1123,7 +1127,7 @@ Exec_ListenPreCommit(void)
         * our transaction might have executed NOTIFY, those message(s) aren't
         * queued yet so we won't skip them here.
         */
-       if (!QUEUE_POS_EQUAL(max, head))
+       if (!QUEUE_POS_EQUAL(startPos, head))
                asyncQueueReadAllNotifications();
 }

Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue

Reply via email to