On 05/11/2025 14:40, Matheus Alcantara wrote:
On Wed Nov 5, 2025 at 6:21 AM -03, Heikki Linnakangas wrote:
On 04/11/2025 16:40, Arseniy Mukhin wrote:
If we want to preserve the previous behavior, maybe we could use a new
local position while copying notifications and only advance the
"current" position while sending notifications to the frontend?
That's not good. The loop advances 'current' before calling
TransactionIdDidCommit() to ensure that if there's a broken entry in the
queue for which TransactionIdDidCommit() throws an error, we advance
'current' past that point. Otherwise we would get back later to try to
process the same broken entry again and again.
We could put the NotifyMyFrontEnd() calls in a PG_FINALLY block, so that
the copied notifications get sent even on error. But I'm a reluctant to
put that in PG_FINALLY, it's not clear to me if NotifyMyFrontEnd() is
safe to call during error processing. And if the client is not
processing the notifications, the abort processing could be delayed for
a long time.
One idea is to close the client connection on a TransactionIdDidCommit
error, i.e. make the error FATAL. The current behavior where we skip the
notification seems problematic already. Closing the connection would be
a clear signal that some notifications have been lost.
Or we can just accept new behavior that if TransactionIdDidCommit()
fails, we might lose more notifications than the one that caused the error.
I think that we may have an error on TransactionIdDidCommit() and on
NotifyMyFrontEnd() right?
True.
In case of an error on TransactionIdDidCommit() I think that we will
have the same behavior as we advance the "current" position before of
DidCommit call the PG_FINALLY block will set the backend position past
the failing notification entry.
With my patch, if TransactionIdDidCommit() fails, we will lose all the
notifications that we have buffered in the local buffer but haven't
passed to NotifyMyFrontEnd() yet. On 'master', you only lose a single
notification, the one where TransactionIdDidCommit() failed.
I think that the problem is on NotifyMyFrontEnd() call that if it fails
we will set the current backend position to the head of the queue
or the beginning of the next page possibly losing notifications.
True, a failure on NotifyMyFrontEnd can happen too, and it will also
lead to losing all the other notifications in the local buffer.
How bad would be to store the notification entries within a list and
store the next position with the notification entry and then wrap the
NotifyMyFrontEnd() call within a PG_TRY and update the "current" to the
saved "next position" on PG_CATCH? Something like this:
do
{
ListEntry le;
qe = (AsyncQueueEntry *) (page_buffer + QUEUE_POS_OFFSET(thisentry));
reachedEndOfPage = asyncQueueAdvance(current, qe->length);
...
else if (TransactionIdDidCommit(qe->xid))
{
le->entry = qe;
le->nextPos = current;
}
} while (!reachedEndOfPage);
foreach(lc, notificationEntries)
{
ListEntry *le = lfirst(lc);
AsyncQueueEntry *qe = le->entry;
char *channel = qe->data;
PG_TRY()
{
if (IsListeningOn(channel))
{
char *payload = qe->data + strlen(channel) + 1;
NotifyMyFrontEnd(channel, payload, qe->srcPid);
}
}
PG_CATCH()
{
current = le->nextPos;
PG_RE_THROW();
}
}
That addresses the failure on NotifyMyFrontEnd, but not on
TransactionIdDidCommit.
IMHO we should just make these errors FATAL. TransactionIdDidCommit()
should not really fail, after fixing the bug we're discussing.
NotifyMyFrontEnd() could fail on OOM, but that seems pretty unlikely on
an otherwise idle connection. Or it could fail if the client connection
is lost, but then the backend is about to die anyway. And arguably
closing the connection is better than losing even a single notification,
anyway.
- Heikki