Hoi hackers, Here is a reworked version of the previous patches.
The original three patches have been collapsed into one as given the changes discussed it didn't make sense to keep them separate. There are now two patches (the third is just to help with testing): Patch 1: Tracks the listening backends in a list so non-listening backends can be quickly skipped over. This is separate because it's orthogonal to the rest of the changes and there are other ways to do this. Patch 2: This is the meat of the change. It implements all the suggestions discussed: - The queue tail is now only updated lazily, whenever the notify queue moves to a new page. This did require a new global to track this state through the transaction commit, but it seems worth it. - Only backends for the current database are signalled when a notification is made - Slow backends are woken up one at a time rather than all at once - A backend is allowed to lag up to 4 SLRU pages behind before being signalled. This is a tradeoff between how often to get woken up versus how much work to do once woken up. - All the relevant comments have been updated to describe the new algorithm. Locking should also be correct now. This means in the normal case where listening backends get a notification occasionally, no-one will ever be considered slow. An exclusive lock for cleanup will happen about once per SLRU page. There's still the exclusive locks on adding notifications but that's unavoidable. One minor issue is that pg_notification_queue_usage() will now return a small but non-zero number (about 3e-6) even when nothing is really going on. This could be fixed by having it take an exclusive lock instead and updating to the latest values but that barely seems worth it. Performance-wise it's even better than my original patches, with about 20-25% reduction in CPU usage in my test setup (using the test script sent previously). 
Here is the log output from my postgres, where you see the signalling in action: ------ 16:42:48.673 [10188] martijn@test_131 DEBUG: PreCommit_Notify 16:42:48.673 [10188] martijn@test_131 DEBUG: NOTIFY QUEUE = (74,896)...(79,0) 16:42:48.673 [10188] martijn@test_131 DEBUG: backendTryAdvanceTail -> true 16:42:48.673 [10188] martijn@test_131 DEBUG: AtCommit_Notify 16:42:48.673 [10188] martijn@test_131 DEBUG: ProcessCompletedNotifies 16:42:48.673 [10188] martijn@test_131 DEBUG: backendTryAdvanceTail -> false 16:42:48.673 [10188] martijn@test_131 DEBUG: asyncQueueAdvanceTail 16:42:48.673 [10188] martijn@test_131 DEBUG: waking backend 137 (pid 10055) 16:42:48.673 [10055] martijn@test_067 DEBUG: ProcessIncomingNotify 16:42:48.673 [10187] martijn@test_131 DEBUG: ProcessIncomingNotify 16:42:48.673 [10055] martijn@test_067 DEBUG: asyncQueueAdvanceTail 16:42:48.673 [10055] martijn@test_067 DEBUG: waking backend 138 (pid 10056) 16:42:48.673 [10187] martijn@test_131 DEBUG: ProcessIncomingNotify: done 16:42:48.673 [10055] martijn@test_067 DEBUG: ProcessIncomingNotify: done 16:42:48.673 [10056] martijn@test_067 DEBUG: ProcessIncomingNotify 16:42:48.673 [10056] martijn@test_067 DEBUG: asyncQueueAdvanceTail 16:42:48.673 [10056] martijn@test_067 DEBUG: ProcessIncomingNotify: done 16:42:48.683 [9991] martijn@test_042 DEBUG: Async_Notify(changes) 16:42:48.683 [9991] martijn@test_042 DEBUG: PreCommit_Notify 16:42:48.683 [9991] martijn@test_042 DEBUG: NOTIFY QUEUE = (75,7744)...(79,32) 16:42:48.683 [9991] martijn@test_042 DEBUG: AtCommit_Notify ----- Have a nice weekend. -- Martijn van Oosterhout <klep...@gmail.com> http://svana.org/kleptog/
From 82366f1dbc0fc234fdd10dbc15519b3cf7104684 Mon Sep 17 00:00:00 2001 From: Martijn van Oosterhout <oosterh...@fox-it.com> Date: Tue, 23 Jul 2019 16:49:30 +0200 Subject: [PATCH 1/3] Maintain queue of listening backends to speed up loops Especially the loop to advance the tail pointer is called fairly often while holding an exclusive lock. --- src/backend/commands/async.c | 45 +++++++++++++++++++++++++++++++++++++++----- 1 file changed, 40 insertions(+), 5 deletions(-) diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c index 6e9c580ec6..ba0b1baecc 100644 --- a/src/backend/commands/async.c +++ b/src/backend/commands/async.c @@ -217,6 +217,7 @@ typedef struct QueueBackendStatus { int32 pid; /* either a PID or InvalidPid */ Oid dboid; /* backend's database OID, or InvalidOid */ + int nextListener; /* backendid of next listener, 0=last */ QueuePosition pos; /* backend has read queue up to here */ } QueueBackendStatus; @@ -247,6 +248,7 @@ typedef struct AsyncQueueControl QueuePosition tail; /* the global tail is equivalent to the pos of * the "slowest" backend */ TimestampTz lastQueueFillWarn; /* time of last queue-full msg */ + int firstListener; /* backendId of first listener, 0=none */ QueueBackendStatus backend[FLEXIBLE_ARRAY_MEMBER]; /* backend[0] is not used; used entries are from [1] to [MaxBackends] */ } AsyncQueueControl; @@ -257,8 +259,11 @@ static AsyncQueueControl *asyncQueueControl; #define QUEUE_TAIL (asyncQueueControl->tail) #define QUEUE_BACKEND_PID(i) (asyncQueueControl->backend[i].pid) #define QUEUE_BACKEND_DBOID(i) (asyncQueueControl->backend[i].dboid) +#define QUEUE_LISTENER_NEXT(i) (asyncQueueControl->backend[i].nextListener) #define QUEUE_BACKEND_POS(i) (asyncQueueControl->backend[i].pos) +#define QUEUE_FIRST_LISTENER (asyncQueueControl->firstListener) + /* * The SLRU buffer area through which we access the notification queue */ @@ -465,11 +470,13 @@ AsyncShmemInit(void) SET_QUEUE_POS(QUEUE_HEAD, 0, 0); 
SET_QUEUE_POS(QUEUE_TAIL, 0, 0); asyncQueueControl->lastQueueFillWarn = 0; + QUEUE_FIRST_LISTENER = 0; /* zero'th entry won't be used, but let's initialize it anyway */ for (i = 0; i <= MaxBackends; i++) { QUEUE_BACKEND_PID(i) = InvalidPid; QUEUE_BACKEND_DBOID(i) = InvalidOid; + QUEUE_LISTENER_NEXT(i) = InvalidBackendId; SET_QUEUE_POS(QUEUE_BACKEND_POS(i), 0, 0); } } @@ -908,6 +915,7 @@ Exec_ListenPreCommit(void) QueuePosition head; QueuePosition max; int i; + int prevListener; /* * Nothing to do if we are already listening to something, nor if we @@ -953,10 +961,14 @@ Exec_ListenPreCommit(void) LWLockAcquire(AsyncQueueLock, LW_EXCLUSIVE); head = QUEUE_HEAD; max = QUEUE_TAIL; + prevListener = 0; if (QUEUE_POS_PAGE(max) != QUEUE_POS_PAGE(head)) { - for (i = 1; i <= MaxBackends; i++) + for (i = QUEUE_FIRST_LISTENER; i; i = QUEUE_LISTENER_NEXT(i)) { + /* Find last listening backend before this one */ + if (i < MyBackendId) + prevListener = i; if (QUEUE_BACKEND_DBOID(i) == MyDatabaseId) max = QUEUE_POS_MAX(max, QUEUE_BACKEND_POS(i)); } @@ -964,6 +976,15 @@ Exec_ListenPreCommit(void) QUEUE_BACKEND_POS(MyBackendId) = max; QUEUE_BACKEND_PID(MyBackendId) = MyProcPid; QUEUE_BACKEND_DBOID(MyBackendId) = MyDatabaseId; + + /* Insert backend into list of listeners */ + if (prevListener == 0) { + QUEUE_LISTENER_NEXT(MyBackendId) = QUEUE_FIRST_LISTENER; + QUEUE_FIRST_LISTENER = MyBackendId; + } else { + QUEUE_LISTENER_NEXT(MyBackendId) = QUEUE_LISTENER_NEXT(prevListener); + QUEUE_LISTENER_NEXT(prevListener) = MyBackendId; + } LWLockRelease(AsyncQueueLock); /* Now we are listed in the global array, so remember we're listening */ @@ -1170,19 +1191,33 @@ static void asyncQueueUnregister(void) { bool advanceTail; + int i; Assert(listenChannels == NIL); /* else caller error */ if (!amRegisteredListener) /* nothing to do */ return; - LWLockAcquire(AsyncQueueLock, LW_SHARED); + LWLockAcquire(AsyncQueueLock, LW_EXCLUSIVE); /* check if entry is valid and oldest ... 
*/ advanceTail = (MyProcPid == QUEUE_BACKEND_PID(MyBackendId)) && QUEUE_POS_EQUAL(QUEUE_BACKEND_POS(MyBackendId), QUEUE_TAIL); /* ... then mark it invalid */ QUEUE_BACKEND_PID(MyBackendId) = InvalidPid; QUEUE_BACKEND_DBOID(MyBackendId) = InvalidOid; + /* and remove it from the list */ + if (QUEUE_FIRST_LISTENER == MyBackendId) + QUEUE_FIRST_LISTENER = QUEUE_LISTENER_NEXT(MyBackendId); + else { + for (i = QUEUE_FIRST_LISTENER; i; i = QUEUE_LISTENER_NEXT(i)) + { + if (QUEUE_LISTENER_NEXT(i) == MyBackendId) { + QUEUE_LISTENER_NEXT(i) = QUEUE_LISTENER_NEXT(QUEUE_LISTENER_NEXT(i)); + break; + } + } + } + QUEUE_LISTENER_NEXT(MyBackendId) = InvalidBackendId; LWLockRelease(AsyncQueueLock); /* mark ourselves as no longer listed in the global array */ @@ -1459,7 +1494,7 @@ asyncQueueFillWarning(void) int32 minPid = InvalidPid; int i; - for (i = 1; i <= MaxBackends; i++) + for (i = QUEUE_FIRST_LISTENER; i; i = QUEUE_LISTENER_NEXT(i)) { if (QUEUE_BACKEND_PID(i) != InvalidPid) { @@ -1519,7 +1554,7 @@ SignalBackends(void) count = 0; LWLockAcquire(AsyncQueueLock, LW_EXCLUSIVE); - for (i = 1; i <= MaxBackends; i++) + for (i = QUEUE_FIRST_LISTENER; i; i = QUEUE_LISTENER_NEXT(i)) { pid = QUEUE_BACKEND_PID(i); if (pid != InvalidPid && pid != MyProcPid) @@ -1995,7 +2030,7 @@ asyncQueueAdvanceTail(void) LWLockAcquire(AsyncQueueLock, LW_EXCLUSIVE); min = QUEUE_HEAD; - for (i = 1; i <= MaxBackends; i++) + for (i = QUEUE_FIRST_LISTENER; i; i = QUEUE_LISTENER_NEXT(i)) { if (QUEUE_BACKEND_PID(i) != InvalidPid) min = QUEUE_POS_MIN(min, QUEUE_BACKEND_POS(i)); -- 2.11.0
From a6d598dd26a0bd6ea4211a63571d0c78dbf8aea0 Mon Sep 17 00:00:00 2001 From: Martijn van Oosterhout <oosterh...@fox-it.com> Date: Mon, 3 Jun 2019 17:13:31 +0200 Subject: [PATCH 2/3] Improve performance of async notifications Advancing the tail pointer requires an exclusive lock which can block backends from other databases, so it's worth keeping these attempts to a minimum. Instead of tracking the slowest backend exactly we update the queue more lazily, only checking when we switch to a new SLRU page. Additionally, instead of waking up every slow backend at once, we do them one at a time. --- src/backend/commands/async.c | 145 ++++++++++++++++++++++++++++++------------- 1 file changed, 103 insertions(+), 42 deletions(-) diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c index ba0b1baecc..20bed5ca71 100644 --- a/src/backend/commands/async.c +++ b/src/backend/commands/async.c @@ -73,10 +73,11 @@ * Finally, after we are out of the transaction altogether, we check if * we need to signal listening backends. In SignalBackends() we scan the * list of listening backends and send a PROCSIG_NOTIFY_INTERRUPT signal - * to every listening backend (we don't know which backend is listening on - * which channel so we must signal them all). We can exclude backends that - * are already up to date, though. We don't bother with a self-signal - * either, but just process the queue directly. + * to every listening backend for the relevant database (we don't know + * which backend is listening on which channel so we must signal them + * all). We can exclude backends that are already up to date, though. + * We don't bother with a self-signal either, but just process the queue + * directly. * * 5. 
Upon receipt of a PROCSIG_NOTIFY_INTERRUPT signal, the signal handler * sets the process's latch, which triggers the event to be processed @@ -89,13 +90,25 @@ * Inbound-notify processing consists of reading all of the notifications * that have arrived since scanning last time. We read every notification * until we reach either a notification from an uncommitted transaction or - * the head pointer's position. Then we check if we were the laziest - * backend: if our pointer is set to the same position as the global tail - * pointer is set, then we move the global tail pointer ahead to where the - * second-laziest backend is (in general, we take the MIN of the current - * head position and all active backends' new tail pointers). Whenever we - * move the global tail pointer we also truncate now-unused pages (i.e., - * delete files in pg_notify/ that are no longer used). + * the head pointer's position. + * + * 6. To avoid SLRU wraparound and minimize disk space the tail pointer + * needs to be advanced so that old pages can be truncated. This + * however requires an exclusive lock and as such should be done + * infrequently. + * + * When a new notification is added, the writer checks to see if the + * tail pointer is more than QUEUE_CLEANUP_DELAY pages behind. If + * so, it attempts to advance the tail, and if there are slow + * backends (perhaps because all the notifications were for other + * databases), wake one of them up by sending a signal. + * + * When the slow backend processes the queue it notes it was behind + * and so also tries to advance the tail, possibly waking up another + * slow backend. Eventually all backends will have processed the + * queue and the global tail pointer is moved to a new page and we + * also truncate now-unused pages (i.e., delete files in pg_notify/ + * that are no longer used). * * An application that listens on the same channel it notifies will get * NOTIFY messages for its own NOTIFYs. 
These can be ignored, if not useful, @@ -210,6 +223,12 @@ typedef struct QueuePosition (x).page != (y).page ? (x) : \ (x).offset > (y).offset ? (x) : (y)) +/* how many pages does a backend need to be behind before it needs to be signalled */ +#define QUEUE_CLEANUP_DELAY 4 + +/* is a backend so far behind it needs to be signalled? */ +#define QUEUE_SLOW_BACKEND(i) \ + (asyncQueuePageDiff(QUEUE_POS_PAGE(QUEUE_HEAD), QUEUE_POS_PAGE(QUEUE_BACKEND_POS(i))) > QUEUE_CLEANUP_DELAY) /* * Struct describing a listening backend's status */ @@ -245,7 +264,7 @@ typedef struct QueueBackendStatus typedef struct AsyncQueueControl { QueuePosition head; /* head points to the next free location */ - QueuePosition tail; /* the global tail is equivalent to the pos of + QueuePosition tail; /* the global tail is some place older than the pos of * the "slowest" backend */ TimestampTz lastQueueFillWarn; /* time of last queue-full msg */ int firstListener; /* backendId of first listener, 0=none */ @@ -370,10 +389,15 @@ static bool amRegisteredListener = false; /* has this backend sent notifications in the current transaction? */ static bool backendHasSentNotifications = false; +/* has this backend switched to a new page, and so should attempt to advance + * the queue tail? 
*/ +static bool backendTryAdvanceTail = false; + /* GUC parameter */ bool Trace_notify = false; /* local function prototypes */ +static int asyncQueuePageDiff(int p, int q); static bool asyncQueuePagePrecedes(int p, int q); static void queue_listen(ListenActionKind action, const char *channel); static void Async_UnlistenOnExit(int code, Datum arg); @@ -389,7 +413,7 @@ static void asyncQueueNotificationToEntry(Notification *n, AsyncQueueEntry *qe); static ListCell *asyncQueueAddEntries(ListCell *nextNotify); static double asyncQueueUsage(void); static void asyncQueueFillWarning(void); -static bool SignalBackends(void); +static bool SignalMyDBBackends(void); static void asyncQueueReadAllNotifications(void); static bool asyncQueueProcessPageEntries(volatile QueuePosition *current, QueuePosition stop, @@ -403,8 +427,8 @@ static void ClearPendingActionsAndNotifies(void); /* * We will work on the page range of 0..QUEUE_MAX_PAGE. */ -static bool -asyncQueuePagePrecedes(int p, int q) +static int +asyncQueuePageDiff(int p, int q) { int diff; @@ -420,7 +444,13 @@ asyncQueuePagePrecedes(int p, int q) diff -= QUEUE_MAX_PAGE + 1; else if (diff < -((QUEUE_MAX_PAGE + 1) / 2)) diff += QUEUE_MAX_PAGE + 1; - return diff < 0; + return diff; +} + +static bool +asyncQueuePagePrecedes(int p, int q) +{ + return asyncQueuePageDiff(p, q) < 0; } /* @@ -850,6 +880,12 @@ PreCommit_Notify(void) (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), errmsg("too many notifications in the NOTIFY queue"))); nextNotify = asyncQueueAddEntries(nextNotify); + + /* If we are advancing to a new page, remember this so after the + * transaction commits we can attempt to advance the tail + * pointer, see ProcessCompletedNotifies() */ + if (QUEUE_POS_OFFSET(QUEUE_HEAD) == 0) + backendTryAdvanceTail = true; LWLockRelease(AsyncQueueLock); } } @@ -999,8 +1035,6 @@ Exec_ListenPreCommit(void) * notification to the frontend. 
Also, although our transaction might * have executed NOTIFY, those message(s) aren't queued yet so we can't * see them in the queue. - * - * This will also advance the global tail pointer if possible. */ if (!QUEUE_POS_EQUAL(max, head)) asyncQueueReadAllNotifications(); @@ -1133,7 +1167,7 @@ ProcessCompletedNotifies(void) StartTransactionCommand(); /* Send signals to other backends */ - signalled = SignalBackends(); + signalled = SignalMyDBBackends(); if (listenChannels != NIL) { @@ -1151,6 +1185,16 @@ ProcessCompletedNotifies(void) * harmless.) */ asyncQueueAdvanceTail(); + backendTryAdvanceTail = false; + } + + if (backendTryAdvanceTail) + { + /* We switched to a new page while writing our notifies to the + * queue, so we try to advance the tail ourselves, possibly waking + * up another backend if it is running behind */ + backendTryAdvanceTail = false; + asyncQueueAdvanceTail(); } CommitTransactionCommand(); @@ -1190,7 +1234,6 @@ IsListeningOn(const char *channel) static void asyncQueueUnregister(void) { - bool advanceTail; int i; Assert(listenChannels == NIL); /* else caller error */ @@ -1199,10 +1242,7 @@ asyncQueueUnregister(void) return; LWLockAcquire(AsyncQueueLock, LW_EXCLUSIVE); - /* check if entry is valid and oldest ... */ - advanceTail = (MyProcPid == QUEUE_BACKEND_PID(MyBackendId)) && - QUEUE_POS_EQUAL(QUEUE_BACKEND_POS(MyBackendId), QUEUE_TAIL); - /* ... then mark it invalid */ + /* Mark our entry as invalid */ QUEUE_BACKEND_PID(MyBackendId) = InvalidPid; QUEUE_BACKEND_DBOID(MyBackendId) = InvalidOid; /* and remove it from the list */ @@ -1222,10 +1262,6 @@ asyncQueueUnregister(void) /* mark ourselves as no longer listed in the global array */ amRegisteredListener = false; - - /* If we were the laziest backend, try to advance the tail pointer */ - if (advanceTail) - asyncQueueAdvanceTail(); } /* @@ -1518,7 +1554,7 @@ asyncQueueFillWarning(void) } /* - * Send signals to all listening backends (except our own). 
+ * Send signals to all listening backends (except our own) for our database. * * Returns true if we sent at least one signal. * @@ -1531,7 +1567,7 @@ asyncQueueFillWarning(void) * Since we know the BackendId and the Pid the signalling is quite cheap. */ static bool -SignalBackends(void) +SignalMyDBBackends(void) { bool signalled = false; int32 *pids; @@ -1541,9 +1577,9 @@ SignalBackends(void) int32 pid; /* - * Identify all backends that are listening and not already up-to-date. We - * don't want to send signals while holding the AsyncQueueLock, so we just - * build a list of target PIDs. + * Identify all backends with MyDatabaseId that are listening and not + * already up-to-date. We don't want to send signals while holding the + * AsyncQueueLock, so we just build a list of target PIDs. * * XXX in principle these pallocs could fail, which would be bad. Maybe * preallocate the arrays? But in practice this is only run in trivial @@ -1557,7 +1593,7 @@ SignalBackends(void) for (i = QUEUE_FIRST_LISTENER; i; i = QUEUE_LISTENER_NEXT(i)) { pid = QUEUE_BACKEND_PID(i); - if (pid != InvalidPid && pid != MyProcPid) + if (pid != InvalidPid && pid != MyProcPid && QUEUE_BACKEND_DBOID(i) == MyDatabaseId) { QueuePosition pos = QUEUE_BACKEND_POS(i); @@ -1782,6 +1818,9 @@ asyncQueueReadAllNotifications(void) Assert(MyProcPid == QUEUE_BACKEND_PID(MyBackendId)); pos = oldpos = QUEUE_BACKEND_POS(MyBackendId); head = QUEUE_HEAD; + /* If we're behind, we possibly got signalled to catchup. 
Remember + * this so we attempt to advance the tail later */ + advanceTail = QUEUE_SLOW_BACKEND(MyBackendId); LWLockRelease(AsyncQueueLock); if (QUEUE_POS_EQUAL(pos, head)) @@ -1889,12 +1928,9 @@ asyncQueueReadAllNotifications(void) /* Update shared state */ LWLockAcquire(AsyncQueueLock, LW_SHARED); QUEUE_BACKEND_POS(MyBackendId) = pos; - advanceTail = QUEUE_POS_EQUAL(oldpos, QUEUE_TAIL); LWLockRelease(AsyncQueueLock); - /* If we were the laziest backend, try to advance the tail pointer */ - if (advanceTail) - asyncQueueAdvanceTail(); + /* We don't try to advance the tail here. */ PG_RE_THROW(); } @@ -1903,10 +1939,10 @@ asyncQueueReadAllNotifications(void) /* Update shared state */ LWLockAcquire(AsyncQueueLock, LW_SHARED); QUEUE_BACKEND_POS(MyBackendId) = pos; - advanceTail = QUEUE_POS_EQUAL(oldpos, QUEUE_TAIL); LWLockRelease(AsyncQueueLock); - /* If we were the laziest backend, try to advance the tail pointer */ + /* We were behind, so try to advance the tail pointer, possibly + * signalling another backend if necessary */ if (advanceTail) asyncQueueAdvanceTail(); @@ -2016,8 +2052,8 @@ asyncQueueProcessPageEntries(volatile QueuePosition *current, } /* - * Advance the shared queue tail variable to the minimum of all the - * per-backend tail pointers. Truncate pg_notify space if possible. + * Advance the shared queue tail variable if possible. If a slow backend is + * holding everything up, signal it. Truncate pg_notify space if possible. 
*/ static void asyncQueueAdvanceTail(void) @@ -2027,18 +2063,43 @@ asyncQueueAdvanceTail(void) int oldtailpage; int newtailpage; int boundary; + int slowbackendid = InvalidBackendId; + int slowbackendpid; + /* Advance the tail as far as possible, noting if there is a slow + * backend we could kick */ LWLockAcquire(AsyncQueueLock, LW_EXCLUSIVE); min = QUEUE_HEAD; for (i = QUEUE_FIRST_LISTENER; i; i = QUEUE_LISTENER_NEXT(i)) { if (QUEUE_BACKEND_PID(i) != InvalidPid) + { + if (QUEUE_BACKEND_PID(i) != MyProcPid && QUEUE_SLOW_BACKEND(i)) + { + slowbackendid = i; + slowbackendpid = QUEUE_BACKEND_PID(i); + } min = QUEUE_POS_MIN(min, QUEUE_BACKEND_POS(i)); + } } oldtailpage = QUEUE_POS_PAGE(QUEUE_TAIL); QUEUE_TAIL = min; LWLockRelease(AsyncQueueLock); + /* At least one backend was slow, so signal a random one to wake it + * up. It should in turn call this function to signal the next, + * see asyncQueueReadAllNotifications() */ + if (slowbackendid != InvalidBackendId) { + + /* Note: assuming things aren't broken, a signal failure here could + * only occur if the target backend exited since we released + * AsyncQueueLock; which is unlikely but certainly possible. So we + * just log a low-level debug message if it happens. + */ + if (SendProcSignal(slowbackendpid, PROCSIG_NOTIFY_INTERRUPT, slowbackendid) < 0) + elog(DEBUG3, "could not signal backend with PID %d: %m", slowbackendpid); + } + /* * We can truncate something if the global tail advanced across an SLRU * segment boundary. -- 2.11.0
From a499a683c6f4a93bdc58ea05cecb531d2b917cf7 Mon Sep 17 00:00:00 2001 From: Martijn van Oosterhout <oosterh...@fox-it.com> Date: Fri, 2 Aug 2019 14:22:46 +0200 Subject: [PATCH 3/3] Debugging for NOTIFY --- src/backend/commands/async.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c index 20bed5ca71..597e568c85 100644 --- a/src/backend/commands/async.c +++ b/src/backend/commands/async.c @@ -880,12 +880,20 @@ PreCommit_Notify(void) (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), errmsg("too many notifications in the NOTIFY queue"))); nextNotify = asyncQueueAddEntries(nextNotify); + if (Trace_notify) + elog(DEBUG1, "NOTIFY QUEUE = (%d,%d)...(%d,%d)", + QUEUE_POS_PAGE(QUEUE_TAIL), QUEUE_POS_OFFSET(QUEUE_TAIL), + QUEUE_POS_PAGE(QUEUE_HEAD), QUEUE_POS_OFFSET(QUEUE_HEAD)); /* If we are advancing to a new page, remember this so after the * transaction commits we can attempt to advance the tail * pointer, see ProcessCompletedNotifies() */ if (QUEUE_POS_OFFSET(QUEUE_HEAD) == 0) + { + if (Trace_notify) + elog(DEBUG1, "backendTryAdvanceTail -> true"); backendTryAdvanceTail = true; + } LWLockRelease(AsyncQueueLock); } } @@ -1194,6 +1202,8 @@ ProcessCompletedNotifies(void) * queue, so we try to advance the tail ourselves, possibly waking * up another backend if it is running behind */ backendTryAdvanceTail = false; + if (Trace_notify) + elog(DEBUG1, "backendTryAdvanceTail -> false"); asyncQueueAdvanceTail(); } @@ -2066,6 +2076,8 @@ asyncQueueAdvanceTail(void) int slowbackendid = InvalidBackendId; int slowbackendpid; + if (Trace_notify) + elog(DEBUG1, "asyncQueueAdvanceTail"); /* Advance the tail as far as possible, noting if there is a slow * backend we could kick */ LWLockAcquire(AsyncQueueLock, LW_EXCLUSIVE); @@ -2096,6 +2108,8 @@ asyncQueueAdvanceTail(void) * AsyncQueueLock; which is unlikely but certainly possible. So we * just log a low-level debug message if it happens. 
*/ + if (Trace_notify) + elog(DEBUG1, "waking backend %d (pid %d)", slowbackendid, slowbackendpid); if (SendProcSignal(slowbackendpid, PROCSIG_NOTIFY_INTERRUPT, slowbackendid) < 0) elog(DEBUG3, "could not signal backend with PID %d: %m", slowbackendpid); } -- 2.11.0