Re: Problem with synchronous replication

Michael Paquier Wed, 30 Oct 2019 19:12:39 -0700

On Wed, Oct 30, 2019 at 05:43:04PM +0900, Kyotaro Horiguchi wrote:
> At Wed, 30 Oct 2019 17:21:17 +0900, Fujii Masao <masao.fu...@gmail.com> wrote in
>> This change causes every exiting backend to always take the exclusive lock,
>> even when it is not in the SyncRep queue. Could this be a problem, for example,
>> when many backends are terminated at the same time? If so,
>> it might be better to check SHMQueueIsDetached() again after taking the lock.
>> That is,
> 
> I'm not sure how much harm that does, but double-checked locking
> (on the release path) is simple enough to reduce possible congestion
> here, I think.


FWIW, I could not measure any actual difference with pgbench -C, using
-c 20, up to 500 sessions, and an otherwise empty script file (containing
just one meta-command).

I have added some comments to SyncRepCleanupAtProcExit() and adjusted
the patch following Fujii-san's suggestion.  Any comments?
--
Michael

diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index 16aee1de4c..f72b8a75f3 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -149,6 +149,13 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
 	const char *old_status;
 	int			mode;
 
+	/*
+	 * This should be called while holding interrupts during a transaction
+	 * commit to prevent the follow-up shared memory queue cleanups from being
+	 * influenced by external interruptions.
+	 */
+	Assert(InterruptHoldoffCount > 0);
+
 	/* Cap the level for anything other than commit to remote flush only. */
 	if (commit)
 		mode = SyncRepWaitMode;
@@ -361,10 +368,18 @@ SyncRepCancelWait(void)
 void
 SyncRepCleanupAtProcExit(void)
 {
+	/*
+	 * First check, without taking the lock, whether we have already been
+	 * removed from the queue, so as not to slow down backend exit.
+	 */
 	if (!SHMQueueIsDetached(&(MyProc->syncRepLinks)))
 	{
 		LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
-		SHMQueueDelete(&(MyProc->syncRepLinks));
+
+		/* maybe we have just been removed, so recheck */
+		if (!SHMQueueIsDetached(&(MyProc->syncRepLinks)))
+			SHMQueueDelete(&(MyProc->syncRepLinks));
+
 		LWLockRelease(SyncRepLock);
 	}
 }
@@ -508,6 +523,8 @@ SyncRepReleaseWaiters(void)
 /*
  * Calculate the synced Write, Flush and Apply positions among sync standbys.
  *
+ * The caller must hold SyncRepLock.
+ *
  * Return false if the number of sync standbys is less than
  * synchronous_standby_names specifies. Otherwise return true and
  * store the positions into *writePtr, *flushPtr and *applyPtr.
@@ -521,6 +538,8 @@ SyncRepGetSyncRecPtr(XLogRecPtr *writePtr, XLogRecPtr *flushPtr,
 {
 	List	   *sync_standbys;
 
+	Assert(LWLockHeldByMe(SyncRepLock));
+
 	*writePtr = InvalidXLogRecPtr;
 	*flushPtr = InvalidXLogRecPtr;
 	*applyPtr = InvalidXLogRecPtr;
@@ -680,6 +699,8 @@ cmp_lsn(const void *a, const void *b)
 List *
 SyncRepGetSyncStandbys(bool *am_sync)
 {
+	Assert(LWLockHeldByMe(SyncRepLock));
+
 	/* Set default result */
 	if (am_sync != NULL)
 		*am_sync = false;
@@ -984,7 +1005,7 @@ SyncRepGetStandbyPriority(void)
  * Pass all = true to wake whole queue; otherwise, just wake up to
  * the walsender's LSN.
  *
- * Must hold SyncRepLock.
+ * The caller must hold SyncRepLock in exclusive mode.
  */
 static int
 SyncRepWakeQueue(bool all, int mode)
@@ -995,6 +1016,7 @@ SyncRepWakeQueue(bool all, int mode)
 	int			numprocs = 0;
 
 	Assert(mode >= 0 && mode < NUM_SYNC_REP_WAIT_MODE);
+	Assert(LWLockHeldByMeInMode(SyncRepLock, LW_EXCLUSIVE));
 	Assert(SyncRepQueueIsOrderedByLSN(mode));
 
 	proc = (PGPROC *) SHMQueueNext(&(WalSndCtl->SyncRepQueue[mode]),
