Good day, hackers.

Zhiguo Zhow proposed transforming the xlog reservation into a lock-free algorithm, so that NUM_XLOGINSERT_LOCKS can be increased on very large (480 vCPU) servers. [1]

While I believe lock-free reservation makes sense on huge servers, its effect is hard to measure on small servers and personal computers/notebooks.

However, increasing NUM_XLOGINSERT_LOCKS yields a measurable performance gain (in a synthetic test) even on my work notebook:

  Ryzen 5825U (8 cores, 16 threads) limited to 2GHz, Ubuntu 24.04

Test scenario:

- changes to config

  max_connections = 1000
  shared_buffers = 1024MB
  fsync = off
  synchronous_commit = off
  wal_sync_method = fdatasync
  full_page_writes = off
  wal_buffers = 1024MB
  checkpoint_timeout = 1d

- table to test:

  create table test1(id int8 not null);
  create index test1_ix_id on test1(id);

- testing script, which inserts and deletes a lot of tuples:

  \set id random(1, 1000000000000000)

  begin;
  insert into test1
      select i
      from generate_series(:id::int8, :id::int8 + 1000) as i;
  delete from test1 where id >= :id::int8 and id <= :id::int8 + 1000;
  end;

- way to run benchmark:

  for i in 1 2 3 ; do
    install/bin/pgbench -n -T 20 -j 100 -c 200 \
        -M prepared -f test1.sql postgres
  done;
  install/bin/psql postgres -c 'truncate test1; checkpoint;'

I've tested:
- with 200 clients (-j 100 -c 200) and 400 clients (-j 200 -c 400);
- increasing NUM_XLOGINSERT_LOCKS to 64/128/256;
- a change to WALInsertLockAcquire that makes 1 or 2 conditional locking attempts before blocking (v0-0002-several-attempts-to-lock...).

Results (min/avg/max TPS):

18 at commit e28033fe1af8
  200 clients: 420/421/427
  400 clients: 428/443/444
locks 64
  200 clients: 576/591/599
  400 clients: 552/575/578
locks 64 + attempt=1
  200 clients: 648/680/687
  400 clients: 616/640/667
locks 64 + attempt=2
  200 clients: 676/696/712
  400 clients: 625/654/667
locks 128
  200 clients: 651/665/685
  400 clients: 620/651/666
locks 128 + attempt=1
  200 clients: 676/678/689
  400 clients: 628/652/676
locks 128 + attempt=2
  200 clients: 636/675/695
  400 clients: 618/658/672
locks 256
  200 clients: 672/678/716
  400 clients: 625/658/674
locks 256 + attempt=1
  200 clients: 673/687/702
  400 clients: 624/657/669
locks 256 + attempt=2
  200 clients: 664/695/697
  400 clients: 622/648/672

(Reminder: each transaction is insertion and deletion of 1000 tuples in table with 1 index).

Conclusions:
- Without conditional-lock attempts, it is worth increasing NUM_XLOGINSERT_LOCKS all the way to a huge 256 entries.
- With 2 conditional-lock attempts, it is enough (on my notebook) to increase it to just 64 entries.
- With a huge number of locks (256), conditional-lock attempts slightly degrade performance. At 128 there is no clear result, imho.

I propose increasing NUM_XLOGINSERT_LOCKS to 64 locks plus 2 conditional-lock attempts; I think it is the more conservative choice.
Alternatively, it should be increased to at least 128 locks.

To validate the proposed change, I also ran a standard pgbench workload:

  install/bin/pgbench -i -s 50 postgres
  for i in 1 2 3 ; do
    install/bin/pgbench -n -T 20 -j 100 -c 100 -M prepared postgres
  done

Results:

18 at commit e28033fe1af8
  100 clients: 18648/18708/18750
  400 clients: 13232/13329/13410
locks 64 + attempt=2:
  100 clients: 19939/20048/20168
  400 clients: 13394/13394/13888

As you can see, with 100 clients the proposed change gives a ~6.5% gain in TPS.

(Note: the configuration was the same, i.e. fsync=off, synchronous_commit=off, etc.)

Once the NUM_XLOGINSERT_LOCKS increase is settled in the master branch, I believe lock-free reservation deserves a closer look.

[1] https://www.postgresql.org/message-id/flat/PH7PR11MB5796659F654F9BE983F3AD97EF142%40PH7PR11MB5796.namprd11.prod.outlook.com


-----

regards
Yura Sokolov aka funny-falcon
From 93a4d4a7e2219a952c2a544047c19db9f0f0f5c0 Mon Sep 17 00:00:00 2001
From: Yura Sokolov <y.soko...@postgrespro.ru>
Date: Thu, 16 Jan 2025 15:06:59 +0300
Subject: [PATCH v0 1/2] Increase NUM_XLOGINSERT_LOCKS to 64

---
 src/backend/access/transam/xlog.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index bf3dbda901d..39381693db6 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -147,7 +147,7 @@ int			wal_segment_size = DEFAULT_XLOG_SEG_SIZE;
  * to happen concurrently, but adds some CPU overhead to flushing the WAL,
  * which needs to iterate all the locks.
  */
-#define NUM_XLOGINSERT_LOCKS  8
+#define NUM_XLOGINSERT_LOCKS  64
 
 /*
  * Max distance from last checkpoint, before triggering a new xlog-based
@@ -1448,7 +1448,11 @@ WALInsertLockRelease(void)
 	{
 		int			i;
 
-		for (i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
+		/*
+		 * LWLockRelease hopes we will release in reverse order for faster
+		 * search in held_lwlocks.
+		 */
+		for (i = NUM_XLOGINSERT_LOCKS - 1; i >= 0; i--)
 			LWLockReleaseClearVar(&WALInsertLocks[i].l.lock,
 								  &WALInsertLocks[i].l.insertingAt,
 								  0);
-- 
2.43.0

From 382e0d7bcc4a5c462a4d67264f4adf91f6e4f573 Mon Sep 17 00:00:00 2001
From: Yura Sokolov <y.soko...@postgrespro.ru>
Date: Thu, 16 Jan 2025 15:30:57 +0300
Subject: [PATCH v0 2/2] several attempts to lock WALInsertLocks

---
 src/backend/access/transam/xlog.c | 47 ++++++++++++++++++-------------
 1 file changed, 28 insertions(+), 19 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 39381693db6..c1a2e29fdb8 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -68,6 +68,7 @@
 #include "catalog/pg_database.h"
 #include "common/controldata_utils.h"
 #include "common/file_utils.h"
+#include "common/pg_prng.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
@@ -1370,8 +1371,7 @@ CopyXLogRecordToWAL(int write_len, bool isLogSwitch, XLogRecData *rdata,
 static void
 WALInsertLockAcquire(void)
 {
-	bool		immed;
-
+	int attempts = 2;
 	/*
 	 * It doesn't matter which of the WAL insertion locks we acquire, so try
 	 * the one we used last time.  If the system isn't particularly busy, it's
@@ -1383,29 +1383,38 @@ WALInsertLockAcquire(void)
 	 * (semi-)randomly.  This allows the locks to be used evenly if you have a
 	 * lot of very short connections.
 	 */
-	static int	lockToTry = -1;
+	static uint32 lockToTry = 0;
+	static uint32 lockDelta = 0;
 
-	if (lockToTry == -1)
-		lockToTry = MyProcNumber % NUM_XLOGINSERT_LOCKS;
-	MyLockNo = lockToTry;
+	if (lockDelta == 0)
+	{
+		uint32 rng = pg_prng_uint32(&pg_global_prng_state);
+
+		lockToTry = rng % NUM_XLOGINSERT_LOCKS;
+		lockDelta = ((rng >> 16) % NUM_XLOGINSERT_LOCKS) | 1; /* must be odd */
+	}
 
 	/*
 	 * The insertingAt value is initially set to 0, as we don't know our
 	 * insert location yet.
 	 */
-	immed = LWLockAcquire(&WALInsertLocks[MyLockNo].l.lock, LW_EXCLUSIVE);
-	if (!immed)
-	{
-		/*
-		 * If we couldn't get the lock immediately, try another lock next
-		 * time.  On a system with more insertion locks than concurrent
-		 * inserters, this causes all the inserters to eventually migrate to a
-		 * lock that no-one else is using.  On a system with more inserters
-		 * than locks, it still helps to distribute the inserters evenly
-		 * across the locks.
-		 */
-		lockToTry = (lockToTry + 1) % NUM_XLOGINSERT_LOCKS;
-	}
+	MyLockNo = lockToTry;
+retry:
+	if (LWLockConditionalAcquire(&WALInsertLocks[MyLockNo].l.lock, LW_EXCLUSIVE))
+		return;
+	/*
+	 * If we couldn't get the lock immediately, try another lock next
+	 * time.  On a system with more insertion locks than concurrent
+	 * inserters, this causes all the inserters to eventually migrate to a
+	 * lock that no-one else is using.  On a system with more inserters
+	 * than locks, it still helps to distribute the inserters evenly
+	 * across the locks.
+	 */
+	lockToTry = (lockToTry + lockDelta) % NUM_XLOGINSERT_LOCKS;
+	MyLockNo = lockToTry;
+	if (--attempts)
+		goto retry;
+	LWLockAcquire(&WALInsertLocks[MyLockNo].l.lock, LW_EXCLUSIVE);
 }
 
 /*
-- 
2.43.0
