Good day, hackers.
Zhiguo Zhou proposed transforming the xlog reservation into a lock-free
algorithm in order to increase NUM_XLOGINSERT_LOCKS on very large
(480 vCPU) servers. [1]
While I believe lock-free reservation makes sense on huge servers, its
effect is hard to measure on small servers and personal
computers/notebooks.
But increasing NUM_XLOGINSERT_LOCKS yields a measurable performance gain
(in a synthetic test) even on my work notebook:
Ryzen 5825U (8 cores, 16 threads) limited to 2 GHz, Ubuntu 24.04
Test scenario:
- changes to config
max_connections = 1000
shared_buffers = 1024MB
fsync = off
synchronous_commit = off
wal_sync_method = fdatasync
full_page_writes = off
wal_buffers = 1024MB
checkpoint_timeout = 1d
- table to test:
create table test1(id int8 not null);
create index test1_ix_id on test1(id);
- testing script, which inserts and deletes a lot of tuples:
\set id random(1, 1000000000000000)
begin;
insert into test1
select i
from generate_series(:id::int8, :id::int8 + 1000) as i;
delete from test1 where id >= :id::int8 and id <= :id::int8 + 1000;
end;
- way to run benchmark:
for i in 1 2 3 ; do
install/bin/pgbench -n -T 20 -j 100 -c 200 \
-M prepared -f test1.sql postgres
done;
install/bin/psql postgres -c 'truncate test1; checkpoint;'
I've tested:
- with 200 clients (-j 100 -c 200) and 400 clients (-j 200 -c 400):
- increasing NUM_XLOGINSERT_LOCKS to 64/128/256,
- changing WALInsertLockAcquire to make 1 or 2 conditional locking
attempts before blocking (v0-0002-several-attempts-to-lock...).
Results are (min/avg/max tps):
PostgreSQL 18 at commit e28033fe1af8
200 clients: 420/421/427
400 clients: 428/443/444
locks 64
200 clients: 576/591/599
400 clients: 552/575/578
locks 64 + attempt=1
200 clients: 648/680/687
400 clients: 616/640/667
locks 64 + attempt=2
200 clients: 676/696/712
400 clients: 625/654/667
locks 128
200 clients: 651/665/685
400 clients: 620/651/666
locks 128 + attempt=1
200 clients: 676/678/689
400 clients: 628/652/676
locks 128 + attempt=2
200 clients: 636/675/695
400 clients: 618/658/672
locks 256
200 clients: 672/678/716
400 clients: 625/658/674
locks 256 + attempt=1
200 clients: 673/687/702
400 clients: 624/657/669
locks 256 + attempt=2
200 clients: 664/695/697
400 clients: 622/648/672
(Reminder: each transaction inserts and then deletes 1000 tuples in a
table with one index.)
Conclusions:
- without conditional-lock attempts, it is worth increasing
NUM_XLOGINSERT_LOCKS up to a huge 256 entries.
- with 2 conditional-lock attempts it is enough (on my notebook) to
increase it to just 64 entries.
- with a huge number of locks (256), the conditional-lock attempts
slightly degrade performance. At 128 there is no clear result, imho.
I propose increasing NUM_XLOGINSERT_LOCKS to 64 locks plus 2 locking
attempts, which I think is the more conservative choice.
Alternatively, it should be increased to at least 128 locks.
To validate the proposed change, I ran pgbench with:
install/bin/pgbench -i -s 50 postgres
for i in 1 2 3 ; do
install/bin/pgbench -n -T 20 -j 100 -c 100 -M prepared postgres
done
Results:
PostgreSQL 18 at commit e28033fe1af8
100 clients: 18648/18708/18750
400 clients: 13232/13329/13410
locks 64 + 2 attempts:
100 clients: 19939/20048/20168
400 clients: 13394/13394/13888
As you can see, with 100 clients the proposed change gives a ~6.5% gain
in TPS.
(Note: the configuration was the same, i.e. fsync=off,
synchronous_commit=off, etc.)
Once the NUM_XLOGINSERT_LOCKS increase is settled in the master branch,
I believe the lock-free reservation should be looked at more closely.
[1]
https://www.postgresql.org/message-id/flat/PH7PR11MB5796659F654F9BE983F3AD97EF142%40PH7PR11MB5796.namprd11.prod.outlook.com
-----
regards
Yura Sokolov aka funny-falcon
From 93a4d4a7e2219a952c2a544047c19db9f0f0f5c0 Mon Sep 17 00:00:00 2001
From: Yura Sokolov <y.soko...@postgrespro.ru>
Date: Thu, 16 Jan 2025 15:06:59 +0300
Subject: [PATCH v0 1/2] Increase NUM_XLOGINSERT_LOCKS to 64
---
src/backend/access/transam/xlog.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index bf3dbda901d..39381693db6 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -147,7 +147,7 @@ int wal_segment_size = DEFAULT_XLOG_SEG_SIZE;
* to happen concurrently, but adds some CPU overhead to flushing the WAL,
* which needs to iterate all the locks.
*/
-#define NUM_XLOGINSERT_LOCKS 8
+#define NUM_XLOGINSERT_LOCKS 64
/*
* Max distance from last checkpoint, before triggering a new xlog-based
@@ -1448,7 +1448,11 @@ WALInsertLockRelease(void)
{
int i;
- for (i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
+ /*
+ * LWLockRelease hopes we will release in reverse order for faster
+ * search in held_lwlocks.
+ */
+ for (i = NUM_XLOGINSERT_LOCKS - 1; i >= 0; i--)
LWLockReleaseClearVar(&WALInsertLocks[i].l.lock,
&WALInsertLocks[i].l.insertingAt,
0);
--
2.43.0
From 382e0d7bcc4a5c462a4d67264f4adf91f6e4f573 Mon Sep 17 00:00:00 2001
From: Yura Sokolov <y.soko...@postgrespro.ru>
Date: Thu, 16 Jan 2025 15:30:57 +0300
Subject: [PATCH v0 2/2] several attempts to lock WALInsertLocks
---
src/backend/access/transam/xlog.c | 47 ++++++++++++++++++-------------
1 file changed, 28 insertions(+), 19 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 39381693db6..c1a2e29fdb8 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -68,6 +68,7 @@
#include "catalog/pg_database.h"
#include "common/controldata_utils.h"
#include "common/file_utils.h"
+#include "common/pg_prng.h"
#include "executor/instrument.h"
#include "miscadmin.h"
#include "pg_trace.h"
@@ -1370,8 +1371,7 @@ CopyXLogRecordToWAL(int write_len, bool isLogSwitch, XLogRecData *rdata,
static void
WALInsertLockAcquire(void)
{
- bool immed;
-
+ int attempts = 2;
/*
* It doesn't matter which of the WAL insertion locks we acquire, so try
* the one we used last time. If the system isn't particularly busy, it's
@@ -1383,29 +1383,38 @@ WALInsertLockAcquire(void)
* (semi-)randomly. This allows the locks to be used evenly if you have a
* lot of very short connections.
*/
- static int lockToTry = -1;
+ static uint32 lockToTry = 0;
+ static uint32 lockDelta = 0;
- if (lockToTry == -1)
- lockToTry = MyProcNumber % NUM_XLOGINSERT_LOCKS;
- MyLockNo = lockToTry;
+ if (lockDelta == 0)
+ {
+ uint32 rng = pg_prng_uint32(&pg_global_prng_state);
+
+ lockToTry = rng % NUM_XLOGINSERT_LOCKS;
+ lockDelta = ((rng >> 16) % NUM_XLOGINSERT_LOCKS) | 1; /* must be odd */
+ }
/*
* The insertingAt value is initially set to 0, as we don't know our
* insert location yet.
*/
- immed = LWLockAcquire(&WALInsertLocks[MyLockNo].l.lock, LW_EXCLUSIVE);
- if (!immed)
- {
- /*
- * If we couldn't get the lock immediately, try another lock next
- * time. On a system with more insertion locks than concurrent
- * inserters, this causes all the inserters to eventually migrate to a
- * lock that no-one else is using. On a system with more inserters
- * than locks, it still helps to distribute the inserters evenly
- * across the locks.
- */
- lockToTry = (lockToTry + 1) % NUM_XLOGINSERT_LOCKS;
- }
+ MyLockNo = lockToTry;
+retry:
+ if (LWLockConditionalAcquire(&WALInsertLocks[MyLockNo].l.lock, LW_EXCLUSIVE))
+ return;
+ /*
+ * If we couldn't get the lock immediately, try another lock next
+ * time. On a system with more insertion locks than concurrent
+ * inserters, this causes all the inserters to eventually migrate to a
+ * lock that no-one else is using. On a system with more inserters
+ * than locks, it still helps to distribute the inserters evenly
+ * across the locks.
+ */
+ lockToTry = (lockToTry + lockDelta) % NUM_XLOGINSERT_LOCKS;
+ MyLockNo = lockToTry;
+ if (--attempts)
+ goto retry;
+ LWLockAcquire(&WALInsertLocks[MyLockNo].l.lock, LW_EXCLUSIVE);
}
/*
--
2.43.0