19.01.2025 03:11, Yura Sokolov writes:
Good day, hackers.

During the discussion of Increasing NUM_XLOGINSERT_LOCKS [1], Andres Freund used a benchmark which creates WAL records very intensively. While I think it is not completely fair (1MB WAL records are really rare), it pushed me to analyze the write-side waiting of the XLog machinery.

First I tried to optimize WaitXLogInsertionsToFinish, but without great success (yet).

While profiling, I found that a lot of time is spent clearing memory under the global WALBufMappingLock:

     MemSet((char *) NewPage, 0, XLOG_BLCKSZ);

It is an obvious scalability bottleneck.

So "challenge was accepted".

Certainly, backends should initialize pages without holding an exclusive lock. But then how do we ensure the pages were initialized? In other words, how do we ensure XLogCtl->InitializedUpTo is correct?

I tried to keep WALBufMappingLock, holding it only for a short time and spinning on XLogCtl->xlblocks[nextidx]. But in the end I found WALBufMappingLock is not needed at all.

Instead of holding the lock, it is better to let backends cooperate:
- I bound a ConditionVariable to each xlblocks entry,
- every backend now checks that every block required to advance InitializedUpTo was successfully initialized, or sleeps on that block's condvar,
- when a backend is sure a block is initialized, it tries to advance InitializedUpTo and wakes waiters through the condition variable (see the sketch below).
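
Here is a rough sketch of the scheme in C (not the patch itself): it assumes xlblocks[] and InitializedUpTo are pg_atomic_uint64, and it uses a hypothetical per-entry condvar array XLogCtl->xlblocks_cv[] and a hypothetical wait event WAIT_EVENT_WAL_BUFFER_INIT; names and details are illustrative.

    /* The backend that claimed buffer 'idx' zeroes it without any global lock. */
    static void
    InitXLogBuffer(int idx, XLogRecPtr endptr)
    {
        char       *page = XLogCtl->pages + (Size) idx * XLOG_BLCKSZ;

        MemSet(page, 0, XLOG_BLCKSZ);

        /* Publish the new mapping and wake backends waiting on this buffer. */
        pg_atomic_write_u64(&XLogCtl->xlblocks[idx], endptr);
        ConditionVariableBroadcast(&XLogCtl->xlblocks_cv[idx]);
    }

    /*
     * Advance InitializedUpTo over every contiguously initialized page up to
     * 'target', sleeping on a buffer's condvar while its initializer works.
     * (A real implementation must also handle buffers already recycled for a
     * later page; that race is glossed over here.)
     */
    static void
    AdvanceInitializedUpTo(XLogRecPtr target)
    {
        XLogRecPtr  upto = pg_atomic_read_u64(&XLogCtl->InitializedUpTo);

        while (upto < target)
        {
            int         idx = XLogRecPtrToBufIdx(upto);
            XLogRecPtr  expected = upto + XLOG_BLCKSZ;

            if (pg_atomic_read_u64(&XLogCtl->xlblocks[idx]) != expected)
            {
                ConditionVariablePrepareToSleep(&XLogCtl->xlblocks_cv[idx]);
                while (pg_atomic_read_u64(&XLogCtl->xlblocks[idx]) != expected)
                    ConditionVariableSleep(&XLogCtl->xlblocks_cv[idx],
                                           WAIT_EVENT_WAL_BUFFER_INIT);
                ConditionVariableCancelSleep();
            }

            /* On CAS failure 'upto' is refreshed with the current value. */
            if (pg_atomic_compare_exchange_u64(&XLogCtl->InitializedUpTo,
                                               &upto, expected))
                upto = expected;
        }
    }

This way the only serialization left around page initialization is the atomic advance of InitializedUpTo plus a condvar wakeup, instead of zeroing XLOG_BLCKSZ bytes under an exclusive WALBufMappingLock.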

Andres's benchmark looks like:

  c=100 && install/bin/psql -c checkpoint -c 'select pg_switch_wal()' postgres && install/bin/pgbench -n -M prepared -c$c -j$c -f <(echo "SELECT pg_logical_emit_message(true, 'test', repeat('0', 1024*1024));";) -P1 -T45 postgres

So, it generates 1MB records as fast as possible for 45 seconds.

The test machine is a Ryzen 5825U (8 cores/16 threads) limited to 2GHz.
Config:

   max_connections = 1000
   shared_buffers = 1024MB
   fsync = off
   wal_sync_method = fdatasync
   full_page_writes = off
   wal_buffers = 1024MB
   checkpoint_timeout = 1d

Results are given as "average for 45 sec" / "1-second max outlier".

Results for master @ d3d098316913:
   25  clients: 2908  /3230
   50  clients: 2759  /3130
   100 clients: 2641  /2933
   200 clients: 2419  /2707
   400 clients: 1928  /2377
   800 clients: 1689  /2266

With v0-0001-Get-rid-of-WALBufMappingLock.patch :
   25  clients: 3103  /3583
   50  clients: 3183  /3706
   100 clients: 3106  /3559
   200 clients: 2902  /3427
   400 clients: 2303  /2717
   800 clients: 1925  /2329

Combined with v0-0002-several-attempts-to-lock-WALInsertLocks.patch

No WALBufMappingLock + several attempts on WAL insertion locks:
   25  clients: 3518  /3750
   50  clients: 3355  /3548
   100 clients: 3226  /3460
   200 clients: 3092  /3299
   400 clients: 2575  /2801
   800 clients: 1946  /2341

These results are with NUM_XLOGINSERT_LOCKS untouched at 8.
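
For the curious, here is my guess at what the v0-0002 approach looks like, based only on the patch name (NUM_ATTEMPTS is an illustrative constant; MyLockNo and WALInsertLocks are the statics from xlog.c). The idea: conditionally try a couple of insertion locks before blocking on one.

    #define NUM_ATTEMPTS 2          /* illustrative */

    static void
    WALInsertLockAcquireWithAttempts(void)
    {
        int         lockno = MyProcNumber % NUM_XLOGINSERT_LOCKS;

        for (int i = 0; i < NUM_ATTEMPTS; i++)
        {
            if (LWLockConditionalAcquire(&WALInsertLocks[lockno].l.lock,
                                         LW_EXCLUSIVE))
            {
                MyLockNo = lockno;
                return;
            }
            lockno = (lockno + 1) % NUM_XLOGINSERT_LOCKS;
        }

        /* Every attempt failed: fall back to a blocking acquire. */
        LWLockAcquire(&WALInsertLocks[lockno].l.lock, LW_EXCLUSIVE);
        MyLockNo = lockno;
    }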

[1] http://postgr.es/m/flat/3b11fdc2-9793-403d-b3d4-67ff9a00d447%40postgrespro.ru


PS.
Increasing NUM_XLOGINSERT_LOCKS to 64 gives:
   25  clients: 3457  /3624
   50  clients: 3215  /3500
   100 clients: 2750  /3000
   200 clients: 2535  /2729
   400 clients: 2163  /2400
   800 clients: 1700  /2060

While the same increase on master gives:
   25  clients  2645  /2953
   50  clients: 2562  /2968
   100 clients: 2364  /2756
   200 clients: 2266  /2564
   400 clients: 1868  /2228
   800 clients: 1527  /2133

So, the patched version with increased NUM_XLOGINSERT_LOCKS looks no worse than the unpatched version without increasing the number of locks.

I'm too brave... or too sleepy (it's 3:30am)...
But I took the risk of sending a patch to commitfest:
https://commitfest.postgresql.org/52/5511/

------
regards
Yura Sokolov aka funny-falcon

