Since it seems Andres missed my request to send a copy of his answer,
here it is:

On 2025-01-16 18:55:47 +0300, Yura Sokolov wrote:
> 16.01.2025 18:36, Andres Freund writes:
>> Hi,
>>
>> On 2025-01-16 16:52:46 +0300, Yura Sokolov wrote:
>>> Good day, hackers.
>>>
>>> Zhiguo Zhow proposed transforming xlog reservation into a lock-free algorithm
>>> so that NUM_XLOGINSERT_LOCKS can be increased on very large (480 vCPU) servers. [1]
>>>
>>> While I believe lock-free reservation makes sense on huge servers, it is hard
>>> to measure its effect on small servers and personal computers/notebooks.
>>>
>>> But increasing NUM_XLOGINSERT_LOCKS gives a measurable performance gain (using a
>>> synthetic test) even on my work notebook:
>>>
>>>    Ryzen-5825U (8 cores, 16 threads) limited to 2GHz, Ubuntu 24.04
>>
>> I've experimented with this in the past.
>>
>>
>> Unfortunately increasing it substantially can make the contention on the
>> spinlock *substantially* worse.
>>
>> c=80 && psql -c checkpoint -c 'select pg_switch_wal()' && pgbench -n -M prepared -c$c -j$c -f <(echo "SELECT pg_logical_emit_message(true, 'test', repeat('0', 1024*1024));";) -P1 -T15
>>
>> On a 2x Xeon Gold 5215, with max_wal_size = 150GB and the workload run a few
>> times to ensure WAL is already allocated.
>>
>> With
>> NUM_XLOGINSERT_LOCKS = 8:       1459 tps
>> NUM_XLOGINSERT_LOCKS = 80:      2163 tps
>
> So, even in your test you have a +50% gain from increasing
> NUM_XLOGINSERT_LOCKS.
>
> (And that is why I'm keen on a smaller increase, like up to 64, not 128).

Oops, I swapped the results around when reformatting them, sorry! It's
the other way around, i.e. increasing the locks hurts.

Here's that issue fixed, plus a few more NUM_XLOGINSERT_LOCKS values.  This is a
slightly different disk (the other one seems to have gone the way of the dodo),
so the results aren't expected to be exactly the same.

NUM_XLOGINSERT_LOCKS    TPS
1                       2583
2                       2524
4                       2711
8                       2788
16                      1938
32                      1834
64                      1865
128                     1543


>>
>> The main reason is that the increase in insert locks puts a lot more pressure
>> on the spinlock.
>
> That is addressed by Zhiguo Zhow and me in the other thread [1]. But increasing
> NUM_XLOGINSERT_LOCKS gives benefits right now (at least on smaller
> installations), and "lock-free reservation" should be measured against it.

I know there's that thread; I just don't see how we can increase
NUM_XLOGINSERT_LOCKS, given the regressions it can cause.
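To be concrete about which spinlock I mean: WAL space reservation is
serialized on Insert->insertpos_lck in ReserveXLogInsertLocation(),
regardless of how many insert locks there are. Roughly (a paraphrased
sketch from memory, not the exact xlog.c code):

static void
ReserveXLogInsertLocation(int size, XLogRecPtr *StartPos, XLogRecPtr *EndPos,
                          XLogRecPtr *PrevPtr)
{
    XLogCtlInsert *Insert = &XLogCtl->Insert;
    uint64      startbytepos;
    uint64      endbytepos;
    uint64      prevbytepos;

    size = MAXALIGN(size);

    /*
     * Every inserter, no matter which of the NUM_XLOGINSERT_LOCKS it holds,
     * has to take this one spinlock to advance the global insert position.
     */
    SpinLockAcquire(&Insert->insertpos_lck);

    startbytepos = Insert->CurrBytePos;
    endbytepos = startbytepos + size;
    prevbytepos = Insert->PrevBytePos;
    Insert->CurrBytePos = endbytepos;
    Insert->PrevBytePos = startbytepos;

    SpinLockRelease(&Insert->insertpos_lck);

    *StartPos = XLogBytePosToRecPtr(startbytepos);
    *EndPos = XLogBytePosToEndRecPtr(endbytepos);
    *PrevPtr = XLogBytePosToRecPtr(prevbytepos);
}

More insert locks just means more backends get past their LWLock at the
same time and pile onto this one cacheline, which fits the TPS numbers
dropping above 8.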


>> Secondarily it's also that we spend more time iterating
>> through the insert locks when waiting, and that that causes a lot of cacheline
>> pingpong.
>
> Waiting is done with LWLockWaitForVar, and there is no wait if `insertingAt`
> is in the future. It looks very efficient in the master branch code.

But LWLockWaitForVar is called from WaitXLogInsertionsToFinish, which just
iterates over all locks.
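
I.e. something like this (a simplified sketch of that loop, details and
comments elided from memory, so not the exact code):

    finishedUpto = reservedUpto;
    for (i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
    {
        XLogRecPtr  insertingat = InvalidXLogRecPtr;

        do
        {
            /*
             * Wait for this lock's insertingAt to advance (or the lock to be
             * released). Touching every lock's cacheline here is where the
             * pingpong comes from.
             */
            if (LWLockWaitForVar(&WALInsertLocks[i].l.lock,
                                 &WALInsertLocks[i].l.insertingAt,
                                 insertingat, &insertingat))
            {
                /* the lock was free, so no insertion in progress */
                insertingat = InvalidXLogRecPtr;
                break;
            }
            /* still in progress; keep waiting unless it's already past upto */
        } while (insertingat < upto);

        if (insertingat != InvalidXLogRecPtr && insertingat < finishedUpto)
            finishedUpto = insertingat;
    }

So even when no individual LWLockWaitForVar call has to sleep, the cost of
the whole function still scales with NUM_XLOGINSERT_LOCKS.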



Greetings,

Andres Freund

