23.01.2025 08:41, wenhui qiu wrote:
Hi Japin,
     Thank you for your test. It seems NUM_XLOGINSERT_LOCKS = 64 is great; I think it doesn't need to grow much. What do you think?

I agree: while 128 shows a small benefit, it is not that big at the moment.
Given that other waiting issues may arise from increasing it further, 64
seems to be the sweet spot.
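
The waiting issue I mean is WaitXLogInsertionsToFinish(): it scans every
insertion lock, so a larger NUM_XLOGINSERT_LOCKS directly lengthens that scan
and touches more cachelines. Roughly, the loop looks like this (a simplified
sketch from memory, not an exact quote of xlog.c; names may be slightly off):

    /* WaitXLogInsertionsToFinish(), simplified: one pass over every lock */
    finishedUpto = reservedUpto;
    for (i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
    {
        XLogRecPtr  insertingat = InvalidXLogRecPtr;

        do
        {
            /*
             * Wait until the lock is released, or until the holder advertises
             * via insertingAt that it has advanced past the point we wait for.
             */
            if (LWLockWaitForVar(&WALInsertLocks[i].l.lock,
                                 &WALInsertLocks[i].l.insertingAt,
                                 insertingat, &insertingat))
            {
                /* the lock was free, so no insertion in progress here */
                insertingat = InvalidXLogRecPtr;
                break;
            }
        } while (insertingat < upto);

        if (insertingat != InvalidXLogRecPtr && insertingat < finishedUpto)
            finishedUpto = insertingat;
    }

So with 64 locks a flush has to scan 64 entries instead of 8, which is part
of the regression Andres measured below.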

It could probably be increased further in the future, after other places are optimized.
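
For reference, the knob itself is just a compile-time constant in
src/backend/access/transam/xlog.c (from memory):

    /* xlog.c: number of WAL insertion locks, currently hard-coded */
    #define NUM_XLOGINSERT_LOCKS  8

And the main "other place" I mean is the byte-position reservation, which
stays serialized on a single spinlock no matter how many insertion locks
there are. A simplified sketch of ReserveXLogInsertLocation() (again from
memory, not an exact quote; `size` is the MAXALIGN'ed record size):

    SpinLockAcquire(&Insert->insertpos_lck);
    startbytepos = Insert->CurrBytePos;
    endbytepos = startbytepos + size;
    prevbytepos = Insert->PrevBytePos;
    Insert->CurrBytePos = endbytepos;
    Insert->PrevBytePos = startbytepos;
    SpinLockRelease(&Insert->insertpos_lck);

That spinlock is exactly what the lock-free reservation thread mentioned
below is trying to get rid of.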

On Thu, Jan 23, 2025 at 10:30 AM Japin Li <japi...@hotmail.com> wrote:

    On Sat, 18 Jan 2025 at 14:53, Yura Sokolov <y.soko...@postgrespro.ru> wrote:
     > Since it seems Andres missed my request to send a copy of his answer,
     > here it is:
     >
     > On 2025-01-16 18:55:47 +0300, Yura Sokolov wrote:
     >> 16.01.2025 18:36, Andres Freund wrote:
     >>> Hi,
     >>>
     >>> On 2025-01-16 16:52:46 +0300, Yura Sokolov wrote:
     >>>> Good day, hackers.
     >>>>
     >>>> Zhiguo Zhow proposed to transform xlog reservation into a lock-free
     >>>> algorithm so that NUM_XLOGINSERT_LOCKS can be increased on very huge
     >>>> (480vCPU) servers. [1]
     >>>>
     >>>> While I believe lock-free reservation makes sense on huge servers, it
     >>>> is hard to measure on small servers and personal computers/notebooks.
     >>>>
     >>>> But an increase of NUM_XLOGINSERT_LOCKS has a measurable performance
     >>>> gain (using a synthetic test) even on my working notebook:
     >>>>
     >>>>    Ryzen-5825U (8 cores, 16 threads) limited to 2GHz, Ubuntu 24.04
     >>>
     >>> I've experimented with this in the past.
     >>>
     >>>
     >>> Unfortunately increasing it substantially can make the contention on
     >>> the spinlock *substantially* worse.
     >>>
     >>> c=80 && psql -c checkpoint -c 'select pg_switch_wal()' && pgbench \
     >>>   -n -M prepared -c$c -j$c \
     >>>   -f <(echo "SELECT pg_logical_emit_message(true, 'test', repeat('0', 1024*1024));";) \
     >>>   -P1 -T15
     >>>
     >>> On a 2x Xeon Gold 5215, with max_wal_size = 150GB and the workload
     >>> ran a few times to ensure WAL is already allocated.
     >>>
     >>> With
     >>> NUM_XLOGINSERT_LOCKS = 8:       1459 tps
     >>> NUM_XLOGINSERT_LOCKS = 80:      2163 tps
     >>
     >> So, even in your test you have +50% gain from increasing
     >> NUM_XLOGINSERT_LOCKS.
     >>
     >> (And that is why I'm keen on a smaller increase, like up to 64,
     >> not 128).
     >
     > Oops, I swapped the results around when reformatting them, sorry!
     > It's the opposite way, i.e. increasing the locks hurts.
     >
     > Here's that issue fixed and a few more NUM_XLOGINSERT_LOCKS values.
     > This is a slightly different disk (the other seems to have to go the
     > way of the dodo), so the results aren't expected to be exactly the
     > same.
     >
     > NUM_XLOGINSERT_LOCKS   TPS
     > 1                      2583
     > 2                      2524
     > 4                      2711
     > 8                      2788
     > 16                     1938
     > 32                     1834
     > 64                     1865
     > 128                    1543
     >
     >
     >>>
     >>> The main reason is that the increase in insert locks puts a lot more
     >>> pressure on the spinlock.
     >>
     >> That is addressed by Zhiguo Zhow and me in the other thread [1]. But
     >> increasing NUM_XLOGINSERT_LOCKS gives benefits right now (at least on
     >> smaller installations), and "lock-free reservation" should be measured
     >> against it.
     >
     > I know that there's that thread, I just don't see how we can increase
     > NUM_XLOGINSERT_LOCKS due to the regressions it can cause.
     >
     >
     >>> Secondarily it's also that we spend more time iterating through the
     >>> insert locks when waiting, and that that causes a lot of cacheline
     >>> pingpong.
     >>
     >> Waiting is done with LWLockWaitForVar, and there is no wait if
     >> `insertingAt` is in the future. It looks very efficient in the master
     >> branch code.
     >
     > But LWLockWaitForVar is called from WaitXLogInsertionsToFinish, which
     > just iterates over all locks.
     >

    Hi, Yura Sokolov

    I tested the patch on a Hygon C86 7490 64-core machine using benchmarksql
    5.0 with 500 warehouses and 256 terminals, run time 10 minutes:

    | case               | min          | avg          | max          |
    |--------------------+--------------+--------------+--------------|
    | master (4108440)   | 891,225.77   | 904,868.75   | 913,708.17   |
    | lock 64            | 1,007,716.95 | 1,012,013.22 | 1,018,674.00 |
    | lock 64 attempt 1  | 1,016,716.07 | 1,017,735.55 | 1,019,328.36 |
    | lock 64 attempt 2  | 1,015,328.31 | 1,018,147.74 | 1,021,513.14 |
    | lock 128           | 1,010,147.38 | 1,014,128.11 | 1,018,672.01 |
    | lock 128 attempt 1 | 1,018,154.79 | 1,023,348.35 | 1,031,365.42 |
    | lock 128 attempt 2 | 1,013,245.56 | 1,018,984.78 | 1,023,696.00 |

    I didn't test NUM_XLOGINSERT_LOCKS with 16 and 32; however, I tested it
    with 256 and got the following error:

    2025-01-23 02:23:23.828 CST [333524] PANIC:  too many LWLocks taken
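
    If I read lwlock.c correctly, this comes from the hard cap on how many
    LWLocks a single backend may hold at once: acquiring all 256 insertion
    locks (as WALInsertLockAcquireExclusive() does) exceeds it. A sketch from
    memory, not an exact quote:

        /* lwlock.c */
        #define MAX_SIMUL_LWLOCKS  200

        /* in LWLockAcquire(); ERROR is promoted to PANIC inside a critical section */
        if (num_held_lwlocks >= MAX_SIMUL_LWLOCKS)
            elog(ERROR, "too many LWLocks taken");

    So values much above 200 would need that limit raised as well.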

    I hope this test will be helpful.

    --
    Regards,
    Japin Li



