On Tue, Jan 24, 2023 at 7:00 PM Bharath Rupireddy
<bharath.rupireddyforpostg...@gmail.com> wrote:
>
> I'm attaching the v3 patch with the above review comments addressed.
> Hopefully, no memory ordering issues now. FWIW, I've added it to CF
> https://commitfest.postgresql.org/42/4141/.
>
> Test results with the v3 patch and insert workload are the same as
> that of the earlier run - TPS starts to scale at higher clients as
> expected after 512 clients and peaks at 2X with 2048 and 4096 clients.
>
> HEAD:
> 1 1380.411086
> 2 1358.378988
> 4 2701.974332
> 8 5925.380744
> 16 10956.501237
> 32 20877.513953
> 64 40838.046774
> 128 70251.744161
> 256 108114.321299
> 512 120478.988268
> 768 99140.425209
> 1024 93645.984364
> 2048 70111.159909
> 4096 55541.804826
>
> v3 PATCHED:
> 1 1493.800209
> 2 1569.414953
> 4 3154.186605
> 8 5965.578904
> 16 11912.587645
> 32 22720.964908
> 64 42001.094528
> 128 78361.158983
> 256 110457.926232
> 512 148941.378393
> 768 167256.590308
> 1024 155510.675372
> 2048 147499.376882
> 4096 119375.457779
I slightly modified the comments and attached the v4 patch for further
review. I also captured perf reports - there's a clear reduction in the
time spent in the functions affected by the patch -
LWLockWaitListLock, WaitXLogInsertionsToFinish, LWLockWaitForVar and
LWLockConflictsWithVar. Note that I compiled the source code with
-ggdb to capture symbols for perf; even so, the benefit stands at > 2X
for higher numbers of clients.

HEAD:
+   16.87%     0.01%  postgres  [.] CommitTransactionCommand
+   16.86%     0.00%  postgres  [.] finish_xact_command
+   16.81%     0.01%  postgres  [.] CommitTransaction
+   15.09%     0.20%  postgres  [.] LWLockWaitListLock
+   14.53%     0.01%  postgres  [.] WaitXLogInsertionsToFinish
+   14.51%     0.02%  postgres  [.] LWLockWaitForVar
+   11.70%    11.63%  postgres  [.] pg_atomic_read_u32_impl
+   11.66%     0.08%  postgres  [.] pg_atomic_read_u32
+    9.96%     0.03%  postgres  [.] LWLockConflictsWithVar
+    4.78%     0.00%  postgres  [.] LWLockQueueSelf
+    1.91%     0.01%  postgres  [.] pg_atomic_fetch_or_u32
+    1.91%     1.89%  postgres  [.] pg_atomic_fetch_or_u32_impl
+    1.73%     0.00%  postgres  [.] XLogInsert
+    1.69%     0.01%  postgres  [.] XLogInsertRecord
+    1.41%     0.01%  postgres  [.] LWLockRelease
+    1.37%     0.47%  postgres  [.] perform_spin_delay
+    1.11%     1.11%  postgres  [.] spin_delay
+    1.10%     0.03%  postgres  [.] exec_bind_message
+    0.91%     0.00%  postgres  [.] WALInsertLockRelease
+    0.91%     0.00%  postgres  [.] LWLockReleaseClearVar
+    0.72%     0.02%  postgres  [.] LWLockAcquire
+    0.60%     0.00%  postgres  [.] LWLockDequeueSelf
+    0.58%     0.00%  postgres  [.] GetTransactionSnapshot
     0.58%     0.49%  postgres  [.] GetSnapshotData
+    0.58%     0.00%  postgres  [.] WALInsertLockAcquire
+    0.55%     0.00%  postgres  [.] XactLogCommitRecord

TPS (compiled with -ggdb for capturing symbols for perf)
1 1392.512967
2 1435.899119
4 3104.091923
8 6159.305522
16 11477.641780
32 22701.000718
64 41662.425880
128 23743.426209
256 89837.651619
512 65164.221500
768 66015.733370
1024 56421.223080
2048 52909.018072
4096 40071.146985

PATCHED:
+    2.19%     0.05%  postgres  [.] LWLockWaitListLock
+    2.10%     0.01%  postgres  [.] LWLockQueueSelf
+    1.73%     1.71%  postgres  [.] pg_atomic_read_u32_impl
+    1.73%     0.02%  postgres  [.] pg_atomic_read_u32
+    1.72%     0.02%  postgres  [.] LWLockRelease
+    1.65%     0.04%  postgres  [.] exec_bind_message
+    1.43%     0.00%  postgres  [.] XLogInsert
+    1.42%     0.01%  postgres  [.] WaitXLogInsertionsToFinish
+    1.40%     0.03%  postgres  [.] LWLockWaitForVar
+    1.38%     0.02%  postgres  [.] XLogInsertRecord
+    0.93%     0.03%  postgres  [.] LWLockAcquireOrWait
+    0.91%     0.00%  postgres  [.] GetTransactionSnapshot
+    0.91%     0.79%  postgres  [.] GetSnapshotData
+    0.91%     0.00%  postgres  [.] WALInsertLockRelease
+    0.91%     0.00%  postgres  [.] LWLockReleaseClearVar
+    0.53%     0.02%  postgres  [.] ExecInitModifyTable

TPS (compiled with -ggdb for capturing symbols for perf)
1 1295.296611
2 1459.079162
4 2865.688987
8 5533.724983
16 10771.697842
32 20557.499312
64 39436.423783
128 42555.639048
256 73139.060227
512 124649.665196
768 131162.826976
1024 132185.160007
2048 117377.586644
4096 88240.336940

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From 74c5bd8cc4f1497aa7f2fa02c6487039dc91e847 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Thu, 2 Feb 2023 03:42:27 +0000
Subject: [PATCH v4] Optimize WAL insertion lock acquisition and release

This commit optimizes WAL insertion lock acquisition and release in
the following way:

1. The WAL insertion lock's variable insertingAt is currently read and
written under the lwlock's wait list lock to avoid torn reads/writes.
This wait list lock can become a point of contention on highly
concurrent write workloads. Therefore, make insertingAt a 64-bit
atomic, which inherently provides tear-free reads/writes.

2. LWLockUpdateVar currently acquires the lwlock's wait list lock even
when there are no waiters at all. Add a fastpath exit to
LWLockUpdateVar when there are no waiters to avoid unnecessary
locking.

Note that an atomic exchange operation (which is a full barrier) is
used where necessary, instead of an atomic write, to ensure that
memory ordering is preserved.

This commit also adds a note in WaitXLogInsertionsToFinish on how the
use of a spinlock there avoids an explicit memory barrier in some
subsequently called functions.
Suggested-by: Andres Freund
Author: Bharath Rupireddy
Reviewed-by: Nathan Bossart
Discussion: https://www.postgresql.org/message-id/20221124184619.xit4sfi52bcz2tva%40awork3.anarazel.de
---
 src/backend/access/transam/xlog.c | 14 +++++--
 src/backend/storage/lmgr/lwlock.c | 66 +++++++++++++++++++------------
 src/include/storage/lwlock.h      |  6 +--
 3 files changed, 55 insertions(+), 31 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index fb4c860bde..95aed0e97f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -350,7 +350,8 @@ typedef struct XLogwrtResult
  * wait for all currently in-progress insertions to finish, but the
  * insertingAt indicator allows you to ignore insertions to later in the WAL,
  * so that you only wait for the insertions that are modifying the buffers
- * you're about to write out.
+ * you're about to write out. Using an atomic variable for insertingAt avoids
+ * taking any explicit lock for reads and writes.
  *
  * This isn't just an optimization. If all the WAL buffers are dirty, an
  * inserter that's holding a WAL insert lock might need to evict an old WAL
@@ -376,7 +377,7 @@ typedef struct XLogwrtResult
 typedef struct
 {
     LWLock      lock;
-    XLogRecPtr  insertingAt;
+    pg_atomic_uint64 insertingAt;
     XLogRecPtr  lastImportantAt;
 } WALInsertLock;
@@ -1496,6 +1497,13 @@ WaitXLogInsertionsToFinish(XLogRecPtr upto)
          * calling LWLockUpdateVar. But if it has to sleep, it will
          * advertise the insertion point with LWLockUpdateVar before
          * sleeping.
+         *
+         * XXX: Use of a spinlock at the beginning of this function to read
+         * the current insert position implies memory ordering. That means
+         * that the immediate loads and stores to shared memory (for
+         * instance, in LWLockUpdateVar called via LWLockWaitForVar) don't
+         * need an explicit memory barrier as far as the current usage is
+         * concerned. But that might not be safe in general.
          */
         if (LWLockWaitForVar(&WALInsertLocks[i].l.lock,
                              &WALInsertLocks[i].l.insertingAt,
@@ -4596,7 +4604,7 @@ XLOGShmemInit(void)
     for (i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
     {
         LWLockInitialize(&WALInsertLocks[i].l.lock, LWTRANCHE_WAL_INSERT);
-        WALInsertLocks[i].l.insertingAt = InvalidXLogRecPtr;
+        pg_atomic_init_u64(&WALInsertLocks[i].l.insertingAt, InvalidXLogRecPtr);
         WALInsertLocks[i].l.lastImportantAt = InvalidXLogRecPtr;
     }

diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index d2ec396045..27c3b63c68 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -1547,9 +1547,8 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode)
  * *result is set to true if the lock was free, and false otherwise.
  */
 static bool
-LWLockConflictsWithVar(LWLock *lock,
-                       uint64 *valptr, uint64 oldval, uint64 *newval,
-                       bool *result)
+LWLockConflictsWithVar(LWLock *lock, pg_atomic_uint64 *valptr, uint64 oldval,
+                       uint64 *newval, bool *result)
 {
     bool        mustwait;
     uint64      value;
@@ -1572,13 +1571,11 @@ LWLockConflictsWithVar(LWLock *lock,
     *result = false;

     /*
-     * Read value using the lwlock's wait list lock, as we can't generally
-     * rely on atomic 64 bit reads/stores. TODO: On platforms with a way to
-     * do atomic 64 bit reads/writes the spinlock should be optimized away.
+     * Read the value atomically without any explicit lock. We rely on 64-bit
+     * atomic reads/writes that transparently do the required work to make
+     * even non-atomic reads/writes tear free.
      */
-    LWLockWaitListLock(lock);
-    value = *valptr;
-    LWLockWaitListUnlock(lock);
+    value = pg_atomic_read_u64(valptr);

     if (value != oldval)
     {
@@ -1607,7 +1604,8 @@ LWLockConflictsWithVar(LWLock *lock,
  * in shared mode, returns 'true'.
  */
 bool
-LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval)
+LWLockWaitForVar(LWLock *lock, pg_atomic_uint64 *valptr, uint64 oldval,
+                 uint64 *newval)
 {
     PGPROC     *proc = MyProc;
     int         extraWaits = 0;
@@ -1735,29 +1733,47 @@ LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval)
  * LWLockUpdateVar - Update a variable and wake up waiters atomically
  *
  * Sets *valptr to 'val', and wakes up all processes waiting for us with
- * LWLockWaitForVar(). Setting the value and waking up the processes happen
- * atomically so that any process calling LWLockWaitForVar() on the same lock
- * is guaranteed to see the new value, and act accordingly.
+ * LWLockWaitForVar(). It first sets the value atomically and then wakes up
+ * the waiting processes so that any process calling LWLockWaitForVar() on the
+ * same lock is guaranteed to see the new value, and act accordingly.
  *
  * The caller must be holding the lock in exclusive mode.
  */
 void
-LWLockUpdateVar(LWLock *lock, uint64 *valptr, uint64 val)
+LWLockUpdateVar(LWLock *lock, pg_atomic_uint64 *valptr, uint64 val)
 {
     proclist_head wakeup;
     proclist_mutable_iter iter;

     PRINT_LWDEBUG("LWLockUpdateVar", lock, LW_EXCLUSIVE);

+    /*
+     * Update the lock variable atomically first, without acquiring the wait
+     * list lock, so that anyone waiting on the variable has a chance to see
+     * the new value as quickly as possible.
+     *
+     * NB: Note the use of pg_atomic_exchange_u64 as opposed to just
+     * pg_atomic_write_u64 to update the value. Since pg_atomic_exchange_u64 is
+     * a full barrier, we're guaranteed that the subsequent atomic read of lock
+     * state to check if it has any waiters happens after we set the lock
+     * variable to the new value here. Without a barrier, we could end up
+     * missing waiters that otherwise should have been woken up.
+     */
+    pg_atomic_exchange_u64(valptr, val);
+
+    /*
+     * Quick exit when there are no waiters. This avoids unnecessary
+     * acquisition and release of the lwlock's wait list lock.
+     */
+    if ((pg_atomic_read_u32(&lock->state) & LW_FLAG_HAS_WAITERS) == 0)
+        return;
+
     proclist_init(&wakeup);

     LWLockWaitListLock(lock);

     Assert(pg_atomic_read_u32(&lock->state) & LW_VAL_EXCLUSIVE);

-    /* Update the lock's value */
-    *valptr = val;
-
     /*
      * See if there are any LW_WAIT_UNTIL_FREE waiters that need to be woken
      * up. They are always in the front of the queue.
@@ -1873,17 +1889,17 @@ LWLockRelease(LWLock *lock)
  * LWLockReleaseClearVar - release a previously acquired lock, reset variable
  */
 void
-LWLockReleaseClearVar(LWLock *lock, uint64 *valptr, uint64 val)
+LWLockReleaseClearVar(LWLock *lock, pg_atomic_uint64 *valptr, uint64 val)
 {
-    LWLockWaitListLock(lock);
-
     /*
-     * Set the variable's value before releasing the lock, that prevents race
-     * a race condition wherein a new locker acquires the lock, but hasn't yet
-     * set the variables value.
+     * Update the lock variable atomically first.
+     *
+     * NB: Note the use of pg_atomic_exchange_u64 as opposed to just
+     * pg_atomic_write_u64 to update the value. Since pg_atomic_exchange_u64 is
+     * a full barrier, we're guaranteed that the subsequent shared memory
+     * reads/writes, if any, happen after we reset the lock variable.
      */
-    *valptr = val;
-    LWLockWaitListUnlock(lock);
+    pg_atomic_exchange_u64(valptr, val);

     LWLockRelease(lock);
 }

diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index d2c7afb8f4..f19bc49193 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -128,14 +128,14 @@ extern bool LWLockAcquire(LWLock *lock, LWLockMode mode);
 extern bool LWLockConditionalAcquire(LWLock *lock, LWLockMode mode);
 extern bool LWLockAcquireOrWait(LWLock *lock, LWLockMode mode);
 extern void LWLockRelease(LWLock *lock);
-extern void LWLockReleaseClearVar(LWLock *lock, uint64 *valptr, uint64 val);
+extern void LWLockReleaseClearVar(LWLock *lock, pg_atomic_uint64 *valptr, uint64 val);
 extern void LWLockReleaseAll(void);
 extern bool LWLockHeldByMe(LWLock *lock);
 extern bool LWLockAnyHeldByMe(LWLock *lock, int nlocks, size_t stride);
 extern bool LWLockHeldByMeInMode(LWLock *lock, LWLockMode mode);
-extern bool LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval);
-extern void LWLockUpdateVar(LWLock *lock, uint64 *valptr, uint64 val);
+extern bool LWLockWaitForVar(LWLock *lock, pg_atomic_uint64 *valptr, uint64 oldval, uint64 *newval);
+extern void LWLockUpdateVar(LWLock *lock, pg_atomic_uint64 *valptr, uint64 val);
 extern Size LWLockShmemSize(void);
 extern void CreateLWLocks(void);
--
2.34.1