On 14.02.2025 17:09, Yura Sokolov wrote:
> On 14.02.2025 13:24, Alexander Korotkov wrote:
>> On Fri, Feb 14, 2025 at 11:45 AM Pavel Borisov <pashkin.e...@gmail.com> wrote:
>>> On Fri, 14 Feb 2025 at 00:59, Alexander Korotkov <aekorot...@gmail.com> wrote:
>>>> On Thu, Feb 13, 2025 at 6:39 PM Pavel Borisov <pashkin.e...@gmail.com> wrote:
>>>>> On Thu, 13 Feb 2025 at 14:08, Alexander Korotkov <aekorot...@gmail.com> wrote:
>>>>>>
>>>>>> On Thu, Feb 13, 2025 at 11:45 AM Yura Sokolov <y.soko...@postgrespro.ru> wrote:
>>>>>>> On 13.02.2025 12:34, Alexander Korotkov wrote:
>>>>>>>> On Wed, Feb 12, 2025 at 8:16 PM Yura Sokolov <y.soko...@postgrespro.ru> wrote:
>>>>>>>>> On 08.02.2025 13:07, Alexander Korotkov wrote:
>>>>>>>>>> On Fri, Feb 7, 2025 at 1:39 PM Alexander Korotkov <aekorot...@gmail.com> wrote:
>>>>>>>>>>> Good, thank you.  I think the 0001 patch is generally good, but
>>>>>>>>>>> needs some further polishing, e.g. more comments explaining how
>>>>>>>>>>> it works.
>>>>>>>>>
>>>>>>>>> I tried to add more comments.  I'm not good at it, so
>>>>>>>>> recommendations are welcome.
>>>>>>>>>
>>>>>>>>>> Two things come to my mind worth rechecking about 0001.
>>>>>>>>>> 1) Are XLogCtl->InitializeReserved, XLogCtl->InitializedUpTo and
>>>>>>>>>> XLogCtl->xlblocks always page-aligned?  Because the algorithm
>>>>>>>>>> seems to be sensitive to that.  If so, I would propose to
>>>>>>>>>> explicitly comment on that and add corresponding asserts.
>>>>>>>>>
>>>>>>>>> They're certainly page-aligned, since they are page borders.
>>>>>>>>> I added an assert on the alignment of InitializeReserved for sanity.
>>>>>>>>>
>>>>>>>>>> 2) Check if there are concurrency issues between
>>>>>>>>>> AdvanceXLInsertBuffer() and switching to the new WAL file.
>>>>>>>>>
>>>>>>>>> There are no issues:
>>>>>>>>> 1. CopyXLogRecordToWAL for isLogSwitch follows the same protocol,
>>>>>>>>> i.e. it uses GetXLogBuffer to zero out the WAL page.
>>>>>>>>> 2. WALINSERT_SPECIAL_SWITCH forces an exclusive lock on all
>>>>>>>>> insertion locks, so switching WAL is not concurrent.  (Although
>>>>>>>>> there is no need for this exclusiveness, imho.)
>>>>>>>>
>>>>>>>> Good, thank you.  I've also revised the commit message and comments.
>>>>>>>>
>>>>>>>> But I see another issue with this patch.  In the worst case, we do
>>>>>>>> XLogWrite() by ourselves, and it could potentially error out.
>>>>>>>> Without the patch, that would cause WALBufMappingLock to be released
>>>>>>>> and XLogCtl->InitializedUpTo not advanced.  With the patch, that
>>>>>>>> would cause other processes to wait infinitely until we finish the
>>>>>>>> initialization.
>>>>>>>>
>>>>>>>> A possible solution would be to save the position of the page to be
>>>>>>>> initialized, and set it back to XLogCtl->InitializeReserved on error
>>>>>>>> (everywhere we do LWLockReleaseAll()).  We also must check that on
>>>>>>>> error we only set XLogCtl->InitializeReserved to the past, because
>>>>>>>> there could be multiple concurrent failures.  Also we need to
>>>>>>>> broadcast XLogCtl->InitializedUpToCondVar to wake up waiters.
>>>>>>>
>>>>>>> The single place where AdvanceXLInsertBuffer is called outside of a
>>>>>>> critical section is in XLogBackgroundFlush.  All other call stacks
>>>>>>> will cause a server restart if XLogWrite raises an error inside
>>>>>>> AdvanceXLInsertBuffer.
>>>>>>>
>>>>>>> XLogBackgroundFlush explicitly avoids writing buffers by passing the
>>>>>>> opportunistic=true parameter.
>>>>>>>
>>>>>>> Therefore, an error in XLogWrite will not cause a hang in
>>>>>>> AdvanceXLInsertBuffer, since the server will shut down/restart.
>>>>>>>
>>>>>>> Perhaps we just need to insert `Assert(CritSectionCount > 0);` before
>>>>>>> the call to XLogWrite here?
>>>>>>
>>>>>> You're correct.  I've just reflected this in the next revision of the
>>>>>> patch.
>>>>>
>>>>> I've looked at the patchset v6.
>>>>
>>>> Oh, sorry, I really did this wrong.  I ran git format-patch on the
>>>> wrong local branch for v5 and v6.  The patches I sent for v5 and v6
>>>> are actually the same as my v1.  This is a real pity.  Please find the
>>>> right version of the patchset attached.
>>>
>>> I've rechecked v7.  In v6 a proposal from [1] was not reflected.  Now
>>> it has landed in v7.
>>>
>>> The other changes don't concern code behavior.  The things from my
>>> previous review that still could apply to v7:
>>>
>>> For 0001:
>>>
>>> Comment change proposed:
>>> "lock-free with cooperation with" -> "lock-free accompanied by changes
>>> to..." (or maybe another variant)
>>
>> Good catch.  I've rephrased this comment even more.
>>
>>> I propose a new define:
>>> #define FirstValidXLogRecPtr 1
>>> While FirstValidXLogRecPtr = InvalidXLogRecPtr + 1 holds in the code,
>>> that has no semantic meaning, and it's better to avoid direct
>>> arithmetic that derives the meaning of FirstValidXLogRecPtr from
>>> InvalidXLogRecPtr.
>>
>> Makes sense, but I'm not sure this change is required at all.  I've
>> reverted this to the state of master, and everything seems to work.
>>
>>> For 0002, both comment proposals from my message on v6 apply to v7 as
>>> well.
>>
>> Thank you for pointing that out.  For now, I'm concentrating on
>> improvements to 0001.  Probably Yura could work on your notes for 0002.
>
> I wrote a good commit message for 0002 with the calculated probabilities
> and a simple Ruby program which computes them, to explain the choice of
> 2 conditional attempts.  (At least I hope the message is good.)  I also
> added a simple comment before `int attempts = 2;`.
>
> Also, I simplified 0002 a bit so it looks a little prettier (i.e.
> without goto), and added a static assert on NUM_XLOGINSERT_LOCKS being
> a power of 2.
>
> (The 0001 patch is the same as in v8.)
>
> -------
> regards
> Yura Sokolov aka funny-falcon
Oops, I forgot to add the StaticAssert to v9-0002.
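To make the conclusion above concrete: the v10 patch below pins this down
with an assertion at the top of AdvanceXLInsertBuffer().  A minimal sketch
of the invariant (the Assert and critical-section calls are taken from the
patch; the elided middle step is illustrative only):

    /*
     * Any caller that may reach XLogWrite() is already inside a critical
     * section, where ERROR escalates to PANIC.  So a failed write restarts
     * the server instead of leaving other backends sleeping forever on
     * XLogCtl->InitializedUpToCondVar.  The WAL writer is the exception,
     * but it passes opportunistic=true and never writes here.
     */
    Assert(CritSectionCount > 0 || opportunistic);
    START_CRIT_SECTION();
    /* ... reserve page, write out old contents if needed, initialize ... */
    END_CRIT_SECTION();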
From f9e4289000627801506226b0cd140872639e95ab Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorot...@postgresql.org>
Date: Thu, 13 Feb 2025 01:20:20 +0200
Subject: [PATCH v10 1/2] Get rid of WALBufMappingLock

Allow multiple backends to initialize WAL buffers concurrently.  This way
`MemSet((char *) NewPage, 0, XLOG_BLCKSZ);` can run in parallel without
taking a single LWLock in exclusive mode.

The new algorithm works as follows:
 * reserve a page for initialization,
 * ensure the page is written out,
 * once the page is initialized, signal concurrent initializers using a
   ConditionVariable,
 * repeat the previous steps until we reserve initialization up to the
   target WAL position,
 * wait until concurrent initialization finishes using a ConditionVariable.

Now multiple backends can concurrently reserve pages, initialize them, and
advance XLogCtl->InitializedUpTo to point to the latest initialized page.

Author: Yura Sokolov <y.soko...@postgrespro.ru>
Co-authored-by: Alexander Korotkov <aekorot...@gmail.com>
Reviewed-by: Pavel Borisov <pashkin.e...@gmail.com>
---
 src/backend/access/transam/xlog.c             | 176 +++++++++++++-----
 .../utils/activity/wait_event_names.txt       |   2 +-
 src/include/storage/lwlocklist.h              |   2 +-
 3 files changed, 132 insertions(+), 48 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index a50fd99d9e5..c232ba2d599 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -302,11 +302,6 @@ static bool doPageWrites;
  * so it's a plain spinlock.  The other locks are held longer (potentially
  * over I/O operations), so we use LWLocks for them.  These locks are:
  *
- * WALBufMappingLock: must be held to replace a page in the WAL buffer cache.
- * It is only held while initializing and changing the mapping.  If the
- * contents of the buffer being replaced haven't been written yet, the mapping
- * lock is released while the write is done, and reacquired afterwards.
- *
  * WALWriteLock: must be held to write WAL buffers to disk (XLogWrite or
  * XLogFlush).
  *
@@ -473,21 +468,32 @@ typedef struct XLogCtlData
 	pg_atomic_uint64 logFlushResult;	/* last byte + 1 flushed */
 
 	/*
-	 * Latest initialized page in the cache (last byte position + 1).
+	 * Latest page in the cache reserved for initialization (last byte
+	 * position + 1).
 	 *
-	 * To change the identity of a buffer (and InitializedUpTo), you need to
-	 * hold WALBufMappingLock.  To change the identity of a buffer that's
+	 * To change the identity of a buffer, you need to advance
+	 * InitializeReserved first.  To change the identity of a buffer that's
 	 * still dirty, the old page needs to be written out first, and for that
 	 * you need WALWriteLock, and you need to ensure that there are no
 	 * in-progress insertions to the page by calling
 	 * WaitXLogInsertionsToFinish().
 	 */
-	XLogRecPtr	InitializedUpTo;
+	pg_atomic_uint64 InitializeReserved;
+
+	/*
+	 * Latest initialized page in the cache (last byte position + 1).
+	 *
+	 * InitializedUpTo is updated after buffer initialization.  After the
+	 * update, waiters are notified using InitializedUpToCondVar.
+	 */
+	pg_atomic_uint64 InitializedUpTo;
+	ConditionVariable InitializedUpToCondVar;
 
 	/*
 	 * These values do not change after startup, although the pointed-to pages
-	 * and xlblocks values certainly do.  xlblocks values are protected by
-	 * WALBufMappingLock.
+	 * and xlblocks values certainly do.  xlblocks values are changed
+	 * lock-free according to the check for the xlog write position and are
+	 * accompanied by changes of InitializeReserved and InitializedUpTo.
 	 */
 	char	   *pages;			/* buffers for unwritten XLOG pages */
 	pg_atomic_uint64 *xlblocks; /* 1st byte ptr-s + XLOG_BLCKSZ */
@@ -810,9 +816,9 @@ XLogInsertRecord(XLogRecData *rdata,
 	 * fullPageWrites from changing until the insertion is finished.
 	 *
 	 * Step 2 can usually be done completely in parallel. If the required WAL
-	 * page is not initialized yet, you have to grab WALBufMappingLock to
-	 * initialize it, but the WAL writer tries to do that ahead of insertions
-	 * to avoid that from happening in the critical path.
+	 * page is not initialized yet, you have to go through AdvanceXLInsertBuffer,
+	 * which will ensure it is initialized.  But the WAL writer tries to do that
+	 * ahead of insertions to avoid that happening in the critical path.
 	 *
 	 *----------
 	 */
@@ -1991,32 +1997,70 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, TimeLineID tli, bool opportunistic)
 	XLogRecPtr	NewPageEndPtr = InvalidXLogRecPtr;
 	XLogRecPtr	NewPageBeginPtr;
 	XLogPageHeader NewPage;
+	XLogRecPtr	ReservedPtr;
 	int			npages pg_attribute_unused() = 0;
 
-	LWLockAcquire(WALBufMappingLock, LW_EXCLUSIVE);
-
 	/*
-	 * Now that we have the lock, check if someone initialized the page
-	 * already.
+	 * We must run the loop below inside a critical section, as we expect
+	 * XLogCtl->InitializedUpTo to eventually keep up.  Most callers already
+	 * run inside a critical section, except for the WAL writer, which
+	 * passes 'opportunistic == true', and therefore we don't perform
+	 * operations that could error out.
+	 *
+	 * Start an explicit critical section anyway though.
 	 */
-	while (upto >= XLogCtl->InitializedUpTo || opportunistic)
+	Assert(CritSectionCount > 0 || opportunistic);
+	START_CRIT_SECTION();
+
+	/*--
+	 * Loop until all the pages in the WAL buffer before 'upto' are reserved
+	 * for initialization.  Multiple processes can run this loop in parallel,
+	 * each initializing different buffers, as follows.
+	 *
+	 * 1. Reserve a page for initialization using XLogCtl->InitializeReserved.
+	 * 2. Initialize the reserved page.
+	 * 3. Attempt to advance XLogCtl->InitializedUpTo.
+	 */
+	ReservedPtr = pg_atomic_read_u64(&XLogCtl->InitializeReserved);
+	while (upto >= ReservedPtr || opportunistic)
 	{
-		nextidx = XLogRecPtrToBufIdx(XLogCtl->InitializedUpTo);
+		Assert(ReservedPtr % XLOG_BLCKSZ == 0);
 
 		/*
-		 * Get ending-offset of the buffer page we need to replace (this may
-		 * be zero if the buffer hasn't been used yet).  Fall through if it's
-		 * already written out.
+		 * Get the ending offset of the buffer page we need to replace.
+		 *
+		 * We don't look up xlblocks, but rather calculate the position we
+		 * must wait to be written out.  If it was written, xlblocks will
+		 * hold this position (or be uninitialized).
 		 */
-		OldPageRqstPtr = pg_atomic_read_u64(&XLogCtl->xlblocks[nextidx]);
-		if (LogwrtResult.Write < OldPageRqstPtr)
+		if (ReservedPtr + XLOG_BLCKSZ > XLOG_BLCKSZ * XLOGbuffers)
+			OldPageRqstPtr = ReservedPtr + XLOG_BLCKSZ - XLOG_BLCKSZ * XLOGbuffers;
+		else
+			OldPageRqstPtr = InvalidXLogRecPtr;
+
+		if (LogwrtResult.Write < OldPageRqstPtr && opportunistic)
 		{
 			/*
-			 * Nope, got work to do. If we just want to pre-initialize as much
-			 * as we can without flushing, give up now.
+			 * If we just want to pre-initialize as much as we can without
+			 * flushing, give up now.
 			 */
-			if (opportunistic)
-				break;
+			upto = ReservedPtr - 1;
+			break;
+		}
+
+		/*
+		 * Attempt to reserve the page for initialization.  Failure means
+		 * that this page got reserved by another process.
+		 */
+		if (!pg_atomic_compare_exchange_u64(&XLogCtl->InitializeReserved,
+											&ReservedPtr,
+											ReservedPtr + XLOG_BLCKSZ))
+			continue;
+
+		/* Fall through if it's already written out. */
+		if (LogwrtResult.Write < OldPageRqstPtr)
+		{
+			/* Nope, got work to do. */
 
 			/* Advance shared memory write request position */
 			SpinLockAcquire(&XLogCtl->info_lck);
@@ -2031,14 +2075,6 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, TimeLineID tli, bool opportunistic)
 			RefreshXLogWriteResult(LogwrtResult);
 			if (LogwrtResult.Write < OldPageRqstPtr)
 			{
-				/*
-				 * Must acquire write lock. Release WALBufMappingLock first,
-				 * to make sure that all insertions that we need to wait for
-				 * can finish (up to this same position). Otherwise we risk
-				 * deadlock.
-				 */
-				LWLockRelease(WALBufMappingLock);
-
 				WaitXLogInsertionsToFinish(OldPageRqstPtr);
 
 				LWLockAcquire(WALWriteLock, LW_EXCLUSIVE);
@@ -2060,9 +2096,6 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, TimeLineID tli, bool opportunistic)
 					PendingWalStats.wal_buffers_full++;
 					TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY_DONE();
 				}
-				/* Re-acquire WALBufMappingLock and retry */
-				LWLockAcquire(WALBufMappingLock, LW_EXCLUSIVE);
-				continue;
 			}
 		}
 
@@ -2070,10 +2103,17 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, TimeLineID tli, bool opportunistic)
 		 * Now the next buffer slot is free and we can set it up to be the
 		 * next output page.
 		 */
-		NewPageBeginPtr = XLogCtl->InitializedUpTo;
+		NewPageBeginPtr = ReservedPtr;
 		NewPageEndPtr = NewPageBeginPtr + XLOG_BLCKSZ;
+		nextidx = XLogRecPtrToBufIdx(ReservedPtr);
 
-		Assert(XLogRecPtrToBufIdx(NewPageBeginPtr) == nextidx);
+#ifdef USE_ASSERT_CHECKING
+		{
+			XLogRecPtr	storedBound = pg_atomic_read_u64(&XLogCtl->xlblocks[nextidx]);
+
+			Assert(storedBound == OldPageRqstPtr || storedBound == InvalidXLogRecPtr);
+		}
+#endif
 
 		NewPage = (XLogPageHeader) (XLogCtl->pages + nextidx * (Size) XLOG_BLCKSZ);
@@ -2139,11 +2179,50 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, TimeLineID tli, bool opportunistic)
 		pg_write_barrier();
 
 		pg_atomic_write_u64(&XLogCtl->xlblocks[nextidx], NewPageEndPtr);
-		XLogCtl->InitializedUpTo = NewPageEndPtr;
+
+		/*
+		 * Try to advance XLogCtl->InitializedUpTo.
+		 *
+		 * If the CAS operation failed, then some of the previous pages are
+		 * not initialized yet, and this backend gives up.
+		 *
+		 * Since the initializer of the next page might give up on advancing
+		 * InitializedUpTo, this backend has to keep trying until it finds a
+		 * page "in the past" or a concurrent backend succeeds at advancing.
+		 * When we finish advancing XLogCtl->InitializedUpTo, we notify all
+		 * the waiters with XLogCtl->InitializedUpToCondVar.
+		 */
+		while (pg_atomic_compare_exchange_u64(&XLogCtl->InitializedUpTo, &NewPageBeginPtr, NewPageEndPtr))
+		{
+			NewPageBeginPtr = NewPageEndPtr;
+			NewPageEndPtr = NewPageBeginPtr + XLOG_BLCKSZ;
+			nextidx = XLogRecPtrToBufIdx(NewPageBeginPtr);
+
+			if (pg_atomic_read_u64(&XLogCtl->xlblocks[nextidx]) != NewPageEndPtr)
+			{
+				/*
+				 * The page at nextidx wasn't initialized yet, so we can't
+				 * move InitializedUpTo further.  It will be moved by the
+				 * backend which will initialize nextidx.
+				 */
+				ConditionVariableBroadcast(&XLogCtl->InitializedUpToCondVar);
+				break;
+			}
+		}
 
 		npages++;
 	}
-	LWLockRelease(WALBufMappingLock);
+
+	END_CRIT_SECTION();
+
+	/*
+	 * All the pages in the WAL buffer before 'upto' were reserved for
+	 * initialization.  However, some pages might be reserved by concurrent
+	 * processes.  Wait till they finish initialization.
+	 */
+	while (upto >= pg_atomic_read_u64(&XLogCtl->InitializedUpTo))
+		ConditionVariableSleep(&XLogCtl->InitializedUpToCondVar, WAIT_EVENT_WAL_BUFFER_INIT);
+	ConditionVariableCancelSleep();
 
 #ifdef WAL_DEBUG
 	if (XLOG_DEBUG && npages > 0)
@@ -5044,6 +5123,10 @@ XLOGShmemInit(void)
 	pg_atomic_init_u64(&XLogCtl->logWriteResult, InvalidXLogRecPtr);
 	pg_atomic_init_u64(&XLogCtl->logFlushResult, InvalidXLogRecPtr);
 	pg_atomic_init_u64(&XLogCtl->unloggedLSN, InvalidXLogRecPtr);
+
+	pg_atomic_init_u64(&XLogCtl->InitializeReserved, InvalidXLogRecPtr);
+	pg_atomic_init_u64(&XLogCtl->InitializedUpTo, InvalidXLogRecPtr);
+	ConditionVariableInit(&XLogCtl->InitializedUpToCondVar);
 }
 
 /*
@@ -6063,7 +6146,7 @@ StartupXLOG(void)
 		memset(page + len, 0, XLOG_BLCKSZ - len);
 
 		pg_atomic_write_u64(&XLogCtl->xlblocks[firstIdx], endOfRecoveryInfo->lastPageBeginPtr + XLOG_BLCKSZ);
-		XLogCtl->InitializedUpTo = endOfRecoveryInfo->lastPageBeginPtr + XLOG_BLCKSZ;
+		pg_atomic_write_u64(&XLogCtl->InitializedUpTo, endOfRecoveryInfo->lastPageBeginPtr + XLOG_BLCKSZ);
 	}
 	else
 	{
@@ -6072,8 +6155,9 @@ StartupXLOG(void)
 		 * let the first attempt to insert a log record to initialize the next
 		 * buffer.
 		 */
-		XLogCtl->InitializedUpTo = EndOfLog;
+		pg_atomic_write_u64(&XLogCtl->InitializedUpTo, EndOfLog);
 	}
+	pg_atomic_write_u64(&XLogCtl->InitializeReserved, pg_atomic_read_u64(&XLogCtl->InitializedUpTo));
 
 	/*
 	 * Update local and shared status.  This is OK to do without any locks
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index e199f071628..ccf73781d81 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -155,6 +155,7 @@ REPLICATION_SLOT_DROP	"Waiting for a replication slot to become inactive so it c
 RESTORE_COMMAND	"Waiting for <xref linkend="guc-restore-command"/> to complete."
 SAFE_SNAPSHOT	"Waiting to obtain a valid snapshot for a <literal>READ ONLY DEFERRABLE</literal> transaction."
 SYNC_REP	"Waiting for confirmation from a remote server during synchronous replication."
+WAL_BUFFER_INIT	"Waiting on WAL buffer to be initialized."
 WAL_RECEIVER_EXIT	"Waiting for the WAL receiver to exit."
 WAL_RECEIVER_WAIT_START	"Waiting for startup process to send initial data for streaming replication."
 WAL_SUMMARY_READY	"Waiting for a new WAL summary to be generated."
@@ -310,7 +311,6 @@ XidGen	"Waiting to allocate a new transaction ID."
 ProcArray	"Waiting to access the shared per-process data structures (typically, to get a snapshot or report a session's transaction ID)."
 SInvalRead	"Waiting to retrieve messages from the shared catalog invalidation queue."
 SInvalWrite	"Waiting to add a message to the shared catalog invalidation queue."
-WALBufMapping	"Waiting to replace a page in WAL buffers."
 WALWrite	"Waiting for WAL buffers to be written to disk."
 ControlFile	"Waiting to read or update the <filename>pg_control</filename> file or create a new WAL file."
 MultiXactGen	"Waiting to read or update shared multixact state."
diff --git a/src/include/storage/lwlocklist.h b/src/include/storage/lwlocklist.h
index cf565452382..ff897515769 100644
--- a/src/include/storage/lwlocklist.h
+++ b/src/include/storage/lwlocklist.h
@@ -37,7 +37,7 @@ PG_LWLOCK(3, XidGen)
 PG_LWLOCK(4, ProcArray)
 PG_LWLOCK(5, SInvalRead)
 PG_LWLOCK(6, SInvalWrite)
-PG_LWLOCK(7, WALBufMapping)
+/* 7 was WALBufMapping */
 PG_LWLOCK(8, WALWrite)
 PG_LWLOCK(9, ControlFile)
 /* 10 was CheckpointLock */
-- 
2.43.0
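The new protocol is easier to see stripped of the surrounding buffer
management.  Below is a distilled sketch of what AdvanceXLInsertBuffer()
now does; 'reserved', 'initialized' and 'initialized_cv' stand in for the
XLogCtl fields, and init_page()/page_is_initialized() are hypothetical
helpers (zero the page and publish/check its xlblocks bound).  The dirty
buffer write-out and opportunistic mode are elided:

    static void
    advance_sketch(uint64 upto)
    {
        uint64      ptr = pg_atomic_read_u64(&reserved);

        while (upto >= ptr)
        {
            uint64      begin,
                        end;

            /* Step 1: claim one page; a failed CAS refreshes 'ptr'. */
            if (!pg_atomic_compare_exchange_u64(&reserved, &ptr,
                                                ptr + XLOG_BLCKSZ))
                continue;

            /* Step 2: initialize the page we claimed. */
            init_page(ptr);

            /* Step 3: chain-advance 'initialized' over every ready page. */
            begin = ptr;
            end = ptr + XLOG_BLCKSZ;
            while (pg_atomic_compare_exchange_u64(&initialized, &begin, end))
            {
                begin = end;
                end += XLOG_BLCKSZ;
                if (!page_is_initialized(begin, end))
                {
                    /* Next page is someone else's job: wake waiters, stop. */
                    ConditionVariableBroadcast(&initialized_cv);
                    break;
                }
            }
        }

        /* Step 4: wait until concurrent initializers cover 'upto'. */
        while (upto >= pg_atomic_read_u64(&initialized))
            ConditionVariableSleep(&initialized_cv, WAIT_EVENT_WAL_BUFFER_INIT);
        ConditionVariableCancelSleep();
    }

Note the hand-off in step 3: a backend that finishes its page but finds the
next one unfinished broadcasts and leaves; the backend that later finishes
that page picks up the advance.  That is why waiters can rely on
InitializedUpTo eventually reaching any reserved position.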
From 6b22d2e63d061be465e9e9f5ce2fefdaae6ebf0e Mon Sep 17 00:00:00 2001
From: Yura Sokolov <y.soko...@postgrespro.ru>
Date: Thu, 16 Jan 2025 15:30:57 +0300
Subject: [PATCH v10 2/2] Several attempts to lock WALInsertLocks.

The birthday paradox tells us we could get a collision with 34% probability
with just 3 concurrent processes, 58% with 4, and 79% with 5 (when
NUM_XLOGINSERT_LOCKS == 8).

Trying several times to lock conditionally, stepping linearly by a random
delta, greatly reduces the probability of blocking:
- with 1 conditional attempt - 4%, 14% and 33% respectively
- with 2 conditional attempts - 0%, 2.1% and 9.7%
- with 3 conditional attempts - 0%, 0% and 1.85%

The probabilities are calculated with a simple Ruby program:

    def try_add(arr, n)
      a, b = rand(8), rand(8)|1
      n.times {
        (arr << a; break) unless arr.include?(a)
        a = (a + b) % 8
      }
    end

    def calc_prob(n, k)
      300000.times.
        map{ ar=[]; n.times{ try_add(ar, k) }; ar}.
        count{|a| a.length < n} / 300000.0 * 100
    end

    (3..5).each{|n| (1..5).each{|k| p [n,k,calc_prob(n, k).round(2)]}}

Given that every attempt is a cache miss, 2 conditional attempts followed
by an unconditional lock look to be a good compromise.
---
 src/backend/access/transam/xlog.c | 44 ++++++++++++++++++++-----------
 1 file changed, 28 insertions(+), 16 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index c232ba2d599..13fb1c35ad8 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -68,6 +68,7 @@
 #include "catalog/pg_database.h"
 #include "common/controldata_utils.h"
 #include "common/file_utils.h"
+#include "common/pg_prng.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
@@ -1376,7 +1377,11 @@ CopyXLogRecordToWAL(int write_len, bool isLogSwitch, XLogRecData *rdata,
 static void
 WALInsertLockAcquire(void)
 {
-	bool		immed;
+	/*
+	 * Two conditional attempts look to be a good compromise between a good
+	 * probability of acquiring the lock and cache misses on every attempt.
+	 */
+	int			attempts = 2;
 
 	/*
 	 * It doesn't matter which of the WAL insertion locks we acquire, so try
@@ -1389,29 +1394,36 @@ WALInsertLockAcquire(void)
 	 * (semi-)randomly.  This allows the locks to be used evenly if you have a
 	 * lot of very short connections.
 	 */
-	static int	lockToTry = -1;
+	static uint32 lockToTry = 0;
+	static uint32 lockDelta = 0;
 
-	if (lockToTry == -1)
+	if (lockDelta == 0)
+	{
 		lockToTry = MyProcNumber % NUM_XLOGINSERT_LOCKS;
+		lockDelta = pg_prng_uint32(&pg_global_prng_state) % NUM_XLOGINSERT_LOCKS;
+		lockDelta |= 1;			/* must be odd */
+		StaticAssertStmt((NUM_XLOGINSERT_LOCKS & (NUM_XLOGINSERT_LOCKS - 1)) == 0,
+						 "NUM_XLOGINSERT_LOCKS must be power of 2");
+	}
+
 	MyLockNo = lockToTry;
 
-	/*
-	 * The insertingAt value is initially set to 0, as we don't know our
-	 * insert location yet.
-	 */
-	immed = LWLockAcquire(&WALInsertLocks[MyLockNo].l.lock, LW_EXCLUSIVE);
-	if (!immed)
+	while (attempts-- > 0)
 	{
+		if (LWLockConditionalAcquire(&WALInsertLocks[MyLockNo].l.lock, LW_EXCLUSIVE))
+			return;
+
 		/*
-		 * If we couldn't get the lock immediately, try another lock next
-		 * time.  On a system with more insertion locks than concurrent
-		 * inserters, this causes all the inserters to eventually migrate to a
-		 * lock that no-one else is using.  On a system with more inserters
-		 * than locks, it still helps to distribute the inserters evenly
-		 * across the locks.
+		 * If we couldn't get the lock immediately, try another lock.  On a
+		 * system with more insertion locks than concurrent inserters, this
+		 * causes all the inserters to eventually migrate to a lock that
+		 * no-one else is using.  On a system with more inserters than locks,
+		 * it still helps to distribute the inserters evenly across the locks.
 		 */
-		lockToTry = (lockToTry + 1) % NUM_XLOGINSERT_LOCKS;
+		lockToTry = (lockToTry + lockDelta) % NUM_XLOGINSERT_LOCKS;
+		MyLockNo = lockToTry;
 	}
+
+	LWLockAcquire(&WALInsertLocks[MyLockNo].l.lock, LW_EXCLUSIVE);
 }
 
 /*
-- 
2.43.0
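A note on the lockDelta |= 1 trick paired with the StaticAssertStmt: an odd
delta is coprime with any power of two, so stepping by it modulo
NUM_XLOGINSERT_LOCKS visits every lock exactly once before repeating, while
an even delta would cycle through only a subset of the locks.  A small
standalone program (an illustration, not part of the patch) makes this easy
to verify:

    #include <stdio.h>
    #include <stdbool.h>

    #define NUM_XLOGINSERT_LOCKS 8	/* must be a power of two */

    /* Does stepping by 'delta' from slot 0 reach every slot? */
    static bool
    covers_all_slots(unsigned delta)
    {
        bool        seen[NUM_XLOGINSERT_LOCKS] = {false};
        unsigned    slot = 0;

        for (int i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
        {
            seen[slot] = true;
            slot = (slot + delta) % NUM_XLOGINSERT_LOCKS;
        }
        for (int i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
            if (!seen[i])
                return false;
        return true;
    }

    int
    main(void)
    {
        for (unsigned delta = 1; delta < NUM_XLOGINSERT_LOCKS; delta++)
            printf("delta=%u covers all slots: %s\n",
                   delta, covers_all_slots(delta) ? "yes" : "no");
        return 0;
    }

Running it prints "yes" exactly for the odd deltas, which is why the patch
forces the delta odd and asserts that NUM_XLOGINSERT_LOCKS stays a power of
two.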