Hi,

On 2023-07-26 07:40:31 +0900, Michael Paquier wrote:
> On Tue, Jul 25, 2023 at 09:49:01AM -0700, Andres Freund wrote:
> > FWIW, I'm working on a patch that replaces WAL insert locks as a whole,
> > because they don't scale all that well.
>
> What were you looking at here?  Just wondering.
Here's what I had written offlist a few days ago:

> The basic idea is to have a ringbuffer of in-progress insertions. The
> acquisition of a position in the ringbuffer is done at the same time as
> advancing the reserved LSN, using a 64bit xadd. The trick that makes that
> possible is to use the high bits of the atomic for the position in the
> ringbuffer. The point of using the high bits is that they wrap around,
> without affecting the rest of the value.
>
> Of course, when using xadd, we can't keep the "prev" pointer in the
> atomic. That's where the ringbuffer comes into play. Whenever one inserter
> has determined the byte pos of its insertion, it updates the "prev byte
> pos" in the *next* ringbuffer entry.
>
> Of course that means that insertion N+1 needs to wait for N to set the
> prev position - but that happens very quickly. In my very hacky prototype
> the relevant path (which for now just spins) is reached very rarely, even
> when massively oversubscribed. While I've not implemented that, N+1 could
> actually do the first "iteration" in CopyXLogRecordToWAL() before it needs
> the prev position, and the COMP_CRC32C() could happen "inside" the buffer.
>
>
> There's a fair bit of trickiness in turning that into something working,
> of course :). Ensuring that the ring buffer of insertions doesn't wrap
> around is non-trivial. Nor is it trivial to ensure that the "reduced"
> space LSN in the atomic can't overflow.
>
> I do wish MAX_BACKENDS were smaller...
>
>
> Until last night I thought all my schemes would continue to need something
> like the existing WAL insertion locks, to implement
> WALInsertLockAcquireExclusive().
>
> But I think I came up with an idea to do away with that (not even
> prototyped yet): Use one bit in the atomic that indicates that no new
> insertions are allowed. Whenever the xadd finds that the old value
> actually was locked, it "aborts" the insertion and instead waits for a
> condition variable (or something similar). Of course that happens after
> modifying the atomic - to deal with that, the "lock holder" reverts all
> modifications that have been made to the atomic when releasing the "lock";
> they weren't actually successful, and all those backends will retry.
>
> Except that this doesn't quite suffice - XLogInsertRecord() needs to be
> able to "roll back" when it finds that we now need to log FPIs. I can't
> quite see how to make that work with what I describe above. The only idea
> I have so far is to just waste the space with a NOOP record - it should be
> pretty rare. At least if we updated RedoRecPtr eagerly (or just stopped
> this stupid business of having an outdated backend-local copy).
>
>
> My prototype shows this idea to be promising. It's a tad slower at low
> concurrency, but much better at high concurrency. I think most if not all
> of the low-end overhead isn't inherent, but comes from having both old and
> new infrastructure in place (as well as a lot of debugging cruft).

Greetings,

Andres Freund
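
To make the reservation step above a bit more concrete, here is a minimal
illustrative sketch using C11 atomics. All names (ReserveInsertion, POS_BITS,
RING_SLOTS, insert_ring, ...) are made up for illustration, and the hard
parts discussed above - keeping the ring buffer and the reduced-width byte
position from wrapping/overflowing, the "no new insertions" lock bit, and
rolling back when FPIs become necessary - are omitted. This is a sketch of
the idea, not the prototype:

#include <stdatomic.h>
#include <stdint.h>

#define POS_BITS    48      /* low bits: reserved insertion byte position */
#define POS_MASK    ((UINT64_C(1) << POS_BITS) - 1)
#define RING_SLOTS  256     /* power of two, so the high-bit counter's
                             * wraparound stays aligned with the modulo */

typedef struct InsertRingEntry
{
    /* end byte position of the previous insertion; 0 means "not yet set"
     * (a simplification - a real implementation needs proper generation /
     * reuse handling) */
    _Atomic uint64_t prev_bytepos;
} InsertRingEntry;

/* high bits: ring slot counter, low bits: reserved byte position */
static _Atomic uint64_t reserve_state;
static InsertRingEntry insert_ring[RING_SLOTS];

/*
 * Reserve 'size' bytes of insertion space and a ring slot with a single
 * 64-bit xadd.  Returns the start byte position of the reservation and
 * sets *myslot to the ring entry in which our predecessor will publish
 * its end position.
 */
uint64_t
ReserveInsertion(uint64_t size, uint32_t *myslot)
{
    /* advance the ring slot by one and the byte position by 'size' at once */
    uint64_t add = (UINT64_C(1) << POS_BITS) + size;
    uint64_t old = atomic_fetch_add(&reserve_state, add);
    uint64_t startpos = old & POS_MASK;

    *myslot = (uint32_t) (old >> POS_BITS) % RING_SLOTS;

    /* publish our end position as the *next* inserter's "prev" pointer */
    atomic_store(&insert_ring[(*myslot + 1) % RING_SLOTS].prev_bytepos,
                 startpos + size);

    return startpos;
}

/*
 * Obtain the end position of the preceding insertion, published by that
 * inserter into our own ring entry.  Just spins here, as in the description
 * above this path should be hit rarely and be over quickly.
 */
uint64_t
WaitForPrevBytePos(uint32_t myslot)
{
    uint64_t prev;

    while ((prev = atomic_load(&insert_ring[myslot].prev_bytepos)) == 0)
        ;                       /* spin */
    return prev;
}

With that, an inserter would do roughly: startpos = ReserveInsertion(size,
&slot); prevpos = WaitForPrevBytePos(slot); and then copy the record into the
WAL buffers between startpos and startpos + size, using prevpos for xl_prev.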