On Thu, Feb 13, 2025 at 11:45 AM Yura Sokolov <y.soko...@postgrespro.ru> wrote: > 13.02.2025 12:34, Alexander Korotkov пишет: > > On Wed, Feb 12, 2025 at 8:16 PM Yura Sokolov <y.soko...@postgrespro.ru> > > wrote: > >> 08.02.2025 13:07, Alexander Korotkov пишет: > >>> On Fri, Feb 7, 2025 at 1:39 PM Alexander Korotkov <aekorot...@gmail.com> > >>> wrote: > >>>> Good, thank you. I think 0001 patch is generally good, but needs some > >>>> further polishing, e.g. more comments explaining how does it work. > >> > >> I tried to add more comments. I'm not good at, so recommendations are > >> welcome. > >> > >>> Two things comes to my mind worth rechecking about 0001. > >>> 1) Are XLogCtl->InitializeReserved, XLogCtl->InitializedUpTo and > >>> XLogCtl->xlblocks always page-aligned? Because algorithm seems to be > >>> sensitive to that. If so, I would propose to explicitly comment that > >>> and add corresponding asserts. > >> > >> They're certainly page aligned, since they are page borders. > >> I added assert on alignment of InitializeReserved for the sanity. > >> > >>> 2) Check if there are concurrency issues between > >>> AdvanceXLInsertBuffer() and switching to the new WAL file. > >> > >> There are no issues: > >> 1. CopyXLogRecordToWAL for isLogSwitch follows same protocol, ie uses > >> GetXLogBuffer to zero-out WAL page. > >> 2. WALINSERT_SPECIAL_SWITCH forces exclusive lock on all insertion locks, > >> so switching wal is not concurrent. (Although, there is no need in this > >> exclusiveness, imho.) > > > > Good, thank you. I've also revised commit message and comments. > > > > But I see another issue with this patch. In the worst case, we do > > XLogWrite() by ourselves, and it could potentially could error out. > > Without patch, that would cause WALBufMappingLock be released and > > XLogCtl->InitializedUpTo not advanced. With the patch, that would > > cause other processes infinitely waiting till we finish the > > initialization. > > > > Possible solution would be to save position of the page to be > > initialized, and set it back to XLogCtl->InitializeReserved on error > > (everywhere we do LWLockReleaseAll()). We also must check that on > > error we only set XLogCtl->InitializeReserved to the past, because > > there could be multiple concurrent failures. Also we need to > > broadcast XLogCtl->InitializedUpToCondVar to wake up waiters. > > The single place where AdvanceXLInsertBuffer is called outside of critical > section is in XLogBackgroundFlush. All other call stacks will issue server > restart if XLogWrite will raise error inside of AdvanceXLInsertBuffer. > > XLogBackgroundFlush explicitely avoids writing buffers by passing > opportunistic=true parameter. > > Therefore, error in XLogWrite will not cause hang in AdvanceXLInsertBuffer > since server will shutdown/restart. > > Perhaps, we just need to insert `Assert(CritSectionCount > 0);` before call > to XLogWrite here?
You're correct. I just reflected this in the next revision of the patch. ------ Regards, Alexander Korotkov Supabase
v6-0001-Get-rid-of-WALBufMappingLock.patch
Description: Binary data
v6-0002-several-attempts-to-lock-WALInsertLocks.patch
Description: Binary data