On Fri, Feb 14, 2025 at 11:45 AM Pavel Borisov <pashkin.e...@gmail.com> wrote: > On Fri, 14 Feb 2025 at 00:59, Alexander Korotkov <aekorot...@gmail.com> wrote: > > On Thu, Feb 13, 2025 at 6:39 PM Pavel Borisov <pashkin.e...@gmail.com> > > wrote: > > > On Thu, 13 Feb 2025 at 14:08, Alexander Korotkov <aekorot...@gmail.com> > > > wrote: > > > > > > > > On Thu, Feb 13, 2025 at 11:45 AM Yura Sokolov > > > > <y.soko...@postgrespro.ru> wrote: > > > > > 13.02.2025 12:34, Alexander Korotkov пишет: > > > > > > On Wed, Feb 12, 2025 at 8:16 PM Yura Sokolov > > > > > > <y.soko...@postgrespro.ru> wrote: > > > > > >> 08.02.2025 13:07, Alexander Korotkov пишет: > > > > > >>> On Fri, Feb 7, 2025 at 1:39 PM Alexander Korotkov > > > > > >>> <aekorot...@gmail.com> wrote: > > > > > >>>> Good, thank you. I think 0001 patch is generally good, but > > > > > >>>> needs some > > > > > >>>> further polishing, e.g. more comments explaining how does it > > > > > >>>> work. > > > > > >> > > > > > >> I tried to add more comments. I'm not good at, so recommendations > > > > > >> are welcome. > > > > > >> > > > > > >>> Two things comes to my mind worth rechecking about 0001. > > > > > >>> 1) Are XLogCtl->InitializeReserved, XLogCtl->InitializedUpTo and > > > > > >>> XLogCtl->xlblocks always page-aligned? Because algorithm seems > > > > > >>> to be > > > > > >>> sensitive to that. If so, I would propose to explicitly comment > > > > > >>> that > > > > > >>> and add corresponding asserts. > > > > > >> > > > > > >> They're certainly page aligned, since they are page borders. > > > > > >> I added assert on alignment of InitializeReserved for the sanity. > > > > > >> > > > > > >>> 2) Check if there are concurrency issues between > > > > > >>> AdvanceXLInsertBuffer() and switching to the new WAL file. > > > > > >> > > > > > >> There are no issues: > > > > > >> 1. CopyXLogRecordToWAL for isLogSwitch follows same protocol, ie > > > > > >> uses > > > > > >> GetXLogBuffer to zero-out WAL page. > > > > > >> 2. WALINSERT_SPECIAL_SWITCH forces exclusive lock on all insertion > > > > > >> locks, > > > > > >> so switching wal is not concurrent. (Although, there is no need in > > > > > >> this > > > > > >> exclusiveness, imho.) > > > > > > > > > > > > Good, thank you. I've also revised commit message and comments. > > > > > > > > > > > > But I see another issue with this patch. In the worst case, we do > > > > > > XLogWrite() by ourselves, and it could potentially could error out. > > > > > > Without patch, that would cause WALBufMappingLock be released and > > > > > > XLogCtl->InitializedUpTo not advanced. With the patch, that would > > > > > > cause other processes infinitely waiting till we finish the > > > > > > initialization. > > > > > > > > > > > > Possible solution would be to save position of the page to be > > > > > > initialized, and set it back to XLogCtl->InitializeReserved on error > > > > > > (everywhere we do LWLockReleaseAll()). We also must check that on > > > > > > error we only set XLogCtl->InitializeReserved to the past, because > > > > > > there could be multiple concurrent failures. Also we need to > > > > > > broadcast XLogCtl->InitializedUpToCondVar to wake up waiters. > > > > > > > > > > The single place where AdvanceXLInsertBuffer is called outside of > > > > > critical > > > > > section is in XLogBackgroundFlush. All other call stacks will issue > > > > > server > > > > > restart if XLogWrite will raise error inside of AdvanceXLInsertBuffer. > > > > > > > > > > XLogBackgroundFlush explicitely avoids writing buffers by passing > > > > > opportunistic=true parameter. > > > > > > > > > > Therefore, error in XLogWrite will not cause hang in > > > > > AdvanceXLInsertBuffer > > > > > since server will shutdown/restart. > > > > > > > > > > Perhaps, we just need to insert `Assert(CritSectionCount > 0);` > > > > > before call > > > > > to XLogWrite here? > > > > > > > > You're correct. I just reflected this in the next revision of the > > > > patch. > > > > > > I've looked at the patchset v6. > > > > Oh, sorry, I really did wrong. I've done git format-patch for wrong > > local branch for v5 and v6. Patches I've sent for v5 and v6 are > > actually the same as my v1. This is really pity. Please, find the > > right version of patchset attached. > > I've rechecked v7. In v6 a proposal from [1] was not reflected. Now it > landed in v7. > > Other changes are not regarding code behavior. The things from my > previous review that still could apply to v7: > > For 0001: > > Comment change proposed: > "lock-free with cooperation with" -> "lock-free accompanied by changes > to..." (maybe other variant)
Good catch. I've rephrased this comment even more. > I propose a new define: > #define FirstValidXLogRecPtr 1 > While FirstValidXLogRecPtr = InvalidXLogRecPtr + 1 is true in the code > that has no semantical meaning and it's better to avoid using direct > arithmetics to relate meaning of FirstValidXLogRecPtr from > InvalidXLogRecPtr. Makes sense, but I'm not sure if this change is required at all. I've reverted this to the state of master, and everything seems to work. > For 0002 both comments proposals from my message applied to v6 apply > to v7 as well Thank you for pointing. For now, I'm concentrated on improvements on 0001. Probably Yura could work on your notes to 0002. ------ Regards, Alexander Korotkov Supabase
v8-0001-Get-rid-of-WALBufMappingLock.patch
Description: Binary data
v8-0002-several-attempts-to-lock-WALInsertLocks.patch
Description: Binary data