On Thu, Feb 13, 2025 at 11:45 AM Yura Sokolov <y.soko...@postgrespro.ru> wrote:
> 13.02.2025 12:34, Alexander Korotkov пишет:
> > On Wed, Feb 12, 2025 at 8:16 PM Yura Sokolov <y.soko...@postgrespro.ru> 
> > wrote:
> >> 08.02.2025 13:07, Alexander Korotkov пишет:
> >>> On Fri, Feb 7, 2025 at 1:39 PM Alexander Korotkov <aekorot...@gmail.com> 
> >>> wrote:
> >>>> Good, thank you.  I think 0001 patch is generally good, but needs some
> >>>> further polishing, e.g. more comments explaining how does it work.
> >>
> >> I tried to add more comments. I'm not good at, so recommendations are 
> >> welcome.
> >>
> >>> Two things comes to my mind worth rechecking about 0001.
> >>> 1) Are XLogCtl->InitializeReserved, XLogCtl->InitializedUpTo and
> >>> XLogCtl->xlblocks always page-aligned?  Because algorithm seems to be
> >>> sensitive to that.  If so, I would propose to explicitly comment that
> >>> and add corresponding asserts.
> >>
> >> They're certainly page aligned, since they are page borders.
> >> I added assert on alignment of InitializeReserved for the sanity.
> >>
> >>> 2) Check if there are concurrency issues between
> >>> AdvanceXLInsertBuffer() and switching to the new WAL file.
> >>
> >> There are no issues:
> >> 1. CopyXLogRecordToWAL for isLogSwitch follows same protocol, ie uses
> >> GetXLogBuffer to zero-out WAL page.
> >> 2. WALINSERT_SPECIAL_SWITCH forces exclusive lock on all insertion locks,
> >> so switching wal is not concurrent. (Although, there is no need in this
> >> exclusiveness, imho.)
> >
> > Good, thank you.  I've also revised commit message and comments.
> >
> > But I see another issue with this patch.  In the worst case, we do
> > XLogWrite() by ourselves, and it could potentially could error out.
> > Without patch, that would cause WALBufMappingLock be released and
> > XLogCtl->InitializedUpTo not advanced.  With the patch, that would
> > cause other processes infinitely waiting till we finish the
> > initialization.
> >
> > Possible solution would be to save position of the page to be
> > initialized, and set it back to XLogCtl->InitializeReserved on error
> > (everywhere we do LWLockReleaseAll()).  We also must check that on
> > error we only set XLogCtl->InitializeReserved to the past, because
> > there could be multiple concurrent failures.  Also we need to
> > broadcast XLogCtl->InitializedUpToCondVar to wake up waiters.
>
> The single place where AdvanceXLInsertBuffer is called outside of critical
> section is in XLogBackgroundFlush. All other call stacks will issue server
> restart if XLogWrite will raise error inside of AdvanceXLInsertBuffer.
>
> XLogBackgroundFlush explicitely avoids writing buffers by passing
> opportunistic=true parameter.
>
> Therefore, error in XLogWrite will not cause hang in AdvanceXLInsertBuffer
> since server will shutdown/restart.
>
> Perhaps, we just need to insert `Assert(CritSectionCount > 0);` before call
> to XLogWrite here?

You're correct.  I just reflected this in the next revision of the patch.

------
Regards,
Alexander Korotkov
Supabase

Attachment: v6-0001-Get-rid-of-WALBufMappingLock.patch
Description: Binary data

Attachment: v6-0002-several-attempts-to-lock-WALInsertLocks.patch
Description: Binary data

Reply via email to