> On 28 Nov 2025, at 21:21, Heikki Linnakangas <[email protected]> wrote:
> 
> On 27/11/2025 20:25, Andrey Borodin wrote:
>>> On 27 Nov 2025, at 01:59, Heikki Linnakangas <[email protected]> wrote:
>>> 
>>> This is because the WAL, created with old version, contains records like 
>>> this:
>>> 
>>> lsn: 0/05561030, prev 0/05561008, desc: CREATE_ID 2047 offset 4093 nmembers 
>>> 2: 2830 (keysh) 2831 (keysh)
>>> lsn: 0/055611A8, prev 0/05561180, desc: ZERO_OFF_PAGE 1
>>> lsn: 0/055611D0, prev 0/055611A8, desc: CREATE_ID 2048 offset 4095 nmembers 
>>> 2: 2831 (keysh) 2832 (keysh)
>> That's an interesting case. I don't see how SLRU interface can be
>> used to test if SLRU page is initialized correctly. We need a
>> version of SimpleLruReadPage() that can avoid failure if page does
>> not exist yet, and this must not be more expensive than current
>> SimpleLruReadPage(). Alternatively we need new
>> XLOG_MULTIXACT_CREATE_ID_2 in back branches.
> 
> There is SimpleLruDoesPhysicalPageExist() to check if a page has been 
> initialized. There's also the 'latest_page_number' field which tracks what is 
> the latest page that has been initialized.

During working on v8 of this patch I observed races between 
SimpleLruDoesPhysicalPageExist() and ExtendMultiXactOffset(). Because 
ExtendMultiXactOffset() may initialize page only in a buffer, and file system 
call will not observe it.

latest_page_number looks great! though I'm slightly worried by this commend:
/*
 * During XLOG replay, latest_page_number isn't necessarily set up
 * yet; insert a suitable value to bypass the sanity test in
 * SimpleLruTruncate.
 */

and by

/*
 * latest_page_number is the page number of the current end of the log;
 * this is __not critical data__, since we use it only to avoid swapping out
 * the latest page.
 */

If we use it in startup process to prevent failure, now it is critical data.

But I think the comments are off, latest_page_number is updated precisely.

The page initialization dance is only needed in back branches. And we 
inevitable will have conflicts with SLRU refactoring in 18 and banking in 17. 
Conceptually v12 looks good to me, I can prepare backport versions.

You definitely beat LLM in commit message clarity. Though, s/coudl/could/g.


> On 28 Nov 2025, at 21:21, Heikki Linnakangas <[email protected]> wrote:
> 
>>> - I reverted the changes to ExtendMultiXactOffset(), so that it
>>> deals with wraparound and FirstMultiXactId the same way as before.
>>> The caller never passes FirstMultiXactId, but the changed comments
>>> and the assertion were confusing, so I felt it's best to just
>>> leave it alone
>> Maybe move a decision (to extend or not to extend) out of this
>> function? Now its purpose is "MaybeExtendMultiXactOffset". And
>> there's just one caller.
> 
> Perhaps. Let's leave that for a separate non-backported patch though.


OK, we just have to be sure that we do not emit 2 ZERO_OFF_PAGE records during 
wraparound.

The problem seems to be introduced by 1986ca5 as a fix for a problem that might 
be there even longer. Perhaps, since multitransactions existed. Interestingly 
the only know to me report was by Thom Brown in 2024 (besides this thread).
https://www.postgresql.org/message-id/CAA-aLv7AcTh%2BO9rnoPVbck%3DFw8atMOYrUttk5G9ctFhXOsqQbw%40mail.gmail.com

Thanks!


Best regards, Andrey Borodin.

Reply via email to