> On 28 Nov 2025, at 21:21, Heikki Linnakangas <[email protected]> wrote:
>
> On 27/11/2025 20:25, Andrey Borodin wrote:
>>> On 27 Nov 2025, at 01:59, Heikki Linnakangas <[email protected]> wrote:
>>>
>>> This is because the WAL, created with old version, contains records like
>>> this:
>>>
>>> lsn: 0/05561030, prev 0/05561008, desc: CREATE_ID 2047 offset 4093 nmembers
>>> 2: 2830 (keysh) 2831 (keysh)
>>> lsn: 0/055611A8, prev 0/05561180, desc: ZERO_OFF_PAGE 1
>>> lsn: 0/055611D0, prev 0/055611A8, desc: CREATE_ID 2048 offset 4095 nmembers
>>> 2: 2831 (keysh) 2832 (keysh)
>> That's an interesting case. I don't see how SLRU interface can be
>> used to test if SLRU page is initialized correctly. We need a
>> version of SimpleLruReadPage() that can avoid failure if page does
>> not exist yet, and this must not be more expensive than current
>> SimpleLruReadPage(). Alternatively we need new
>> XLOG_MULTIXACT_CREATE_ID_2 in back branches.
>
> There is SimpleLruDoesPhysicalPageExist() to check if a page has been
> initialized. There's also the 'latest_page_number' field which tracks what is
> the latest page that has been initialized.
During working on v8 of this patch I observed races between
SimpleLruDoesPhysicalPageExist() and ExtendMultiXactOffset(). Because
ExtendMultiXactOffset() may initialize page only in a buffer, and file system
call will not observe it.
latest_page_number looks great! though I'm slightly worried by this commend:
/*
* During XLOG replay, latest_page_number isn't necessarily set up
* yet; insert a suitable value to bypass the sanity test in
* SimpleLruTruncate.
*/
and by
/*
* latest_page_number is the page number of the current end of the log;
* this is __not critical data__, since we use it only to avoid swapping out
* the latest page.
*/
If we use it in startup process to prevent failure, now it is critical data.
But I think the comments are off, latest_page_number is updated precisely.
The page initialization dance is only needed in back branches. And we
inevitable will have conflicts with SLRU refactoring in 18 and banking in 17.
Conceptually v12 looks good to me, I can prepare backport versions.
You definitely beat LLM in commit message clarity. Though, s/coudl/could/g.
> On 28 Nov 2025, at 21:21, Heikki Linnakangas <[email protected]> wrote:
>
>>> - I reverted the changes to ExtendMultiXactOffset(), so that it
>>> deals with wraparound and FirstMultiXactId the same way as before.
>>> The caller never passes FirstMultiXactId, but the changed comments
>>> and the assertion were confusing, so I felt it's best to just
>>> leave it alone
>> Maybe move a decision (to extend or not to extend) out of this
>> function? Now its purpose is "MaybeExtendMultiXactOffset". And
>> there's just one caller.
>
> Perhaps. Let's leave that for a separate non-backported patch though.
OK, we just have to be sure that we do not emit 2 ZERO_OFF_PAGE records during
wraparound.
The problem seems to be introduced by 1986ca5 as a fix for a problem that might
be there even longer. Perhaps, since multitransactions existed. Interestingly
the only know to me report was by Thom Brown in 2024 (besides this thread).
https://www.postgresql.org/message-id/CAA-aLv7AcTh%2BO9rnoPVbck%3DFw8atMOYrUttk5G9ctFhXOsqQbw%40mail.gmail.com
Thanks!
Best regards, Andrey Borodin.