On Wed, Jul 10, 2024 at 5:01 PM Robert Haas <robertmh...@gmail.com> wrote:
> Here is a draft patch that attempts to fix this problem. I'm not
> certain that it's completely correct, but it does seem to fix the
> reported issue.

I tried to write a test case for this and discovered that there are actually two separate problems in this area. First, as shown by the assertion failure reported by Fujii Masao, the WAL summarizer thinks that it should never need to back up to an earlier LSN, and the test case he provided shows that this is incorrect. Second, the WAL summarizer can end up in a bad state after the startup process renames the last WAL file on the old timeline to a .partial file. If this happens before the file has been summarized, then the WAL summarizer can't access it any more and errors out.

Promotion also removes WAL files from the old timeline completely, but only if they're after the switch point, and summarization doesn't care about those anyway. So the partial file seems to be the only problem case.

In theory, the problem with the partial file could be handled in a variety of ways: we could teach summarization to read the partial file, perhaps, or postpone adding the .partial suffix until after summarization has happened. But in practice, given where we are in the release cycle, the only reasonable approach that I can see is to have promotion wait for summarization to catch up, so that's what I did in 0003.

0002 is the same as what I posted previously as 0001, and teaches the summarizer about backing up when we switch timelines. 0001 adds a missing call to ConditionVariableCancelSleep; AFAIK, that omission has no important consequences, but still seems like it should be fixed.

--
Robert Haas
EDB: http://www.enterprisedb.com

v2-0002-Allow-WAL-summarization-to-back-up-when-timeline-.patch
Description: Binary data
v2-0003-Wait-for-WAL-summarization-to-catch-up-before-cre.patch
Description: Binary data
v2-0001-Add-missing-call-to-ConditionVariableCancelSleep.patch
Description: Binary data