On Mon, Jul 1, 2024 at 2:08 AM Michael Paquier <mich...@paquier.xyz> wrote:
> Nope. So, Open Item, here we go.

Some initial investigation:

In this test case, pg_createsubscriber, during the "starting the subscriber" section of its work, starts up the database in the "sub" directory as a standby. It enters standby mode, begins redo, and is then promoted, selecting timeline 2. The WAL summarizer is supposed to end summarization at the point at which timeline 1 ended and then resume summarizing from the beginning of timeline 2. But instead, it fails an assertion:

    Assert(switchpoint >= state->EndRecPtr);

This assertion is trying to verify that, when a new timeline is spawned, we don't read past the switchpoint on the original timeline. Here, we have apparently done that. In one test, I got switchpoint = 0/51000510, state->EndRecPtr = 0/51000600.

According to pg_waldump, on timeline 1, we have this record at that LSN:

    rmgr: Heap len (rec/tot): 54/ 54, tx: 2313637, lsn: 0/51000510, prev 0/510004D0, desc: DELETE xmax: 2313637, off: 3, infobits: [KEYS_UPDATED], flags: 0x00, blkref #0: rel 1663/5/6104 blk 0

And on timeline 2, we have this at that LSN:

    rmgr: XLOG len (rec/tot): 114/ 114, tx: 0, lsn: 0/51000510, prev 0/510004D0, desc: CHECKPOINT_SHUTDOWN redo 0/51000510; tli 2; prev tli 1; fpw true; xid 0:2313638; oid 24576; multi 1; offset 0; oldest xid 730 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 0; shutdown

It appears that pg_createsubscriber creates a recovery.conf that includes:

    recovery_target_timeline = 'latest'
    recovery_target_inclusive = true
    recovery_target_lsn = '%X/%X'

...where %X/%X represents a valid LSN.

I think the problem here is that the WAL summarizer believes that when a new timeline appears, it should pick up from where the old timeline ended. And here, that doesn't happen: the new timeline branches off before the end of the old timeline, because of the recovery target.

I'm not yet sure what should be done about this. The obvious answer is "remove the assertion," and maybe that is all we need to do.
However, I'm not quite sure what the actual behavior will be if we just do that, so I think more investigation is needed. I'll keep looking at this, although given the US holiday I may not have results until next week.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
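
To make the failing comparison concrete, here is a minimal standalone sketch of the condition that the assertion checks, using the two LSNs reported above. It is not PostgreSQL code: lsn_t, parse_lsn(), read_stayed_within_old_timeline(), and lsn_overshoot() are hypothetical names invented for illustration, and the only assumption carried over from the server is that an LSN written as %X/%X is a 64-bit position with the high 32 bits before the slash.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a WAL position: a 64-bit byte offset. */
typedef uint64_t lsn_t;

/*
 * Parse an LSN written as "%X/%X" (high 32 bits, slash, low 32 bits).
 * Returns 0 on success and -1 if the text does not match that shape.
 */
static int
parse_lsn(const char *text, lsn_t *out)
{
    unsigned int hi;
    unsigned int lo;

    if (sscanf(text, "%X/%X", &hi, &lo) != 2)
        return -1;
    *out = ((lsn_t) hi << 32) | (lsn_t) lo;
    return 0;
}

/*
 * The condition the assertion expresses: the reader's end position on
 * the old timeline must not be beyond the point where the new timeline
 * branched off.
 */
static int
read_stayed_within_old_timeline(lsn_t switchpoint, lsn_t end_rec_ptr)
{
    return switchpoint >= end_rec_ptr;
}

/* Number of bytes read past the switchpoint, or 0 if there was no overshoot. */
static lsn_t
lsn_overshoot(lsn_t switchpoint, lsn_t end_rec_ptr)
{
    return end_rec_ptr > switchpoint ? end_rec_ptr - switchpoint : 0;
}
```

With switchpoint = 0/51000510 and EndRecPtr = 0/51000600, read_stayed_within_old_timeline() returns false and lsn_overshoot() returns 0xF0 (240) bytes, which is the situation the assertion rejects.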