On Fri, 21 Oct 2022 at 11:44, Kyotaro Horiguchi <horikyota....@gmail.com> wrote:
>
> At Fri, 21 Oct 2022 17:12:45 +0900 (JST), Kyotaro Horiguchi
> <horikyota....@gmail.com> wrote in
> > latest works. It doesn't consider the case of explicit target timelines
> > so it's just a PoC. (So this doesn't work if recovery_target_timeline
> > is set to 2 for the "standby" in the repro.)
>
> So, finally I noticed that the function XLogFileReadAnyTLI is not
> needed at all if we are going this direction.
>
> Regardless of whether recovery_target_timeline is latest or any explicit
> timeline id or checkpoint timeline, what we can do to reach the target
> timeline is just to follow the history file's direction.
>
> If segments are partly gone while reading on a timeline, a segment on
> the older timelines is just crap since it should be incompatible.

I came to the same conclusion. I adjusted XLogFileReadAnyTLI to not use
any timeline that ends within the segment (attached patch). At this
point the name of the function becomes really wrong;
XLogFileReadCorrectTLI or something to that effect would be much more
descriptive, and the code could be simplified.

However, I'm not particularly happy with this approach, as it will not
use valid WAL from the older timeline when the segment on the newer
timeline is not available. Consider the scenario of a cascading failure.
Node A has a hard failure, then node B promotes and archives the history
file, but doesn't see enough traffic to archive a full segment before
failing itself. While this is happening, we restore node A from backup
and start it up as a standby.

If node B fails before node A has a chance to connect, then either we
continue recovery on the wrong timeline (current behavior) or we do not
try to recover the first portion of the archived WAL file (with patch).

So I think the correct approach would still be to have ReadRecord() or
ApplyWalRecord() determine that switching timelines is needed.

--
Ants Aasma
www.cybertec-postgresql.com

diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index cb07694aea6..73bde98b920 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -4171,6 +4171,7 @@ XLogFileReadAnyTLI(XLogSegNo segno, int emode, XLogSource source)
 	{
 		TimeLineHistoryEntry *hent = (TimeLineHistoryEntry *) lfirst(cell);
 		TimeLineID	tli = hent->tli;
+		XLogSegNo	beginseg = 0;
 
 		if (tli < curFileTLI)
 			break;				/* don't bother looking at too-old TLIs */
@@ -4181,7 +4182,6 @@ XLogFileReadAnyTLI(XLogSegNo segno, int emode, XLogSource source)
 		 */
 		if (hent->begin != InvalidXLogRecPtr)
 		{
-			XLogSegNo	beginseg = 0;
 
 			XLByteToSeg(hent->begin, beginseg, wal_segment_size);
@@ -4223,6 +4223,14 @@ XLogFileReadAnyTLI(XLogSegNo segno, int emode, XLogSource source)
 				return fd;
 			}
 		}
+
+		/*
+		 * For segments containing known timeline switches only consider the
+		 * last timeline as redo otherwise doesn't know when to switch
+		 * timelines.
+		 */
+		if (segno == beginseg && beginseg > 0)
+			break;
 	}
 
 	/* Couldn't find it. For simplicity, complain about front timeline */
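
To make the trade-off in the cascading-failure scenario easier to see, here
is a rough standalone model of the segment lookup with and without the extra
break. It is only a sketch under simplifying assumptions: ToyHistoryEntry,
ToySegmentFile, segment_exists() and pick_timeline() are hypothetical names
invented for illustration, timeline starts are tracked as whole segment
numbers rather than LSNs, and the curFileTLI check and the archive/pg_wal
distinction are left out. It is not the PostgreSQL code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; NOT the PostgreSQL definitions. */
typedef struct
{
	uint32_t	tli;		/* timeline ID */
	uint64_t	begin_seg;	/* segment where this timeline starts, 0 = none known */
} ToyHistoryEntry;

typedef struct
{
	uint32_t	tli;
	uint64_t	segno;
} ToySegmentFile;

/* Is a segment file for (tli, segno) present in the toy archive? */
static bool
segment_exists(const ToySegmentFile *files, size_t nfiles,
			   uint32_t tli, uint64_t segno)
{
	for (size_t i = 0; i < nfiles; i++)
	{
		if (files[i].tli == tli && files[i].segno == segno)
			return true;
	}
	return false;
}

/*
 * Walk the history newest-first and return the timeline whose file would be
 * used for segno, or 0 if none is found. With patched = true, the search
 * stops after a timeline that begins inside the requested segment, so older
 * timelines are never considered for that segment.
 */
uint32_t
pick_timeline(const ToyHistoryEntry *history, size_t nhist,
			  const ToySegmentFile *files, size_t nfiles,
			  uint64_t segno, bool patched)
{
	for (size_t i = 0; i < nhist; i++)
	{
		uint64_t	beginseg = history[i].begin_seg;

		/* Skip timelines that do not exist yet at this segment. */
		if (beginseg != 0 && segno < beginseg)
			continue;

		if (segment_exists(files, nfiles, history[i].tli, segno))
			return history[i].tli;

		/* Models the break added by the patch. */
		if (patched && beginseg > 0 && segno == beginseg)
			break;
	}
	return 0;
}
```

With a history of timeline 2 starting in segment 5 on top of timeline 1, and
an archive holding only timeline 1 files for segments 4 and 5, asking for
segment 5 picks timeline 1 without the break (recovery carries on along the
old timeline) and finds nothing with it (the archived file is never tried).
Segment 4 resolves to timeline 1 in both cases.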