On Wed, 2021-06-16 at 16:17 -0700, Andres Freund wrote:
> I think we should explicitly compute the current timeline before
> using
> ThisTimelineID. E.g. in StartReplication() call a new version of
> GetFlushRecPtr() that also returns the current timeline id.

I think all we need to do is follow the pattern in IdentifySystem() by
calling:

    am_cascading_walsender = RecoveryInProgress();

first. There are three cases:

1. If the server was a primary the last time RecoveryInProgress() was
called, and it's still a primary, then it returns false immediately.
ThisTimeLineID should be set properly already.

2. If the server was a secondary the last time RecoveryInProgress() was
called, and now it's a primary, then it updates ThisTimeLineID in
InitXLOGAccess() and returns false.

3. If the server was a secondary the last time, and is still a
secondary, then it returns true. Then, StartReplication() will call
GetStandbyFlushRecPtr(), which will update ThisTimeLineID.

It was confusing to me for a while because I was trying to sort out
whether am_cascading_walsender and/or ThisTimeLineID could be out of
date, and how that would result in not updating ThisTimeLineID; and
also why there was a difference between IdentifySystem() and
StartReplication().

Simple patch attached. I didn't test it yet, but wanted to post my
analysis.

Regards,
	Jeff Davis

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 32245363561..17cb21220e0 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -573,11 +573,6 @@ StartReplication(StartReplicationCmd *cmd)
 	StringInfoData buf;
 	XLogRecPtr	FlushPtr;
 
-	if (ThisTimeLineID == 0)
-		ereport(ERROR,
-				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-				 errmsg("IDENTIFY_SYSTEM has not been run before START_REPLICATION")));
-
 	/* create xlogreader for physical replication */
 	xlogreader =
 		XLogReaderAllocate(wal_segment_size, NULL,
@@ -619,6 +614,7 @@ StartReplication(StartReplicationCmd *cmd)
 	 * that. Otherwise use the timeline of the last replayed record, which is
 	 * kept in ThisTimeLineID.
 	 */
+	am_cascading_walsender = RecoveryInProgress();
 	if (am_cascading_walsender)
 	{
 		/* this also updates ThisTimeLineID */
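
For anyone who wants to trace the three cases by hand, here is a rough toy model of the reasoning. It is not PostgreSQL source: every name prefixed with toy_ or Toy is hypothetical, the shared-memory state is reduced to a plain struct, and the only behavior modeled is the assumption from the analysis above that a backend caches "in recovery" locally and refreshes the timeline when it notices a promotion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only; none of these names exist in PostgreSQL. */
typedef uint32_t ToyTimeLineID;

typedef struct ToyServer
{
	bool		in_recovery;	/* shared state: still a standby? */
	ToyTimeLineID primary_tli;	/* timeline used once promoted */
	ToyTimeLineID replay_tli;	/* timeline of the last replayed record */
} ToyServer;

typedef struct ToyBackend
{
	bool		local_in_recovery;	/* backend-local cached answer */
	bool		am_cascading_walsender;
	ToyTimeLineID this_tli;		/* stands in for ThisTimeLineID; 0 = unset */
} ToyBackend;

/* A fresh backend assumes recovery until it checks shared state. */
static ToyBackend
toy_backend_init(void)
{
	ToyBackend	be = {true, false, 0};

	return be;
}

/* Stand-in for RecoveryInProgress(). */
static bool
toy_recovery_in_progress(const ToyServer *srv, ToyBackend *be)
{
	/* Case 1: already known to be a primary; return immediately. */
	if (!be->local_in_recovery)
		return false;

	be->local_in_recovery = srv->in_recovery;

	/* Case 2: promotion noticed; set the timeline (InitXLOGAccess stand-in). */
	if (!be->local_in_recovery)
		be->this_tli = srv->primary_tli;

	/* Case 3 falls through and returns true. */
	return be->local_in_recovery;
}

/* Stand-in for the timeline selection in StartReplication() after the patch. */
static ToyTimeLineID
toy_start_replication(const ToyServer *srv, ToyBackend *be)
{
	be->am_cascading_walsender = toy_recovery_in_progress(srv, be);

	/* Case 3: the standby flush lookup also refreshes the timeline. */
	if (be->am_cascading_walsender)
		be->this_tli = srv->replay_tli;

	/* In all three cases the timeline is set by this point. */
	assert(be->this_tli != 0);
	return be->this_tli;
}
```

In this model, calling toy_start_replication() on a fresh backend never leaves this_tli at 0, which is why the removed "IDENTIFY_SYSTEM has not been run" check would no longer be needed under these assumptions.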