On Mon, Sep 11, 2023 at 07:04:30PM +0300, Aleksander Alekseev wrote: > Many thanks for submitting the patch. I added it to the nearest open > commitfest [1]. > > IMO a test is needed that makes sure no one is going to break this in > the future.
You definitely need more complex test scenarios for that. If you can come up with new ways to make the TAP tests of pg_rewind mode modular in handling more complicated node setups, that would be a nice addition, for example. > [1]: https://commitfest.postgresql.org/45/4559/ @@ -7951,6 +7951,15 @@ xlog_redo(XLogReaderState *record) ereport(PANIC, (errmsg("unexpected timeline ID %u (should be %u) in end-of-recovery record", xlrec.ThisTimeLineID, replayTLI))); + /* + * Update the control file so that crash recovery can follow the timeline + * changes to this point. + */ + LWLockAcquire(ControlFileLock, LW_EXCLUSIVE); + ControlFile->minRecoveryPoint = lsn; + ControlFile->minRecoveryPointTLI = xlrec.ThisTimeLineID; This patch is at least incorrect in its handling of crash recovery, because these two should *never* be set in this case as we want to replay up to the end of WAL. For example, see xlog.c or the top of xlogrecovery.c about the assumptions behind these variables: /* crash recovery should always recover to the end of WAL */ ControlFile->minRecoveryPoint = InvalidXLogRecPtr; ControlFile->minRecoveryPointTLI = 0; If an end-of-recovery record is replayed during crash recovery, these assumptions are plain broken. One thing that we could consider is to be more aggressive with restartpoints when replaying this record for a standby, see a few lines above the lines added by your patch, for example. And we could potentially emulate a post-promotion restart point to get a refresh of the control file as it should, with the correct code paths involved in the updates of minRecoveryPoint when the checkpointer does the job. -- Michael
signature.asc
Description: PGP signature