On Thu, 20 Oct 2022 at 11:30, Kyotaro Horiguchi <horikyota....@gmail.com> wrote:
>
> primary_restored did a time-travel to past a bit because of the
> recovery_target=immediate. In other words, the primary_restored and
> the replica diverge. I don't think it is legit to connect a diverged
> standby to a primary.

primary_restored did travel back in time; since we're doing PITR on the
primary, that's the expected behavior. However, the replica has not
diverged: it's a copy of the exact same base backup. The use case is
restoring a cluster from backup using PITR and using the same backup to
create a standby. Currently this breaks when the primary has not yet
archived any segments.

> So, about the behavior in doubt, it is the correct behavior to
> seemingly ignore the history file in the archive. Recovery assumes
> that the first half of the first segment of the new timeline is the
> same with the same segment of the old timeline (.partial) so it is
> legit to read the <tli=1,seg=2> file til the end and that causes the
> replica goes beyond the divergence point.

What is happening is that primary_restored has a timeline switch to
tli 2 at lsn 0/2000100, and the next insert record starts in the same
segment. The replica starts from the same backup on timeline 1 and tries
to find tli 2 seg 2, which is not archived yet, so it falls back to
tli 1 seg 2. It replays tli 1 seg 2, continues to tli 1 seg 3, then
connects to the primary and starts applying WAL from tli 2 seg 4. To me
that seems completely broken.

> As you know, when new primary starts a diverged history, the
> recommended way is to blow (or stash) away the archive, then take a
> new backup from the running primary.

My understanding is that backup archives are supposed to remain valid
even after PITR or, equivalently, after a lagging standby is promoted.

-- 
Ants Aasma
Senior Database Engineer
www.cybertec-postgresql.com
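
For reference, the segment lookup order described above can be sketched roughly as follows. This is a simplified model and not PostgreSQL source: the helper names (`wal_segment_name`, `lsn_to_segno`, `pick_segment`) are hypothetical, the default 16MB wal_segment_size is assumed, and "0/0" is used only as a placeholder start for timeline 1.

```python
# Simplified model of the fallback described in this mail; not PostgreSQL code.
SEGMENT_SIZE = 16 * 1024 * 1024                      # assumes default wal_segment_size
SEGMENTS_PER_XLOGID = 0x100000000 // SEGMENT_SIZE    # 256 with 16MB segments


def wal_segment_name(tli, segno):
    """Build the 24-hex-digit WAL file name for a timeline and segment number."""
    return "%08X%08X%08X" % (tli, segno // SEGMENTS_PER_XLOGID,
                             segno % SEGMENTS_PER_XLOGID)


def lsn_to_segno(lsn):
    """Convert an LSN written as 'hi/lo' (hex) into a segment number."""
    hi, lo = lsn.split("/")
    return ((int(hi, 16) << 32) | int(lo, 16)) // SEGMENT_SIZE


def pick_segment(timelines, segno, archived):
    """Return the first available file for segno, trying newest timeline first.

    timelines: list of (tli, start_lsn), newest first (hypothetical structure).
    archived:  set of file names currently present in the archive.
    """
    for tli, start_lsn in timelines:
        if segno < lsn_to_segno(start_lsn):
            continue  # this timeline did not exist yet at that segment
        name = wal_segment_name(tli, segno)
        if name in archived:
            return name
    return None


# Scenario from this thread: tli 2 starts at 0/2000100, nothing from tli 2 archived.
timelines = [(2, "0/2000100"), (1, "0/0")]
archived = {wal_segment_name(1, 2), wal_segment_name(1, 3)}
print(pick_segment(timelines, 2, archived))  # falls back to the tli 1 file
```

With only timeline 1 segments in the archive, the lookup for seg 2 and seg 3 returns the timeline 1 files, which is how the replica ends up replaying past the switch point at 0/2000100.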