At Wed, 19 Oct 2022 18:50:09 +0300, Ants Aasma <a...@cybertec.at> wrote in
> When standby is recovering to a timeline that doesn't have any segments
> archived yet it will just blindly blow past the timeline switch point and
> keeps on recovering on the old timeline. Typically that will eventually
> result in an error about incorrect prev-link, but under unhappy
> circumstances can result in standby silently having different contents.
>
> Attached is a shell script that reproduces the issue. Goes back to at least
> v12, probably longer.
>
> I think we should be keeping track of where the current replay timeline is
> going to end and not read any records past it on the old timeline. Maybe
> while at it, we should also track that the next record should be a
> checkpoint record for the timeline switch and error out if not. Thoughts?

primary_restored traveled slightly back in time because of
recovery_target=immediate. In other words, primary_restored and the
replica have diverged. I don't think it is legitimate to connect a
diverged standby to a primary.

So, as for the behavior in question, seemingly ignoring the history
file in the archive is the correct behavior. Recovery assumes that the
first half of the first segment of the new timeline is identical to
the same segment of the old timeline (.partial), so it is legitimate
to read the <tli=1,seg=2> file to the end, and that takes the replica
beyond the divergence point.

As you know, when a new primary starts a diverged history, the
recommended way is to blow away (or stash) the archive, then take a
new backup from the running primary.

If you don't want to trash all the past backups, remove the archived
files at or after the divergence point before starting the standby.
They're <tli=2,seg=2,3> in this case. You must also remove
replica/pg_wal/<tli=2,seg=2> before starting the replica. That file
causes recovery to run beyond the divergence point before it fetches
anything from the archive or the stream.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center