At Mon, 7 Dec 2020 20:13:25 +0200, Heikki Linnakangas <hlinn...@iki.fi> wrote in
> There's a race condition between the checkpoint at promotion and
> pg_rewind. When a server is promoted, the startup process writes an
> end-of-recovery checkpoint that includes the new TLI, and the server
> is immediately opened for business. The startup process requests the
> checkpointer process to perform a checkpoint, but it can take a few
> seconds or more to complete. If you run pg_rewind, using the just
> promoted server as the source, pg_rewind will think that the server is
> still on the old timeline, because it only looks at the TLI in the
> control file's copy of the checkpoint record. That's not updated until
> the checkpoint is finished.
>
> This isn't a new issue. Stephen Frost first reported it back in 2015
> [1]. Back then, it was deemed just a small annoyance, and we worked
> around it in the tests by issuing a checkpoint command after
> promotion, to wait for the checkpoint to finish. I just ran into it
> again today, with the new pg_rewind test, and silenced it in a
> similar way.
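The race above, and the fix discussed below, come down to which TLI field of the control file is consulted: the checkpoint copy lags until the end-of-recovery checkpoint completes, while minRecoveryPointTLI is bumped by the startup process at promotion. A minimal sketch of that choice, with illustrative names (not the actual pg_rewind code):

```c
#include <assert.h>

/*
 * Hedged sketch only; the struct and function names are illustrative.
 * The control file's checkpoint copy keeps the old TLI until the
 * end-of-recovery checkpoint finishes, while minRecoveryPointTLI is
 * already updated by the startup process at promotion time.
 */
typedef struct ControlDataSketch
{
	unsigned	checkpoint_tli;		/* TLI in the checkpoint copy */
	unsigned	min_recovery_tli;	/* minRecoveryPointTLI */
} ControlDataSketch;

/* Prefer minRecoveryPointTLI when it is ahead of the checkpoint's TLI. */
static unsigned
effective_source_tli(const ControlDataSketch *cf)
{
	return (cf->min_recovery_tli > cf->checkpoint_tli)
		? cf->min_recovery_tli
		: cf->checkpoint_tli;
}
```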
I (or we) faced that and avoided it by checking for the history file on
the primary.

> I think we should fix this properly. I'm not sure if it can lead to a
> broken cluster, but at least it can cause pg_rewind to fail
> unnecessarily and in a user-unfriendly way. But this is actually
> pretty simple to fix. pg_rewind looks at the control file to find out
> the timeline the server is on. When promotion happens, the startup
> process updates the minRecoveryPoint and minRecoveryPointTLI fields in
> the control file. We just need to read it from there. Patch attached.

Looks fine to me. A bit concerned about making sourceHistory needlessly
file-local, but on the other hand unifying sourceHistory and
targetHistory looks better.

For the test part, that change doesn't necessarily catch the failure of
the current version, but I *believe* the previous code is the result of
an actual failure in the past, so the test probabilistically (or
depending on the platform?) hits the failure if it happens.

> I think we should also backpatch this. Back in 2015, we decided that
> we could live with this, but it's always been a bit bogus, and seems
> simple enough to fix.

I don't think this changes any successful behavior; it just saves the
failure case, so +1 for back-patching.

> Thoughts?
>
> [1]
> https://www.postgresql.org/message-id/20150428180253.GU30322%40tamriel.snowman.net

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
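(As a footnote to the history-file workaround mentioned at the top of this reply: after promotion, one can poll until the timeline history file for the new TLI appears, e.g. in pg_wal. A hedged C sketch; the function name and listing-based interface are illustrative, not PostgreSQL code.)

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Illustrative sketch only: return 1 if the timeline history file for
 * `tli` appears in a directory listing of WAL file names.
 */
static int
history_file_present(const char **wal_files, int nfiles, unsigned tli)
{
	char		expected[32];
	int			i;

	/* History files are named after the TLI, e.g. 00000002.history. */
	snprintf(expected, sizeof(expected), "%08X.history", tli);
	for (i = 0; i < nfiles; i++)
		if (strcmp(wal_files[i], expected) == 0)
			return 1;
	return 0;
}
```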