At Mon, 7 Dec 2020 20:13:25 +0200, Heikki Linnakangas <hlinn...@iki.fi> wrote in
> There's a race condition between the checkpoint at promotion and
> pg_rewind. When a server is promoted, the startup process writes an
> end-of-recovery checkpoint that includes the new TLI, and the server
> is immediately opened for business. The startup process requests the
> checkpointer process to perform a checkpoint, but it can take a few
> seconds or more to complete. If you run pg_rewind, using the just
> promoted server as the source, pg_rewind will think that the server is
> still on the old timeline, because it only looks at the TLI in the
> control file's copy of the checkpoint record. That's not updated until
> the checkpoint is finished.
>
> This isn't a new issue. Stephen Frost first reported it back in 2015
> [1]. Back then, it was deemed just a small annoyance, and we worked
> around it in the tests by issuing a checkpoint command after
> promotion, to wait for the checkpoint to finish. I just ran into it
> again today, with the new pg_rewind test, and silenced it in a
> similar way.
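The race above, and the fix discussed below, come down to which TLI field of the control file is consulted: the checkpoint copy lags until the end-of-recovery checkpoint completes, while minRecoveryPointTLI is bumped by the startup process at promotion. A minimal sketch of that choice, with illustrative names (not the actual pg_rewind code):

```c
#include <assert.h>

/*
 * Hedged sketch only; the struct and function names are illustrative.
 * The control file's checkpoint copy keeps the old TLI until the
 * end-of-recovery checkpoint finishes, while minRecoveryPointTLI is
 * already updated by the startup process at promotion time.
 */
typedef struct ControlDataSketch
{
	unsigned	checkpoint_tli;		/* TLI in the checkpoint copy */
	unsigned	min_recovery_tli;	/* minRecoveryPointTLI */
} ControlDataSketch;

/* Prefer minRecoveryPointTLI when it is ahead of the checkpoint's TLI. */
static unsigned
effective_source_tli(const ControlDataSketch *cf)
{
	return (cf->min_recovery_tli > cf->checkpoint_tli)
		? cf->min_recovery_tli
		: cf->checkpoint_tli;
}
```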
I (or we) faced that and avoided it by checking for the history file on
the primary.

> I think we should fix this properly. I'm not sure if it can lead to a
> broken cluster, but at least it can cause pg_rewind to fail
> unnecessarily and in a user-unfriendly way. But this is actually
> pretty simple to fix. pg_rewind looks at the control file to find out
> the timeline the server is on. When promotion happens, the startup
> process updates the minRecoveryPoint and minRecoveryPointTLI fields in
> the control file. We just need to read it from there. Patch attached.

Looks fine to me. A bit concerned about making sourceHistory needlessly
file-local, but on the other hand unifying sourceHistory and
targetHistory looks better.

For the test part, that change doesn't necessarily catch the failure of
the current version, but I *believe* the previous code is the result of
an actual failure in the past, so the test probabilistically (or
depending on the platform?) hits the failure if it happens.

> I think we should also backpatch this. Back in 2015, we decided that
> we could live with this, but it's always been a bit bogus, and seems
> simple enough to fix.

I don't think this changes any successful behavior; it just saves the
failure case, so +1 for back-patching.

> Thoughts?
>
> [1]
> https://www.postgresql.org/message-id/20150428180253.GU30322%40tamriel.snowman.net

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
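(As a footnote to the history-file workaround mentioned at the top of this reply: after promotion, one can poll until the timeline history file for the new TLI appears, e.g. in pg_wal. A hedged C sketch; the function name and listing-based interface are illustrative, not PostgreSQL code.)

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Illustrative sketch only: return 1 if the timeline history file for
 * `tli` appears in a directory listing of WAL file names.
 */
static int
history_file_present(const char **wal_files, int nfiles, unsigned tli)
{
	char		expected[32];
	int			i;

	/* History files are named after the TLI, e.g. 00000002.history. */
	snprintf(expected, sizeof(expected), "%08X.history", tli);
	for (i = 0; i < nfiles; i++)
		if (strcmp(wal_files[i], expected) == 0)
			return 1;
	return 0;
}
```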