On Fri, Apr 5, 2019 at 1:05 PM Michael Paquier <mich...@paquier.xyz> wrote:
> On Fri, Apr 05, 2019 at 10:39:26AM +0200, Michael Banck wrote: > > Ok, so the problem is that that checkpoint might be still ongoing when > > you quickly issue a pg_rewind from the other side? > > The end-of-recovery checkpoint may not have even begun. > So can we *detect* that this is the case? Because if so, we could perhaps just wait for it to be done? Because there will always be one? The main point is -- we know from experience that it's pretty fragile to assume the user read the documentation :) So if we can find *any* way to handle this in code rather than docs, that'd be great. We would still absolutely want the docs change for back branches of course. > I think it might be useful to specify more exactly which of the two > > servers (the remote one AIUI) needs a CHECKPOINT in the above. Also, if > > it is the case that a CHECKPOINT is done automatically (see above), that > > paragraph could be rewritten to say something like "pg_rewind needs to > > wait for the checkoint on the remote server to finish. This can be > > ensured by issueing an explicit checkpoint on the remote server prior to > > running pg_rewind." > > Well, the target server needs to be cleanly shut down, so it seems > pretty clear to me which one needs to have a checkpoint :) > Clear to you and us of course, but quite possibly not to everybody. I'm sure there are a *lot* of users out there who do not realize that "cleanly shut down" means "ran a checkpoint just before it shut down". > Finally, (and still, if I got the above correctly), to the suggestion of > > Magnus of pg_rewind running the checkpoint itself on the remote: would > > that again mean that pg_rewind needs SUPERUSER rights or is there > > a(nother) GRANTable function that could be added to the list in this > > case? > > pg_rewind would require again a superuser. So this could be > Ugh, you are right of course. > optional. In one HA workflow I maintain, what I actually do is to > enforce directly a checkpoint immediately after the promotion is done > to make sure that the data is up-to-date, and I don't meddle with > pg_rewind workflow. > Sure. And every other HA setup also has to take care of it. That's why it would make sense to centralize it into the tool itself when it's *mandatory* to deal with it somehow. -- Magnus Hagander Me: https://www.hagander.net/ <http://www.hagander.net/> Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>