On Fri, Apr 5, 2019 at 1:05 PM Michael Paquier <mich...@paquier.xyz> wrote:

> On Fri, Apr 05, 2019 at 10:39:26AM +0200, Michael Banck wrote:
> > Ok, so the problem is that that checkpoint might be still ongoing when
> > you quickly issue a pg_rewind from the other side?
>
> The end-of-recovery checkpoint may not have even begun.
>

So can we *detect* that this is the case? If so, we could perhaps just
wait for it to finish, since there will always be one?

The main point is -- we know from experience that it's pretty fragile to
assume the user read the documentation :) So if we can find *any* way to
handle this in code rather than docs, that'd be great. We would still
absolutely want the docs change for back branches of course.
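
To make the detect-and-wait idea concrete, here is a rough sketch rather than a
proposal for how pg_rewind should do it. It assumes that, until the
end-of-recovery checkpoint completes, the timeline recorded in the promoted
server's control file still lags behind the timeline it is writing WAL on. The
two callables are hypothetical: they could wrap queries such as
SELECT timeline_id FROM pg_control_checkpoint() and
SELECT pg_walfile_name(pg_current_wal_lsn()), but any other source of those two
values would do.

```python
import time
from typing import Callable


def timeline_from_walfile(name: str) -> int:
    """Return the timeline ID encoded in a WAL segment file name.

    WAL segment names are 24 hex digits; the first 8 are the timeline ID.
    """
    return int(name[:8], 16)


def wait_for_end_of_recovery_checkpoint(
    get_control_tli: Callable[[], int],      # hypothetical: timeline in the control file
    get_current_walfile: Callable[[], str],  # hypothetical: current WAL segment name
    timeout: float = 60.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the control file has caught up with the WAL timeline.

    Returns True once both timelines match, False if the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        if get_control_tli() == timeline_from_walfile(get_current_walfile()):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

A caller would run this against the promoted server before starting pg_rewind
and give up with a clear error if it returns False.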


> > I think it might be useful to specify more exactly which of the two
> > servers (the remote one AIUI) needs a CHECKPOINT in the above. Also, if
> > it is the case that a CHECKPOINT is done automatically (see above), that
> > paragraph could be rewritten to say something like "pg_rewind needs to
> > wait for the checkoint on the remote server to finish. This can be
> > ensured by issueing an explicit checkpoint on the remote server prior to
> > running pg_rewind."
>
> Well, the target server needs to be cleanly shut down, so it seems
> pretty clear to me which one needs to have a checkpoint :)
>

Clear to you and us of course, but quite possibly not to everybody. I'm
sure there are a *lot* of users out there who do not realize that "cleanly
shut down" means "ran a checkpoint just before it shut down".


> > Finally, (and still, if I got the above correctly), to the suggestion of
> > Magnus of pg_rewind running the checkpoint itself on the remote: would
> > that again mean that pg_rewind needs SUPERUSER rights or is there
> > a(nother) GRANTable function that could be added to the list in this
> > case?
>
> pg_rewind would require again a superuser.  So this could be
>

Ugh, you are right of course.



> optional.  In one HA workflow I maintain, what I actually do is to
> enforce directly a checkpoint immediately after the promotion is done
> to make sure that the data is up-to-date, and I don't meddle with
> pg_rewind workflow.
>

Sure. And every other HA setup also has to take care of it. That's why it
would make sense to centralize it into the tool itself when it's
*mandatory* to deal with it somehow.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/
