On Mon, Oct 24, 2016 at 9:18 AM, Kyotaro HORIGUCHI <horiguchi.kyot...@lab.ntt.co.jp> wrote: > Thank you for looking and retelling this. > > At Fri, 21 Oct 2016 13:02:21 -0400, Robert Haas <robertmh...@gmail.com> wrote > in <ca+tgmoardyfcqcg9x-e99xcxdc4lw0zygp5qlsphfsb6byd...@mail.gmail.com> >> On Sun, Oct 2, 2016 at 8:52 AM, Michael Paquier >> <michael.paqu...@gmail.com> wrote: >> >> So, if I understand correctly, then we can mark the version posted by >> >> you upthread [1] which includes a test along with Kyotaro's fix can be >> >> marked as Ready for committer. If so, then please change the status >> >> of patch accordingly. >> > >> > Patch moved to next CF 2016-11, still with status "ready for committer". >> > >> > IMO, this thread has reached as conclusion to use this patch to >> > address the problem reported: >> > cab7npqtv5gmkqcndofgtgqoqxz2xlz4rrw247oqojzztvy6...@mail.gmail.com >> >> I spent some time reading this thread today and trying to understand >> exactly what's going on here. Here's my attempt to restate the >> problem: >> >> 1. If a restartpoint flushes no dirty buffers, then the redo location >> advances but the minimum recovery point does not; therefore, the >> minimum recovery point can fall behind the redo location. That's >> correct behavior, because if the standby is subsequently restarted, it >> can correctly begin at the checkpoint record associated with the >> restartpoint and need not replay any further. > > Yes. > >> 2. However, it causes a problem when a base backup with the "include >> WAL" option is taken from a standby because -- on standby servers only >> -- the end location for a backup as returned by do_pg_stop_backup() is >> the minimum recovery point. If this precedes the start point for the >> backup -- which is the restart point -- perform_base_backup() will >> conclude that no WAL files are required for the backup and fail an >> internal sanity check. > > Yes. The option implies that the copied $PGDATA is a standalone > backup, that is, usable to start a standby instantly so at least > one WAL segment is needed, as mentioned in 3 below. > >> 3. Removing the sanity check wouldn't help, because it can never be >> right to end up with zero WAL files as a result of a base backup. At >> a minimum, we need the WAL file which contains the checkpoint record >> from which the control file specifies that redo should start. >> Actually, I think that checkpoint record could be spread across >> multiple files: it might be big. We need them all, but the current >> code won't ensure that we get them. > > Yes. The "checkpoit records" means that records during a checkpoint. > >> The consensus solution on this thread seems to be that we should have >> pg_do_stop_backup() return the last-replayed XLOG location as the >> backup end point. If the control file has been updated with a newer >> redo location, then the associated checkpoint record has surely been >> replayed, so we'll definitely have enough WAL to replay that >> checkpoint record (and we don't need to replay anything else, because >> we're supposing that this is the case where the minimum recovery point >> precedes the redo location). Although this will work, it might >> include more WAL in the backup than is strictly necessary. If the >> additional WAL replayed beyond the minimum recovery point does NOT >> include a checkpoint, we don't need any of it; if it does, we need >> only the portion up to and including the last checkpoint record, and >> not anything beyond that. > > StartupXLOG mandates the existence of the start record of the > latest checkpoint. pg_start_backup() returns the start point > (redo location) of the restartpoint that it requested and > minRecoveryPoint is always after the replpayed checkpoint so it > works if always was going well. > > But if the checkpoint contains no update, the restart point > exceeds minRecoveryPoint. Then no WAL copied. > >> I can think of two solutions that would be "tighter": >> >> 1. When performing a restartpoint, update the minimum recovery point >> to just beyond the checkpoint record. I think this can't hurt anyone >> who is actually restarting recovery on the same machine, because >> that's exactly the point where they are going to start recovery. A >> minimum recovery point which precedes the point at which they will >> start recovery is no better than one which is equal to the point at >> which they will start recovery. > > This is a candidate that I thought of. But I avoided to change > the behavior of minRecoveryPoint that (seems to me that) it > describes only buffer state. So checkpoint with no update doesn't > advances minRecoveryPoint as my understanding. >
I think what you are saying is not completely right, because we do update minRecoveryPoint when we don't perform a new restart point. When we perform restart point, then it assumes that flushing the buffers will take care of updating minRecoveryPoint. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers