On 7/19/24 21:52, Robert Haas wrote:
On Mon, Jul 15, 2024 at 11:27 AM Laurenz Albe <laurenz.a...@cybertec.at> wrote:
On Sat, 2024-06-29 at 07:01 +0200, Laurenz Albe wrote:
I played around with incremental backup yesterday and tried $subject

The WAL summarizer is running on the standby server, but when I try
to take an incremental backup, I get an error that I understand to mean
that WAL summarizing hasn't caught up yet.

I am not sure if that is working as designed, but if it is, I think it
should be documented.

I played with this some more.  Here is the exact error message:

ERROR:  manifest requires WAL from final timeline 1 ending at 0/1967C260, but this backup starts at 0/1967C190

By trial and error I found that when I run a CHECKPOINT on the primary,
taking an incremental backup on the standby works.

I couldn't fathom the cause of that, but I think that that should either
be addressed or documented before v17 comes out.

I had a feeling this was going to be confusing. I'm not sure what to
do about it, but I'm open to suggestions.

Suppose you take a full backup F; replay of that backup will begin
with a checkpoint CF. Then you try to take an incremental backup I;
replay will begin from a checkpoint CI. For the incremental backup to
be valid, it must include all blocks modified after CF and before CI.
But when the backup is taken on a standby, no new checkpoint is
possible. Hence, CI will be the most recent restartpoint on the
standby that has occurred before the backup starts. So, if F is taken
on the primary and then I is immediately taken on the standby without
the standby having done a new restartpoint, or if both F and I are
taken on the standby and no restartpoint intervenes, then CF=CI. In
that scenario, an incremental backup is pretty much pointless: every
single incremental file would contain 0 blocks. You might as well just
use the backup you already have, unless one of the non-relation files
has changed. So, except in that unusual corner case, the fact that the
backup fails isn't really costing you anything. In fact, there's a
decent chance that it's saving you from taking a completely useless
backup.
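Below is a minimal sketch of the check behind the error quoted above. It assumes a simplified model: the new backup's start LSN (on a standby, the redo point of its latest restartpoint, CI) must not precede the LSN at which the prior backup's manifest says its WAL ends. The function names are hypothetical and this is not PostgreSQL's actual implementation. It only illustrates the comparison, using the LSNs from the report.

```python
def parse_lsn(text: str) -> int:
    """Convert a PostgreSQL-style LSN such as '0/1967C260' to a 64-bit integer."""
    hi, lo = text.split("/")
    # The part before the slash is the high 32 bits, the part after is the low 32 bits.
    return (int(hi, 16) << 32) | int(lo, 16)


def check_incremental_start(prior_end: str, new_start: str, timeline: int = 1) -> None:
    """Hypothetical model of the sanity check: fail if the new backup would
    start before the WAL range required by the prior backup's manifest ends."""
    if parse_lsn(new_start) < parse_lsn(prior_end):
        raise ValueError(
            f"manifest requires WAL from final timeline {timeline} ending at "
            f"{prior_end}, but this backup starts at {new_start}"
        )


# With no restartpoint since the prior backup, the standby's start point is
# still behind where that backup ended, so the check fails:
try:
    check_incremental_start("0/1967C260", "0/1967C190")
except ValueError as err:
    print("ERROR:", err)
```

In this model, a CHECKPOINT on the primary followed by a restartpoint on the standby moves the start LSN past 0/1967C260, and the check passes. That matches the trial-and-error observation earlier in the thread.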

<snip>

I think I'm a little too close to this to really know what the best
thing to do is, so I'm happy to hear suggestions from you and others.

I think it would be enough just to add a hint such as:

HINT: this is possible when making a standby backup with little or no activity.

My guess is in production environments this will be uncommon.

For example, over the years we (pgBackRest) have gotten numerous bug reports that time-targeted PITR does not work. In every case we found that the user was just testing procedures and the database had no activity between backups -- therefore recovery had no commit timestamps to use to end recovery. Test environments sometimes produce weird results.

Having said that, I think it would be better if it worked even if it does produce an empty backup. An empty backup wastes some disk space, but if it produces less friction and saves an admin from having to intervene, then it is probably worth it. I don't immediately see how to do that in a reliable way, though, and in any case it seems like something to consider for PG18.

Regards,
-David

