Re: Add recovery to pg_control and remove backup_label

Robert Haas Mon, 20 Nov 2023 08:11:39 -0800

On Sun, Nov 19, 2023 at 8:16 PM Michael Paquier <mich...@paquier.xyz> wrote:
> (I am not exactly sure how, but we've lost pgsql-hackers on the way
> when you sent v5.  Now added back in CC with the two latest patches
> you've proposed attached.)
>
> Here is a short summary of what has been missed by the lists:
> - I've commented that the patch should not create, not show up in
> fields returned the SQL functions or stream control files with a size
> of 512B, just stick to 8kB.  If this is worth changing this should be
> applied consistently across the board including initdb, discussed on
> its own thread.
> - The backup-related fields in the control file are reset at the end
> of recovery.  I've suggested to not do that to keep a trace of what
> was happening during recovery.  The latest version of the patch resets
> the fields.
> - With the backup_label file gone, we lose some information in the
> backups themselves, which is not good.  Instead, you have suggested an
> approach where this data is added to the backup manifest, meaning that
> no information would be lost, particularly useful for self-contained
> backups.  The fields planned to be added to the backup manifest are:
> -- The start and end time of the backup, the end timestamp being
> useful to know when stop time can be used for PITR.
> -- The backup label.
> I've agreed that it may be the best thing to do at this end to not
> lose any data related to the removal of the backup_label file.


I think we need more votes to make a change this big. I have a
concern, which I think I've expressed before, that we keep whacking
around the backup APIs, and that has a cost which is potentially
larger than the benefits. The last time we changed the API, we changed
pg_stop_backup to pg_backup_stop, but this doesn't do that, and I
wonder if that's OK. Even if it is, do we really want to change this
API around again after such a short time?

That said, I don't have an intrinsic problem with moving this
information from the backup_label to the backup_manifest file since it
is purely informational. I do think there should perhaps be some
additions to the test cases, though.

I am concerned about the interaction of this proposal with incremental
backup. When you take an incremental backup, you get something that
looks a lot like a usable data directory but isn't. To prevent that
from causing avoidable disasters, the present version of the patch
adds INCREMENTAL FROM LSN and INCREMENTAL FROM TLI fields to the
backup_label. pg_combinebackup knows to look for those fields, and the
server knows that if they are present it should refuse to start. With
this change, though, I suppose those fields would end up in
pg_control. But that does not feel entirely great, because we have a
goal of keeping the amount of real data in pg_control below 512 bytes,
the traditional sector size, and this adds another 12 bytes of stuff
to that file that currently doesn't need to be there. I feel like
that's kind of a problem.

But my main point here is ... if we have a few more senior hackers
weigh in and vote in favor of this change, well then that's one thing.
But IMHO a discussion that's mostly between 2 people is not nearly a
strong enough consensus to justify this amount of disruption.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: Add recovery to pg_control and remove backup_label

Reply via email to