On Sun, Nov 19, 2023 at 8:16 PM Michael Paquier <mich...@paquier.xyz> wrote: > (I am not exactly sure how, but we've lost pgsql-hackers on the way > when you sent v5. Now added back in CC with the two latest patches > you've proposed attached.) > > Here is a short summary of what has been missed by the lists: > - I've commented that the patch should not create, not show up in > fields returned the SQL functions or stream control files with a size > of 512B, just stick to 8kB. If this is worth changing this should be > applied consistently across the board including initdb, discussed on > its own thread. > - The backup-related fields in the control file are reset at the end > of recovery. I've suggested to not do that to keep a trace of what > was happening during recovery. The latest version of the patch resets > the fields. > - With the backup_label file gone, we lose some information in the > backups themselves, which is not good. Instead, you have suggested an > approach where this data is added to the backup manifest, meaning that > no information would be lost, particularly useful for self-contained > backups. The fields planned to be added to the backup manifest are: > -- The start and end time of the backup, the end timestamp being > useful to know when stop time can be used for PITR. > -- The backup label. > I've agreed that it may be the best thing to do at this end to not > lose any data related to the removal of the backup_label file.
I think we need more votes to make a change this big. I have a concern, which I think I've expressed before, that we keep whacking around the backup APIs, and that has a cost which is potentially larger than the benefits. The last time we changed the API, we changed pg_stop_backup to pg_backup_stop, but this doesn't do that, and I wonder if that's OK. Even if it is, do we really want to change this API around again after such a short time? That said, I don't have an intrinsic problem with moving this information from the backup_label to the backup_manifest file since it is purely informational. I do think there should perhaps be some additions to the test cases, though. I am concerned about the interaction of this proposal with incremental backup. When you take an incremental backup, you get something that looks a lot like a usable data directory but isn't. To prevent that from causing avoidable disasters, the present version of the patch adds INCREMENTAL FROM LSN and INCREMENTAL FROM TLI fields to the backup_label. pg_combinebackup knows to look for those fields, and the server knows that if they are present it should refuse to start. With this change, though, I suppose those fields would end up in pg_control. But that does not feel entirely great, because we have a goal of keeping the amount of real data in pg_control below 512 bytes, the traditional sector size, and this adds another 12 bytes of stuff to that file that currently doesn't need to be there. I feel like that's kind of a problem. But my main point here is ... if we have a few more senior hackers weigh in and vote in favor of this change, well then that's one thing. But IMHO a discussion that's mostly between 2 people is not nearly a strong enough consensus to justify this amount of disruption. -- Robert Haas EDB: http://www.enterprisedb.com