On Thu, Sep 28, 2023 at 05:14:22PM -0400, David Steele wrote: > While reading through [1] I saw there were two instances where backup_label > was removed to achieve a "successful" restore. This might work on trivial > test restores but is an invitation to (silent) disaster in a production > environment where the checkpoint stored in backup_label is almost certain to > be earlier than the one stored in pg_control.
Definitely successful. > Recovery worked perfectly as long as backup_label was present and failed > hard when it was not: > > LOG: invalid primary checkpoint record > PANIC: could not locate a valid checkpoint record > > It's not a very good message, but at least the foot gun has been removed. We > could use this as a special value to give a better message, and maybe use > something a bit more unique like 0xFFFFFFFFFADEFADE (or whatever) as the > value. Why not just InvalidXLogRecPtr? > This is all easy enough for pg_basebackup to do, but will certainly be > non-trivial for most backup software to implement. In [2] we have discussed > perhaps returning pg_control from pg_backup_stop() for the backup software > to save, or it could become part of the backup_label (encoded as hex or > base64, presumably). I prefer the latter as this means less work for the > backup software (except for the need to exclude pg_control from the backup). > > I don't have a patch for this yet because I did not test this idea using > pg_basebackup, but I'll be happy to work up a patch if there is interest. If the contents of the control file are tweaked before sending it through a BASE_BACKUP, it would cover more than just pg_basebackup. Switching the way the control file is sent with new contents in sendFileWithContent() rather than sendFile() would be one way, for instance.. -- Michael
signature.asc
Description: PGP signature