Hi,

On 2023-12-20 13:11:37 -0400, David Steele wrote:
> I've run this through a bunch of scenarios (in my head) with parallel
> backups and it does seem to hold up.
>
> I think we'd need to write the state file before XLOG_BACKUP_START just in
> case. Seems better to have an extra state file rather than have one be
> missing.

That'd very significantly weaken the approach, afaict, because an "external"
base backup could end up copying those files. The whole point is to detect
broken procedures, so relying on such files being excluded from the base
backup seems like a bad idea.

I also see no need to do so. Because we'd only verify that a backup start has
been replayed when replaying XLOG_BACKUP_STOP, there's no danger in not
creating the files during XLOG_BACKUP_START, and instead creating them just
before logging the XLOG_BACKUP_STOP.

> I'm a little worried about what happens if a state file goes missing, but I
> guess that could be true of any file in PGDATA.

Yea, that seems like a non-issue to me.

> Probably we'd want to exclude *all* state files from backups, though.

I don't think so - I think we want the opposite? As noted above, I think in a
safety net like this we shouldn't assume that backup procedures were followed
correctly.

> Seems like in various PITR scenarios it could be hard to determine when to
> remove them.

Why? I think we can basically remove the files:

a) after the checkpoint during which XLOG_BACKUP_STOP was replayed - I think
   we already have the infrastructure to queue file deletions that we can
   hook into

b) when replaying a shutdown checkpoint / after creation of a shutdown
   checkpoint

Greetings,

Andres Freund