On Mon, Oct 16, 2023 at 10:26 AM Laurenz Albe <laurenz.a...@cybertec.at> wrote:
> On Mon, 2023-10-16 at 09:26 -0700, David G. Johnston wrote:
> > This email is a first pass at a user-visible design for how our backup and restore
> > process, as enabled by the Low Level API, can be modified to make it more mistake-proof.
> > In short, it requires pg_start_backup to further expand upon what it means for the
> > system to be in the midst of a backup, pg_stop_backup to reverse those things,
> > and modifying the startup process to deal with the server having crashed while the
> > system is in that backup state. Notes at the end extend the design to handle
> > concurrent backups.
> >
> > The core functional changes are:
> > 1) pg_backup_start modifies a newly added "in backup" state flag in pg_control to on.
> > 2) pg_backup_stop modifies that flag back to off.
> > 3) postmaster will refuse to start if that flag is on, unless one of:
> >   a) crash.signal exists in the data directory
> >   b) recovery.signal exists in the data directory
> >   c) standby.signal exists in the data directory
> > 4) Signal file processing causes the in-backup flag in pg_control to be set to off
> >
> > The newly added crash.signal file is required to handle the case where the server
> > crashes after pg_backup_start and before pg_backup_stop. It initiates a crash recovery
> > of the instance just as is done today but with the added change of flipping the flag
> > to off when recovery is complete just before going live.
>
> I see a couple of problems and/or things that need clarification with that idea:
>
> - Two backups can run concurrently. How do you reconcile that with the "in backup"
>   flag and crash.signal?
> - I guess crash.signal is created during pg_start_backup(). So that file will be
>   included in the backup. How do you handle that during recovery? Ignore it if
>   another signal file is present? And if the user forgets to create a signal file
>   for recovery, how do you prevent PostgreSQL from performing crash recovery?

crash.signal is created in the pg_backup_metadata directory, not the root directory.

Should the server crash while any backup is in progress, pg_control would be aware of that fact (in_backup=true would still be there, instead of in_backup=false, which only comes back after all backups have completed), and the server will not restart without user intervention - specifically, moving the crash.signal file from (one of) the pg_backup_metadata subdirectories to the root directory. As there is nothing special about the crash.signal files in the pg_backup_metadata subdirectories, "touch crash.signal" could be used instead.

The backed-up pg_control file will have in_backup=true (I haven't pondered the torn-reads dynamic of this - I'm supposing that placing a copy of pg_control into the pg_backup_metadata directory might be part of solving that problem).

David J.
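
For illustration only, the startup gate discussed in this thread could be sketched roughly as below. This is a minimal Python model, not PostgreSQL code: the function names (root_signal_files, may_start) and the "backup_1" subdirectory name are hypothetical, and the in_backup flag is passed in as a plain boolean rather than read from pg_control. It assumes only what the proposal states: with the flag on, startup is refused unless a signal file sits in the root of the data directory, and a crash.signal inside a pg_backup_metadata subdirectory does not count. Precedence among several signal files is left unmodelled, since the thread does not settle it.

```python
from pathlib import Path
import tempfile

# Signal files that, per the proposal, lift the startup refusal when they
# are present in the root of the data directory.
SIGNAL_FILES = ("crash.signal", "recovery.signal", "standby.signal")


def root_signal_files(data_dir: Path) -> list[str]:
    """Return the signal files found directly in data_dir (not in subdirectories)."""
    return [name for name in SIGNAL_FILES if (data_dir / name).is_file()]


def may_start(data_dir: Path, in_backup: bool) -> bool:
    """Model of the proposed gate: refuse to start while in_backup is on,
    unless at least one signal file exists in the data directory root."""
    if not in_backup:
        return True
    return bool(root_signal_files(data_dir))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data_dir = Path(tmp)
        # Hypothetical per-backup subdirectory holding its own crash.signal.
        meta = data_dir / "pg_backup_metadata" / "backup_1"
        meta.mkdir(parents=True)
        (meta / "crash.signal").touch()

        # Crashed mid-backup: the flag is still on and no root signal file exists.
        print(may_start(data_dir, in_backup=True))   # False

        # User intervention: the equivalent of "touch crash.signal" in the root.
        (data_dir / "crash.signal").touch()
        print(may_start(data_dir, in_backup=True))   # True
```

The sketch covers only the refusal check; clearing the flag after signal file processing (item 4 of the proposal) is not modelled.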