Re: Remove Deprecated Exclusive Backup Mode

Stephen Frost Tue, 26 Feb 2019 04:16:54 -0800

Greetings,

* Laurenz Albe ([email protected]) wrote:
> I think the fundamental problem with all these approaches is that there is
> no safe way to distinguish a server crashed in backup mode from a restored
> backup.  This is what makes the problem so hard.


Right- if you want to just call start/stop and take a snapshot in the
middle and then be able to restore that directly and start up the
database, then there *can't* be any way to distinguish between the two,
which is, I'm pretty sure, where this whole discussion ended up back
during the 9.6 development cycle and why it's still an issue.  If there
was an easy way to fix this, I feel like we would have already.

> The existing exclusive backup is in my opinion the safest variant: it refuses
> to create a corrupted cluster without manual intervention and gives you a dire
> warning to consider if you are doing the right thing.

... it's the least dangerous if you limit yourself to that method, but
that doesn't make it safe. :(

In the end, you basically *have* to have a way of extracting out the
data needed for the backup (start/stop WAL and such) that doesn't make
the running cluster look like it's a backup being restored, and you
*have* to make that information available to the database cluster when
it's restored somehow, and notify PG that it's doing backup recovery and
*not* crash recovery, to eliminate this risk, and that's pretty hard to
manage if all you want to do is snapshot the filesystem.

Of course, you have to have a solution for WAL too and the thought has
crossed my mind that maybe there's something we could do when it comes
to stashing all the info needed in the WAL archive, but I'm still not sure
how we'd solve for knowing if we're doing backup recovery or crash
recovery in that case without some kind of marker or something external
telling us that's what we're doing.  As you proposed previously, but
with a bit of a twist, maybe we could just always do backup recovery if
we find a .backup (or whatever) file in the WAL that, when compared to
pg_control, shows that we were in the process of doing a backup...  That
would require that everyone always have a restore_command set, which
wasn't possible before because that went into recovery.conf, but it's
possible to just always have that set now, and that would eliminate the
risk of us running the system out of disk space by keeping all the WAL
that's generated during the backup local.
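
To make the idea a bit more concrete, here is a minimal sketch of the startup decision, written in Python only for readability. Everything in it is an assumption for illustration: the BackupMarker type, the choose_recovery_mode() function, and the exact comparison against pg_control's REDO location are hypothetical and do not exist in PostgreSQL.

```python
from dataclasses import dataclass
from typing import Optional


def parse_lsn(text: str) -> int:
    """Convert an LSN written as 'hi/lo' in hex (e.g. '0/3000028') to an integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


@dataclass
class BackupMarker:
    """Hypothetical marker fetched from the WAL archive via restore_command."""
    start_lsn: int
    stop_lsn: Optional[int]  # None if the backup never reached its stop point


def choose_recovery_mode(control_redo_lsn: int,
                         marker: Optional[BackupMarker]) -> str:
    """Return 'backup' or 'crash' (assumed rule, not PostgreSQL behaviour).

    control_redo_lsn stands in for the REDO location recorded in pg_control.
    """
    if marker is None:
        # Nothing in the archive says a backup was running.
        return "crash"
    if control_redo_lsn < marker.start_lsn:
        # pg_control predates this backup, so the marker is unrelated.
        return "crash"
    if marker.stop_lsn is not None and control_redo_lsn >= marker.stop_lsn:
        # The backup had already finished when pg_control was last written.
        return "crash"
    # pg_control falls inside the backup window: treat it as a restored backup.
    return "backup"
```

Note that under this rule a server that really did crash in the middle of a backup is also treated as a restored backup, which is the extra WAL replay cost mentioned below.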

Obviously, a lot of this is pretty hand-wavy, and you still have the
unfortunate situation that if you're actually recovering from a crash
that just happened to occur while you were taking a backup then you
could be replaying a heck of a lot more WAL than you needed to, and you
have to have a working restore_command on the primary, and you'd have to
figure out a way for PG to check for these .backup (or whatever) files
on startup that doesn't take forever or require stepping through
every WAL segment or something, but maybe those concerns could be
addressed.
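
On the lookup cost: if the backup start location were somehow known at startup, the file could be requested by name rather than found by scanning. The sketch below (Python, for illustration only) builds a name in the style of today's backup history files, as I understand that convention; the function name, the 16MB segment size, and the premise that the start location is available are all assumptions. It does not solve the harder part, which is learning the start location in the first place.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumed default segment size (16MB)


def backup_history_file_name(timeline: int, start_lsn: int,
                             segment_size: int = WAL_SEGMENT_SIZE) -> str:
    """Build '<WAL segment name>.<offset>.backup' for a backup start location.

    Hypothetical helper: start_lsn is the backup start location as an integer.
    """
    segments_per_xlogid = 0x100000000 // segment_size
    segno = start_lsn // segment_size   # segment containing the start location
    offset = start_lsn % segment_size   # byte offset inside that segment
    return "%08X%08X%08X.%08X.backup" % (
        timeline,
        segno // segments_per_xlogid,
        segno % segments_per_xlogid,
        offset,
    )
```

With such a name in hand, a single restore_command call for that one file would be enough to tell whether the marker exists.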

Thanks!

Stephen
