On Tue, Oct 22, 2013 at 1:10 PM, Shaun Thomas <stho...@optionshouse.com> wrote:

> > So you can grab the extra files, but you can't make it apply them,
> > as you are telling it that it doesn't need to.
>
> Do I have to, though? Replaying transaction logs is baked into the crash
> recovery system. If I interrupt it in the middle of a checkpoint, it should
> be able to revert to the previous checkpoint that did succeed.


True, but it needs to know that it needs to do that.


> By including the extra WAL files, it would re-apply them, just like in a
> crash recovery.
>
> Of course, that only works if I interrupt it by shutting the replica down.
> By backing up across a checkpoint, I run the risk of a race condition where
> some files were backed up before the checkpoint, and others afterwards.
> Which raises the question: isn't that risk the same with a regular backup?
> The database doesn't just stop checkpointing because a backup is in
> progress.


The backup_label file records the checkpoint that occurred inside the
pg_start_backup() call and is not updated by subsequent checkpoints.  It
acts as an alternative control file, forcing recovery to start at that
checkpoint rather than at some later one that was completed and recorded in
the real control file while the backup was underway.
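
To make that concrete, here is a minimal Python sketch that pulls the
starting point out of a backup_label file. The sample text follows the
layout I would expect from a 9.x-era server, but the values are made up,
and parse_backup_label is a hypothetical helper, not anything PostgreSQL
ships.

```python
import re

# Illustrative contents only: the field layout is assumed from 9.x-era
# servers, and every value here is invented for the example.
SAMPLE_LABEL = """\
START WAL LOCATION: 0/2000028 (file 000000010000000000000002)
CHECKPOINT LOCATION: 0/2000060
BACKUP METHOD: pg_start_backup
BACKUP FROM: master
START TIME: 2013-10-22 13:10:00 CDT
LABEL: nightly
"""


def parse_backup_label(text):
    """Return the fields recovery would care about (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        # Split on the first ": " only, so timestamps keep their colons.
        key, sep, value = line.partition(": ")
        if sep:
            fields[key] = value

    # The start line carries both the WAL location and the segment file name.
    match = re.match(r"(\S+) \(file ([0-9A-F]{24})\)",
                     fields["START WAL LOCATION"])
    if match is None:
        raise ValueError("unrecognized START WAL LOCATION line")

    return {
        "start_lsn": match.group(1),
        "start_segment": match.group(2),
        "checkpoint_lsn": fields["CHECKPOINT LOCATION"],
        "label": fields.get("LABEL"),
    }


if __name__ == "__main__":
    print(parse_backup_label(SAMPLE_LABEL))
```

The point is only that the checkpoint location in this file is fixed when
the backup starts, whatever the real control file says by the time the copy
finishes.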

This is one of the advantages of pg_basebackup: it injects backup_label
directly into the backup (where it is needed) without creating it on the
master (where it is not needed, other than as a way to make sure it ends up
in the backup).  So if the master crashes during a backup taken with
pg_basebackup, it will start recovery from the last eligible checkpoint
rather than from the checkpoint taken by pg_start_backup(), as it would if
a backup_label file had been left in the master's data directory.  Not only
does starting from the earlier checkpoint cause extra work, it also runs
the risk that some of the WAL needed to start from that checkpoint has
already been recycled, in which case the server refuses to start until
someone manually intervenes by deleting the backup_label file.
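
A rough way to picture both costs, as a Python sketch: how much WAL sits
between the earlier and the later checkpoint, and whether the segment the
earlier one needs is still on disk. The helper names and the sample values
are hypothetical, and the byte arithmetic assumes WAL locations can be
treated as plain 64-bit positions, which is only approximately true on
releases before 9.3.

```python
def lsn_to_int(lsn):
    """Convert an 'X/Y' hex WAL location to a single integer position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def replay_distance(earlier_lsn, later_lsn):
    """Bytes of WAL between two locations (assumes a linear address space)."""
    return lsn_to_int(later_lsn) - lsn_to_int(earlier_lsn)


def segment_available(segment_name, wal_file_names):
    """True if the needed segment appears in a listing of the WAL directory.

    wal_file_names would typically come from os.listdir() on pg_xlog.
    """
    return segment_name in set(wal_file_names)


if __name__ == "__main__":
    # Invented values: the backup_label checkpoint versus a later one.
    extra = replay_distance("0/2000060", "0/5000060")
    print("extra WAL to replay:", extra, "bytes")

    # Invented listing in which the oldest segment has already been recycled.
    listing = ["000000010000000000000004", "000000010000000000000005"]
    print(segment_available("000000010000000000000002", listing))
```

If the second check comes back false for the segment named in
backup_label, that is the situation where startup fails.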

> There must be some internal detail I'm missing.
>
> Either way, I'll add a routine to stall the standby backup until the
> restartpoint corresponding to the pg_start_backup has been replayed. I'll
> see if that helps.
>

A possible alternative would be to fake a backup_label file which contains
the pointer to the restartpoint that was known-good at the time the master
was put into backup mode.  If you have full_page_writes off, that would be
a problem.  There may be other problems with it that I'm unaware of, and it
seems like running with scissors.

Cheers,

Jeff
