I talked to Tom on the phone today, and I think we have a procedure for doing backup/restore in a fairly foolproof way.
As outlined below, we need to record the start/stop and checkpoint WAL file names and offsets, and somehow pass those on to restore. I think any system that requires users to link those values together is going to cause confusion and be error-prone. My idea is to do much of this automatically.

First, create a server-side function called pitr_backup_start() which creates a file in the /data directory containing the WAL filenames/offsets for the last checkpoint and the start. Then do the backup of the data directory. Then call pitr_backup_stop(), which adds the stop filename/offset to the file, and archive that file in the same place as the WAL files.

To restore, you untar the backup of /data. The recovering backend then reads the file created by pitr_backup_start() to find the name of the backup parameter file. It recovers that file from the archive location and uses the start/stop/checkpoint filename/offset information to do the restore.

The advantage of this is that the tar backup contains everything needed to find the proper parameter file for restore. Ideally we could get all the parameters into the tar backup, but that isn't possible because we can't push the stop counters into the backup after the backup has completed.

I recommend the pitr_backup_start() file be named for the current WAL filename/offset, perhaps 000000000000032c.3da390.backup or something like that. The file would be a simple text file in pg_xlog/archive_status:

	# start 2004-07-14 21:35:22.324579-04
	wal_checkpoint = 0000000000000319.021233
	wal_start = 000000000000032c.92a9cb

	...added after backup completes...

	wal_stop = 000000000000034a.3db030
	# stop 2004-07-14 21:32:22.0923213-04

The timestamps are for documentation only. These files give admins looking in the archive directory information on backup times.

(As an idea, there is no need for the user to specify a recovery mode. If the postmaster starts and sees the pitr_backup_start() file in /data, it can go into recovery mode automatically.
If the archiver can't find the file in the archive location, it can assume it is just being started after a power failure. However, if it finds the file in the archive location, it can assume it is to enter recovery mode.

There is a race condition: a crash during the copy of the file to the archive location would be a problem. The solution would be to create a special flag file before copying the file to the archive, then archive it and remove the flag file. If the postmaster starts up and sees the pitr_backup_start() file both in /data and in the archive location, and it doesn't see the flag file, it then knows it is doing a restore, because the flag file would never appear in a backup. Anyway, this is just an idea.)

---------------------------------------------------------------------------

Simon Riggs wrote:
> On Wed, 2004-07-14 at 10:57, Zeugswetter Andreas SB SD wrote:
> > > The recovery mechanism doesn't rely upon you knowing 1 or 3. The
> > > recovery reads pg_control (from the backup) and then attempts to
> > > de-archive the appropriate xlog segment file and then starts
> > > rollforward
> >
> > Unfortunately this only works if pg_control was the first file to be
> > backed up (or by chance no checkpoint happened after backup start and
> > pg_control backup)
> >
> > Other db's have commands for:
> > start/end external backup
>
> OK...this idea has come up a few times. Here's my take:
>
> - OS and hardware facilities exist now to make instant copies of sets of
> files. Some of these are open source, others not. If you use these, you
> have no requirement for this functionality....but these alone are no
> replacement for archive recovery.... I accept that some people may not
> wish to go to the expense or effort to use those options, but in my mind
> these are the people that will not be using archive_mode anyway.
>
> - all we would really need to do is to stop the bgwriter from doing
> anything during backup. pgcontrol is only updated at checkpoint.
> The current xlog is updated constantly, but this need not be copied because
> we are already archiving it as soon as its full. That leaves the
> bgwriter, which is now responsible for both lazy writing and
> checkpoints.
>
> So, put a switch into bgwriter to halt for a period, then turn it back
> on at the end. Could be a SIGHUP GUC...or...
>
> ...and with my greatest respects....
>
> - please could somebody else code that?... my time is limited
>
> Best regards, Simon Riggs

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                    |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
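P.S. Coming back to the flag-file idea above, the postmaster's startup-mode decision can be sketched as a small decision table. This is a hypothetical Python illustration only (the real check would be C code in the postmaster, and the mode names are mine):

```python
# Sketch of the automatic startup-mode decision described above.
# The boolean arguments stand in for actual file-existence checks.

def startup_mode(backup_file_in_data, backup_file_in_archive, flag_file_exists):
    """Decide how the postmaster should start under the flag-file scheme."""
    if not backup_file_in_data:
        return "normal"            # no pitr_backup_start() file at all
    if not backup_file_in_archive:
        return "crash-recovery"    # plain restart after a power failure
    if flag_file_exists:
        return "crash-recovery"    # crashed mid-copy to the archive
    return "archive-recovery"      # file in both places, no flag: a restore
```

The key property is the last branch: the flag file never appears in a backup, so seeing the parameter file in both /data and the archive without the flag file can only mean a restore is in progress.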