Hackers,
This was originally proposed in [1] but that thread went through a
number of different proposals so it seems better to start anew.
The basic idea here is to simplify and harden recovery by getting rid of
backup_label and storing recovery information directly in pg_control.
Instead of backup software copying pg_control from PGDATA, it stores an
updated version that is returned from pg_backup_stop(). I believe this
is better for the following reasons:
* The user can no longer remove backup_label and get what looks like a
successful restore (while almost certainly causing corruption). If
pg_control is removed the cluster will not start. The user may try
pg_resetwal, but I think that tool makes it pretty clear that corruption
will result from its use. We could also modify pg_resetwal to complain
if recovery info is present in pg_control.
* We don't need to worry about backup software seeing a torn copy of
pg_control, since Postgres can safely read it out of memory and provide
a valid copy via pg_backup_stop(). This solves [2] without needing to
write pg_control via a temp file, which may affect performance on a
standby. Unfortunately, this solution cannot be back patched.
* For backup from standby, we no longer need to instruct the backup
software to copy pg_control last. In fact the backup software should not
copy pg_control from PGDATA at all.
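
To illustrate the backup-software side of this, here is a minimal sketch of
storing the returned pg_control bytes without modification. It assumes the
caller already holds the bytea value from pg_backup_stop() in memory (the
patch returns PG_CONTROL_MAX_SAFE_SIZE bytes). The helper names
write_control_file() and file_matches() are hypothetical and are not part of
the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Write len bytes from buf to path with no translation of any kind.
 * Returns 0 on success, -1 on failure. Hypothetical helper for backup
 * software; not part of the patch.
 */
static int
write_control_file(const char *path, const unsigned char *buf, size_t len)
{
	FILE	   *fp;

	assert(path != NULL && buf != NULL);

	/* "wb" matters on platforms that translate line endings in text mode */
	fp = fopen(path, "wb");
	if (fp == NULL)
		return -1;

	if (fwrite(buf, 1, len, fp) != len)
	{
		fclose(fp);
		return -1;
	}

	/* fclose() flushes stdio buffers; a real tool would also fsync */
	return (fclose(fp) == 0) ? 0 : -1;
}

/*
 * Read the file back and confirm it holds exactly the expected bytes.
 * Returns 1 on a match, 0 otherwise.
 */
static int
file_matches(const char *path, const unsigned char *expected, size_t len)
{
	FILE	   *fp = fopen(path, "rb");
	size_t		i;
	int			c;

	if (fp == NULL)
		return 0;

	for (i = 0; i < len; i++)
	{
		c = fgetc(fp);
		if (c == EOF || (unsigned char) c != expected[i])
		{
			fclose(fp);
			return 0;
		}
	}

	/* The file must end exactly where the expected data ends */
	c = fgetc(fp);
	fclose(fp);
	return c == EOF;
}
```

In a backup the target path would be global/pg_control inside the backup
directory, never the live PGDATA. Reading the file back is optional, but it is
a cheap way to catch a text-mode write before the backup is ever restored.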
Since backup_label is now gone, the fields that used to be in
backup_label are now provided as columns returned from pg_backup_start()
and pg_backup_stop() and the backup history file is still written to the
archive. For pg_basebackup we would have the option of writing the
fields into the JSON manifest, storing them to a file (e.g.
backup.info), or just ignoring them. None of the fields are required for
recovery but backup software may be very interested in them.
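
As an example of what backup software might do with the returned columns, the
sketch below turns an lsn and timeline_id into a WAL segment file name, using
the same arithmetic as the XLByteToSeg() and XLogFileName() calls in
build_backup_content(). The helper names parse_lsn() and wal_file_name() are
hypothetical, and the segment size is passed in by the caller; the 16MB used in
the note afterwards is only an assumption about the cluster's configuration.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define WAL_FNAME_LEN 24		/* three 8-digit hex fields */

/*
 * Parse the text form of an LSN ("%X/%X") into a 64-bit value.
 * Returns 0 on success, -1 if the text does not match.
 */
static int
parse_lsn(const char *text, uint64_t *lsn)
{
	unsigned int hi;
	unsigned int lo;

	assert(text != NULL && lsn != NULL);

	if (sscanf(text, "%X/%X", &hi, &lo) != 2)
		return -1;

	*lsn = ((uint64_t) hi << 32) | lo;
	return 0;
}

/*
 * Build the name of the WAL segment containing lsn on timeline tli.
 * out must have room for WAL_FNAME_LEN characters plus the terminator.
 */
static void
wal_file_name(char *out, size_t outlen, uint32_t tli, uint64_t lsn,
			  uint64_t wal_segment_size)
{
	uint64_t	segs_per_xlogid;
	uint64_t	segno;

	assert(out != NULL && outlen > WAL_FNAME_LEN);
	assert(wal_segment_size > 0);

	/* Number of segments in each 4GB "xlogid" range */
	segs_per_xlogid = UINT64_C(0x100000000) / wal_segment_size;
	segno = lsn / wal_segment_size;

	snprintf(out, outlen, "%08" PRIX32 "%08" PRIX32 "%08" PRIX32,
			 tli,
			 (uint32_t) (segno / segs_per_xlogid),
			 (uint32_t) (segno % segs_per_xlogid));
}
```

With 16MB segments, an lsn of 0/3000028 on timeline 1 maps to
000000010000000000000003. Note that this follows XLByteToSeg(), so an LSN
exactly on a segment boundary maps to the segment that starts there.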
I updated pg_rewind but I'm not very confident in the tests. When I
removed backup_label processing, but before I updated pg_rewind to write
recovery info into pg_control, all the rewind tests passed.
This patch highlights the fact that we still have no tests for the
low-level backup method. I modified pgBackRest to work with this patch
and the entire test suite ran without any issues, but in-core tests
would be good to have. I'm planning to work on those myself as a
separate patch.
This patch would also make the proposal in [3] obsolete since there is
no need to rename backup_label if it is gone.
I know that outputting pg_control as bytea is going to be a bit
controversial. Software that is using psql to run pg_backup_stop()
could use encode() to get pg_control as text and then decode it later.
Alternately, we could update ReadControlFile() to recognize a
base64-encoded pg_control file. I'm not sure dealing with binary data is
that much of a problem, though, and if the backup software gets it wrong
then recovery will fail on an invalid pg_control file.
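
For the text route, here is a minimal sketch of the decoding step on the
backup-software side. It assumes the value arrives as hex text, either from
encode(..., 'hex') or in the default bytea output form with a leading \x; the
choice of hex over base64 is my assumption for the example, and decode_hex()
is a hypothetical helper, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the value of one hex digit, or -1 if c is not a hex digit */
static int
hex_val(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Decode hex text (with or without a leading "\x") into out.
 * Returns the number of bytes written, or -1 if the text is malformed
 * or does not fit in outcap bytes.
 */
static long
decode_hex(const char *text, unsigned char *out, size_t outcap)
{
	size_t		len;
	size_t		i;

	assert(text != NULL && out != NULL);

	/* Skip the prefix used by the default bytea output format */
	if (text[0] == '\\' && text[1] == 'x')
		text += 2;

	len = strlen(text);
	if (len % 2 != 0 || len / 2 > outcap)
		return -1;

	for (i = 0; i < len / 2; i++)
	{
		int			hi = hex_val(text[2 * i]);
		int			lo = hex_val(text[2 * i + 1]);

		if (hi < 0 || lo < 0)
			return -1;
		out[i] = (unsigned char) ((hi << 4) | lo);
	}

	return (long) (len / 2);
}
```

Rejecting odd-length or non-hex input here means a mangled value is caught at
backup time rather than when recovery reads an invalid pg_control file.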
Lastly, I think there are improvements to be made in recovery that go
beyond this patch. I originally set out to load the recovery info into
*just* the existing fields in pg_control but it required so many changes
to recovery that I decided it was too dangerous to do all in one patch.
This patch very much takes the "backup_label in pg_control" approach,
though I reused fields where possible. The added fields, e.g.
backupRecoveryRequired, also allow us to keep the user experience
pretty much the same in terms of messages and errors.
Thoughts?
Regards,
-David
[1]
https://postgresql.org/message-id/1330cb48-4e47-03ca-f2fb-b144b49514d8%40pgmasters.net
[2]
https://postgresql.org/message-id/20221123014224.xisi44byq3cf5psi%40awork3.anarazel.de
[3]
https://postgresql.org/message-id/eb3d1aae-1a75-bcd3-692a-38729423168f%40pgmasters.net

diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index 8cb24d6ae54..6be8fb902c5 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -935,19 +935,20 @@ SELECT * FROM pg_backup_stop(wait_for_archive => true);
ready to archive.
</para>
<para>
- <function>pg_backup_stop</function> will return one row with three
- values. The second of these fields should be written to a file named
- <filename>backup_label</filename> in the root directory of the backup. The
- third field should be written to a file named
- <filename>tablespace_map</filename> unless the field is empty. These
files are
+ <function>pg_backup_stop</function> returns the
+ <filename>pg_control</filename> file, which must be stored in the
+ <filename>global</filename> directory of the backup. It also returns the
+ <filename>tablespace_map</filename> file, which should be written in the
+ root directory of the backup unless the field is empty. These files are
vital to the backup working and must be written byte for byte without
- modification, which may require opening the file in binary mode.
+ modification, which will require opening the file in binary mode.
</para>
</listitem>
<listitem>
<para>
Once the WAL segment files active during the backup are archived, you are
- done. The file identified by <function>pg_backup_stop</function>'s first
return
+ done. The file identified by <function>pg_backup_stop</function>'s
+ <parameter>lsn</parameter> return
value is the last segment that is required to form a complete set of
backup files. On a primary, if <varname>archive_mode</varname> is
enabled and the
<literal>wait_for_archive</literal> parameter is <literal>true</literal>,
@@ -1013,7 +1014,15 @@ SELECT * FROM pg_backup_stop(wait_for_archive => true);
</para>
<para>
- You should, however, omit from the backup the files within the
+ You must exclude <filename>global/pg_control</filename> from your backup
+ and put the contents of the <parameter>pg_control_file</parameter> column
+ returned from <function>pg_backup_stop</function> in your backup at
+ <filename>global/pg_control</filename>. This file contains the information
+ required to safely recover.
+ </para>
+
+ <para>
+ You should also omit from the backup the files within the
cluster's <filename>pg_wal/</filename> subdirectory. This
slight adjustment is worthwhile because it reduces the risk
of mistakes when restoring. This is easy to arrange if
@@ -1062,12 +1071,7 @@ SELECT * FROM pg_backup_stop(wait_for_archive => true);
</para>
<para>
- The backup label
- file includes the label string you gave to
<function>pg_backup_start</function>,
- as well as the time at which <function>pg_backup_start</function> was run,
and
- the name of the starting WAL file. In case of confusion it is therefore
- possible to look inside a backup file and determine exactly which
- backup session the dump file came from. The tablespace map file includes
+ The tablespace map file includes
the symbolic link names as they exist in the directory
<filename>pg_tblspc/</filename> and the full path of each symbolic link.
These files are not merely for your information; their presence and
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 7c3e940afef..01a2df1edcc 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -26735,7 +26735,10 @@ LOG: Grand total: 1651920 bytes in 201 blocks; 622360
free (88 chunks); 1029560
<parameter>label</parameter> <type>text</type>
<optional>, <parameter>fast</parameter> <type>boolean</type>
</optional> )
- <returnvalue>pg_lsn</returnvalue>
+ <returnvalue>record</returnvalue>
+ ( <parameter>lsn</parameter> <type>pg_lsn</type>,
+ <parameter>timeline_id</parameter> <type>int8</type>,
+ <parameter>start</parameter> <type>timestamptz</type> )
</para>
<para>
Prepares the server to begin an on-line backup. The only required
@@ -26747,6 +26750,13 @@ LOG: Grand total: 1651920 bytes in 201 blocks; 622360
free (88 chunks); 1029560
as possible. This forces an immediate checkpoint which will cause a
spike in I/O operations, slowing any concurrently executing queries.
</para>
+ <para>
+ The result columns contain information about the start of the backup
+ and can be ignored: the <parameter>lsn</parameter> column holds the
+ starting write-ahead log location, the
+ <parameter>timeline_id</parameter> column holds the starting timeline,
+ and the <parameter>start</parameter> column holds the starting
timestamp.
+ </para>
<para>
This function is restricted to superusers by default, but other users
can be granted EXECUTE to run the function.
@@ -26762,13 +26772,15 @@ LOG: Grand total: 1651920 bytes in 201 blocks;
622360 free (88 chunks); 1029560
<optional><parameter>wait_for_archive</parameter>
<type>boolean</type>
</optional> )
<returnvalue>record</returnvalue>
- ( <parameter>lsn</parameter> <type>pg_lsn</type>,
- <parameter>labelfile</parameter> <type>text</type>,
- <parameter>spcmapfile</parameter> <type>text</type> )
+ ( <parameter>pg_control_file</parameter> <type>bytea</type>,
+ <parameter>tablespace_map_file</parameter> <type>text</type>,
+ <parameter>lsn</parameter> <type>pg_lsn</type>,
+ <parameter>timeline_id</parameter> <type>int8</type>,
+ <parameter>stop</parameter> <type>timestamptz</type> )
</para>
<para>
Finishes performing an on-line backup. The desired contents of the
- backup label file and the tablespace map file are returned as part of
+ pg_control file and the tablespace map file are returned as part of
the result of the function and must be written to files in the
backup area. These files must not be written to the live data
directory
(doing so will cause PostgreSQL to fail to restart in the event of a
@@ -26800,13 +26812,16 @@ LOG: Grand total: 1651920 bytes in 201 blocks;
622360 free (88 chunks); 1029560
backup.
</para>
<para>
- The result of the function is a single record.
- The <parameter>lsn</parameter> column holds the backup's ending
- write-ahead log location (which again can be ignored). The second
- column returns the contents of the backup label file, and the third
- column returns the contents of the tablespace map file. These must be
- stored as part of the backup and are required as part of the restore
- process.
+ The result of the function is a single record. The first column returns
+ the contents of the <filename>pg_control</filename> file and the
+ second column returns the contents of the
+ <filename>tablespace_map</filename> file. These must be stored as part
+ of the backup and are required as part of the restore process. The
+ remainder of the columns contain information about the end of the
backup
+ and can be ignored: the <parameter>lsn</parameter> column holds the
+ ending write-ahead log location, the <parameter>timeline_id</parameter>
+ column holds the ending timeline, and the <parameter>stop</parameter>
+ column holds the ending timestamp.
</para>
<para>
This function is restricted to superusers by default, but other users
diff --git a/src/backend/access/transam/xlog.c
b/src/backend/access/transam/xlog.c
index 40461923ea3..b6e0e15ab2c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -74,6 +74,7 @@
#include "pg_trace.h"
#include "pgstat.h"
#include "port/atomics.h"
+#include "port/pg_crc32c.h"
#include "port/pg_iovec.h"
#include "postmaster/bgwriter.h"
#include "postmaster/startup.h"
@@ -5116,7 +5117,6 @@ StartupXLOG(void)
bool wasShutdown;
bool didCrash;
bool haveTblspcMap;
- bool haveBackupLabel;
XLogRecPtr EndOfLog;
TimeLineID EndOfLogTLI;
TimeLineID newTLI;
@@ -5240,13 +5240,14 @@ StartupXLOG(void)
/*
* Prepare for WAL recovery if needed.
*
- * InitWalRecovery analyzes the control file and the backup label file,
if
- * any. It updates the in-memory ControlFile buffer according to the
- * starting checkpoint, and sets InRecovery and
ArchiveRecoveryRequested.
+ * InitWalRecovery analyzes the control file and checks if backup
recovery
+ * has been requested. It updates the in-memory ControlFile buffer
+ * according to the starting checkpoint, and sets InRecovery and
+ * ArchiveRecoveryRequested.
+ *
* It also applies the tablespace map file, if any.
*/
- InitWalRecovery(ControlFile, &wasShutdown,
- &haveBackupLabel, &haveTblspcMap);
+ InitWalRecovery(ControlFile, &wasShutdown, &haveTblspcMap);
checkPoint = ControlFile->checkPointCopy;
/* initialize shared memory variables from the checkpoint record */
@@ -5389,20 +5390,6 @@ StartupXLOG(void)
*/
UpdateControlFile();
- /*
- * If there was a backup label file, it's done its job and the
info
- * has now been propagated into pg_control. We must get rid of
the
- * label file so that if we crash during recovery, we'll pick
up at
- * the latest recovery restartpoint instead of going all the
way back
- * to the backup start point. It seems prudent though to just
rename
- * the file out of the way rather than delete it completely.
- */
- if (haveBackupLabel)
- {
- unlink(BACKUP_LABEL_OLD);
- durable_rename(BACKUP_LABEL_FILE, BACKUP_LABEL_OLD,
FATAL);
- }
-
/*
* If there was a tablespace_map file, it's done its job and the
* symlinks have been created. We must get rid of the map file
so
@@ -5552,10 +5539,8 @@ StartupXLOG(void)
* (at which point we reset backupStartPoint to be Invalid), for
* backup-from-replica (which can't inject records into the WAL stream),
* that point is when we reach the minRecoveryPoint in pg_control (which
- * we purposefully copy last when backing up from a replica). For
- * pg_rewind (which creates a backup_label with a method of "pg_rewind")
- * or snapshot-style backups (which don't), backupEndRequired will be
set
- * to false.
+ * we purposefully copy last when backing up). For pg_rewind or
+ * snapshot-style backups, backupEndRequired will be set to false.
*
* Note: it is indeed okay to look at the local variable
* LocalMinRecoveryPoint here, even though ControlFile->minRecoveryPoint
@@ -8725,11 +8710,33 @@ do_pg_backup_stop(BackupState *state, bool
waitforarchive)
int seconds_before_warning;
int waits = 0;
bool reported_waiting = false;
+ ControlFileData *controlFileCopy = (ControlFileData
*)state->controlFile;
Assert(state != NULL);
backup_stopped_in_recovery = RecoveryInProgress();
+ /*
+ * Create a copy of control data and update it with fields required for
+ * recovery. Also recalculate the CRC.
+ */
+ memset(controlFileCopy, 0, PG_CONTROL_MAX_SAFE_SIZE);
+
+ LWLockAcquire(ControlFileLock, LW_SHARED);
+ memcpy(controlFileCopy, ControlFile, sizeof(ControlFileData));
+ LWLockRelease(ControlFileLock);
+
+ controlFileCopy->backupRecoveryRequired = true;
+ controlFileCopy->backupFromStandby = backup_stopped_in_recovery;
+ controlFileCopy->backupEndRequired = true;
+ controlFileCopy->backupCheckPoint = state->checkpointloc;
+ controlFileCopy->backupStartPoint = state->startpoint;
+ controlFileCopy->backupStartPointTLI = state->starttli;
+
+ INIT_CRC32C(controlFileCopy->crc);
+ COMP_CRC32C(controlFileCopy->crc, controlFileCopy,
offsetof(ControlFileData, crc));
+ FIN_CRC32C(controlFileCopy->crc);
+
/*
* During recovery, we don't need to check WAL level. Because, if WAL
* level is not sufficient, it's impossible to get here during recovery.
@@ -8831,11 +8838,8 @@ do_pg_backup_stop(BackupState *state, bool
waitforarchive)
"Enable
full_page_writes and run CHECKPOINT on the primary, "
"and then try an
online backup again.")));
-
- LWLockAcquire(ControlFileLock, LW_SHARED);
- state->stoppoint = ControlFile->minRecoveryPoint;
- state->stoptli = ControlFile->minRecoveryPointTLI;
- LWLockRelease(ControlFileLock);
+ state->stoppoint = controlFileCopy->minRecoveryPoint;
+ state->stoptli = controlFileCopy->minRecoveryPointTLI;
}
else
{
@@ -8877,7 +8881,7 @@ do_pg_backup_stop(BackupState *state, bool waitforarchive)
histfilepath)));
/* Build and save the contents of the backup history file */
- history_file = build_backup_content(state, true);
+ history_file = build_backup_content(state);
fprintf(fp, "%s", history_file);
pfree(history_file);
diff --git a/src/backend/access/transam/xlogbackup.c
b/src/backend/access/transam/xlogbackup.c
index 21d68133ae1..b61ed02bbbe 100644
--- a/src/backend/access/transam/xlogbackup.c
+++ b/src/backend/access/transam/xlogbackup.c
@@ -18,19 +18,19 @@
#include "access/xlogbackup.h"
/*
- * Build contents for backup_label or backup history file.
- *
- * When ishistoryfile is true, it creates the contents for a backup history
- * file, otherwise it creates contents for a backup_label file.
+ * Build contents for backup history file.
*
* Returns the result generated as a palloc'd string.
*/
char *
-build_backup_content(BackupState *state, bool ishistoryfile)
+build_backup_content(BackupState *state)
{
char startstrbuf[128];
+ char stopstrfbuf[128];
char startxlogfile[MAXFNAMELEN]; /* backup start WAL file */
+ char stopxlogfile[MAXFNAMELEN]; /* backup stop WAL file
*/
XLogSegNo startsegno;
+ XLogSegNo stopsegno;
StringInfo result = makeStringInfo();
char *data;
@@ -45,16 +45,10 @@ build_backup_content(BackupState *state, bool ishistoryfile)
appendStringInfo(result, "START WAL LOCATION: %X/%X (file %s)\n",
LSN_FORMAT_ARGS(state->startpoint),
startxlogfile);
- if (ishistoryfile)
- {
- char stopxlogfile[MAXFNAMELEN]; /* backup stop
WAL file */
- XLogSegNo stopsegno;
-
- XLByteToSeg(state->stoppoint, stopsegno, wal_segment_size);
- XLogFileName(stopxlogfile, state->stoptli, stopsegno,
wal_segment_size);
- appendStringInfo(result, "STOP WAL LOCATION: %X/%X (file %s)\n",
-
LSN_FORMAT_ARGS(state->stoppoint), stopxlogfile);
- }
+ XLByteToSeg(state->stoppoint, stopsegno, wal_segment_size);
+ XLogFileName(stopxlogfile, state->stoptli, stopsegno, wal_segment_size);
+ appendStringInfo(result, "STOP WAL LOCATION: %X/%X (file %s)\n",
+
LSN_FORMAT_ARGS(state->stoppoint), stopxlogfile);
appendStringInfo(result, "CHECKPOINT LOCATION: %X/%X\n",
LSN_FORMAT_ARGS(state->checkpointloc));
@@ -65,17 +59,12 @@ build_backup_content(BackupState *state, bool ishistoryfile)
appendStringInfo(result, "LABEL: %s\n", state->name);
appendStringInfo(result, "START TIMELINE: %u\n", state->starttli);
- if (ishistoryfile)
- {
- char stopstrfbuf[128];
-
- /* Use the log timezone here, not the session timezone */
- pg_strftime(stopstrfbuf, sizeof(stopstrfbuf), "%Y-%m-%d
%H:%M:%S %Z",
- pg_localtime(&state->stoptime,
log_timezone));
+ /* Use the log timezone here, not the session timezone */
+ pg_strftime(stopstrfbuf, sizeof(stopstrfbuf), "%Y-%m-%d %H:%M:%S %Z",
+ pg_localtime(&state->stoptime, log_timezone));
- appendStringInfo(result, "STOP TIME: %s\n", stopstrfbuf);
- appendStringInfo(result, "STOP TIMELINE: %u\n", state->stoptli);
- }
+ appendStringInfo(result, "STOP TIME: %s\n", stopstrfbuf);
+ appendStringInfo(result, "STOP TIMELINE: %u\n", state->stoptli);
data = result->data;
pfree(result);
diff --git a/src/backend/access/transam/xlogfuncs.c
b/src/backend/access/transam/xlogfuncs.c
index 45a70668b1c..2388a60a5e5 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -53,7 +53,7 @@ static MemoryContext backupcontext = NULL;
* pg_backup_start: set up for taking an on-line backup dump
*
* Essentially what this does is to create the contents required for the
- * backup_label file and the tablespace map.
+ * tablespace map.
*
* Permission checking for this function is managed through the normal
* GRANT system.
@@ -61,6 +61,10 @@ static MemoryContext backupcontext = NULL;
Datum
pg_backup_start(PG_FUNCTION_ARGS)
{
+#define PG_BACKUP_START_V2_COLS 3
+ TupleDesc tupdesc;
+ Datum values[PG_BACKUP_START_V2_COLS] = {0};
+ bool nulls[PG_BACKUP_START_V2_COLS] = {0};
text *backupid = PG_GETARG_TEXT_PP(0);
bool fast = PG_GETARG_BOOL(1);
char *backupidstr;
@@ -69,6 +73,10 @@ pg_backup_start(PG_FUNCTION_ARGS)
backupidstr = text_to_cstring(backupid);
+ /* Initialize attributes information in the tuple descriptor */
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ elog(ERROR, "return type must be a row type");
+
if (status == SESSION_BACKUP_RUNNING)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
@@ -102,7 +110,12 @@ pg_backup_start(PG_FUNCTION_ARGS)
register_persistent_abort_backup_handler();
do_pg_backup_start(backupidstr, fast, NULL, backup_state,
tablespace_map);
- PG_RETURN_LSN(backup_state->startpoint);
+ values[0] = LSNGetDatum(backup_state->startpoint);
+ values[1] = Int64GetDatum(backup_state->starttli);
+ values[2] =
TimestampTzGetDatum(time_t_to_timestamptz(backup_state->starttime));
+
+ /* Returns the record as Datum */
+ PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values,
nulls)));
}
@@ -113,14 +126,12 @@ pg_backup_start(PG_FUNCTION_ARGS)
* allows the user to choose if they want to wait for the WAL to be archived
* or if we should just return as soon as the WAL record is written.
*
- * This function stops an in-progress backup, creates backup_label contents and
- * it returns the backup stop LSN, backup_label and tablespace_map contents.
+ * This function stops an in-progress backup and returns the backup stop LSN,
+ * pg_control and tablespace_map contents.
*
- * The backup_label contains the user-supplied label string (typically this
- * would be used to tell where the backup dump will be stored), the starting
- * time, starting WAL location for the dump and so on. It is the caller's
- * responsibility to write the backup_label and tablespace_map files in the
- * data folder that will be restored from this backup.
+ * The pg_control file contains the recovery information for the backup. It is
+ * the caller's responsibility to write the pg_control and tablespace_map files
+ * in the data folder that will be restored from this backup.
*
* Permission checking for this function is managed through the normal
* GRANT system.
@@ -128,12 +139,12 @@ pg_backup_start(PG_FUNCTION_ARGS)
Datum
pg_backup_stop(PG_FUNCTION_ARGS)
{
-#define PG_BACKUP_STOP_V2_COLS 3
+#define PG_BACKUP_STOP_V2_COLS 5
TupleDesc tupdesc;
Datum values[PG_BACKUP_STOP_V2_COLS] = {0};
bool nulls[PG_BACKUP_STOP_V2_COLS] = {0};
bool waitforarchive = PG_GETARG_BOOL(0);
- char *backup_label;
+ bytea *pg_control_bytea;
SessionBackupState status = get_backup_status();
/* Initialize attributes information in the tuple descriptor */
@@ -152,15 +163,16 @@ pg_backup_stop(PG_FUNCTION_ARGS)
/* Stop the backup */
do_pg_backup_stop(backup_state, waitforarchive);
- /* Build the contents of backup_label */
- backup_label = build_backup_content(backup_state, false);
-
- values[0] = LSNGetDatum(backup_state->stoppoint);
- values[1] = CStringGetTextDatum(backup_label);
- values[2] = CStringGetTextDatum(tablespace_map->data);
+ /* Build the contents of pg_control */
+ pg_control_bytea = (bytea *) palloc(PG_CONTROL_MAX_SAFE_SIZE +
VARHDRSZ);
+ SET_VARSIZE(pg_control_bytea, PG_CONTROL_MAX_SAFE_SIZE + VARHDRSZ);
+ memcpy(VARDATA(pg_control_bytea), backup_state->controlFile,
PG_CONTROL_MAX_SAFE_SIZE);
- /* Deallocate backup-related variables */
- pfree(backup_label);
+ values[0] = PointerGetDatum(pg_control_bytea);
+ values[1] = CStringGetTextDatum(tablespace_map->data);
+ values[2] = LSNGetDatum(backup_state->stoppoint);
+ values[3] = Int64GetDatum(backup_state->stoptli);
+ values[4] =
TimestampTzGetDatum(time_t_to_timestamptz(backup_state->stoptime));
/* Clean up the session-level state and its memory context */
backup_state = NULL;
diff --git a/src/backend/access/transam/xlogrecovery.c
b/src/backend/access/transam/xlogrecovery.c
index 49bb3fe4520..359d4c32e2b 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -6,7 +6,7 @@
* This source file contains functions controlling WAL recovery.
* InitWalRecovery() initializes the system for crash or archive recovery,
* or standby mode, depending on configuration options and the state of
- * the control file and possible backup label file. PerformWalRecovery()
+ * the control file and possible backup recovery. PerformWalRecovery()
* performs the actual WAL replay, calling the rmgr-specific redo routines.
* FinishWalRecovery() performs end-of-recovery checks and cleanup actions,
* and prepares information needed to initialize the WAL for writes. In
@@ -152,11 +152,12 @@ static bool recovery_signal_file_found = false;
/*
* CheckPointLoc is the position of the checkpoint record that determines
- * where to start the replay. It comes from the backup label file or the
- * control file.
+ * where to start the replay. It comes from the control file, either from the
+ * default location or from a backup recovery field.
*
- * RedoStartLSN is the checkpoint's REDO location, also from the backup label
- * file or the control file. In standby mode, XLOG streaming usually starts
+ * RedoStartLSN is the checkpoint's REDO location, also from the default
+ * control file location or from a backup recovery field. In standby mode,
+ * XLOG streaming usually starts
* from the position where an invalid record was found. But if we fail to
* read even the initial checkpoint record, we use the REDO location instead
* of the checkpoint location as the start position of XLOG streaming.
@@ -388,9 +389,6 @@ static void ApplyWalRecord(XLogReaderState *xlogreader,
XLogRecord *record, Time
static void EnableStandbyMode(void);
static void readRecoverySignalFile(void);
static void validateRecoveryParameters(void);
-static bool read_backup_label(XLogRecPtr *checkPointLoc,
- TimeLineID
*backupLabelTLI,
- bool
*backupEndRequired, bool *backupFromStandby);
static bool read_tablespace_map(List **tablespaces);
static void xlogrecovery_redo(XLogReaderState *record, TimeLineID replayTLI);
@@ -492,8 +490,8 @@ EnableStandbyMode(void)
* Prepare the system for WAL recovery, if needed.
*
* This is called by StartupXLOG() which coordinates the server startup
- * sequence. This function analyzes the control file and the backup label
- * file, if any, and figures out whether we need to perform crash recovery or
+ * sequence. This function analyzes the control file and backup recovery
+ * info, if any, and figures out whether we need to perform crash recovery or
* archive recovery, and how far we need to replay the WAL to reach a
* consistent state.
*
@@ -510,7 +508,7 @@ EnableStandbyMode(void)
*/
void
InitWalRecovery(ControlFileData *ControlFile, bool *wasShutdown_ptr,
- bool *haveBackupLabel_ptr, bool
*haveTblspcMap_ptr)
+ bool *haveTblspcMap_ptr)
{
XLogPageReadPrivate *private;
struct stat st;
@@ -518,7 +516,7 @@ InitWalRecovery(ControlFileData *ControlFile, bool
*wasShutdown_ptr,
XLogRecord *record;
DBState dbstate_at_startup;
bool haveTblspcMap = false;
- bool haveBackupLabel = false;
+ bool backupRecoveryRequired = false;
CheckPoint checkPoint;
bool backupFromStandby = false;
@@ -609,14 +607,30 @@ InitWalRecovery(ControlFileData *ControlFile, bool
*wasShutdown_ptr,
replay_image_masked = (char *) palloc(BLCKSZ);
primary_image_masked = (char *) palloc(BLCKSZ);
- if (read_backup_label(&CheckPointLoc, &CheckPointTLI,
&backupEndRequired,
- &backupFromStandby))
+ if (ControlFile->backupRecoveryRequired)
{
List *tablespaces = NIL;
+ /* Initialize recovery from fields stored in pg_control */
+ CheckPointLoc = ControlFile->backupCheckPoint;
+ CheckPointTLI = ControlFile->backupStartPointTLI;
+ RedoStartLSN = ControlFile->backupStartPoint;
+ RedoStartTLI = ControlFile->backupStartPointTLI;
+ backupEndRequired = ControlFile->backupEndRequired;
+ backupFromStandby = ControlFile->backupFromStandby;
+
+ /* Clear fields used to initialize recovery */
+ ControlFile->backupCheckPoint = InvalidXLogRecPtr;
+ ControlFile->backupStartPointTLI = 0;
+ ControlFile->backupRecoveryRequired = false;
+ ControlFile->backupFromStandby = false;
+
+ /* Indicate that recovery was requested */
+ backupRecoveryRequired = true;
+
/*
- * Archive recovery was requested, and thanks to the backup
label
- * file, we know how far we need to replay to reach
consistency. Enter
+ * Archive recovery was requested, and thanks to the recovery
+ * info, we know how far we need to replay to reach
consistency. Enter
* archive recovery directly.
*/
InArchiveRecovery = true;
@@ -624,8 +638,9 @@ InitWalRecovery(ControlFileData *ControlFile, bool
*wasShutdown_ptr,
EnableStandbyMode();
/*
- * When a backup_label file is present, we want to roll forward
from
- * the checkpoint it identifies, rather than using pg_control.
+ * When backup recovery is requested, we want to roll forward
from
+ * the checkpoint it identifies, rather than using the default
+ * checkpoint.
*/
record = ReadCheckpointRecord(xlogprefetcher, CheckPointLoc,
CheckPointTLI);
@@ -640,9 +655,8 @@ InitWalRecovery(ControlFileData *ControlFile, bool
*wasShutdown_ptr,
/*
* Make sure that REDO location exists. This may not be
the case
- * if there was a crash during an online backup, which
left a
- * backup_label around that references a WAL segment
that's
- * already been archived.
+ * if recovery.signal is missing and the WAL has
already been
+ * archived.
*/
if (checkPoint.redo < CheckPointLoc)
{
@@ -651,20 +665,16 @@ InitWalRecovery(ControlFileData *ControlFile, bool
*wasShutdown_ptr,
checkPoint.ThisTimeLineID))
ereport(FATAL,
(errmsg("could not find
redo location referenced by checkpoint record"),
- errhint("If you are
restoring from a backup, touch \"%s/recovery.signal\" and add required recovery
options.\n"
- "If
you are not restoring from a backup, try removing the file
\"%s/backup_label\".\n"
- "Be
careful: removing \"%s/backup_label\" will result in a corrupt cluster if
restoring from a backup.",
-
DataDir, DataDir, DataDir)));
+ errhint("If you are
restoring from a backup, touch \"%s/recovery.signal\" and add required recovery
options.\n",
+
DataDir)));
}
}
else
{
ereport(FATAL,
(errmsg("could not locate required
checkpoint record"),
- errhint("If you are restoring from a
backup, touch \"%s/recovery.signal\" and add required recovery options.\n"
- "If you are not
restoring from a backup, try removing the file \"%s/backup_label\".\n"
- "Be careful: removing
\"%s/backup_label\" will result in a corrupt cluster if restoring from a
backup.",
- DataDir, DataDir,
DataDir)));
+ errhint("If you are restoring from a
backup, touch \"%s/recovery.signal\" and add required recovery options.\n",
+ DataDir)));
wasShutdown = false; /* keep compiler quiet */
}
@@ -699,35 +709,32 @@ InitWalRecovery(ControlFileData *ControlFile, bool
*wasShutdown_ptr,
/* tell the caller to delete it later */
haveTblspcMap = true;
}
-
- /* tell the caller to delete it later */
- haveBackupLabel = true;
}
else
{
/*
- * If tablespace_map file is present without backup_label file,
there
- * is no use of such file. There is no harm in retaining it,
but it
- * is better to get rid of the map file so that we don't have
any
+ * If tablespace_map file is present without backup recovery
requested,
+ * there is no use of such file. There is no harm in retaining
it, but
+ * it is better to get rid of the map file so that we don't
have any
* redundant file in data directory and it will avoid any sort
of
* confusion. It seems prudent though to just rename the file
out of
* the way rather than delete it completely, also we ignore any
error
* that occurs in rename operation as even if map file is
present
- * without backup_label file, it is harmless.
+ * without backup recovery requested, it is harmless.
*/
if (stat(TABLESPACE_MAP, &st) == 0)
{
unlink(TABLESPACE_MAP_OLD);
if (durable_rename(TABLESPACE_MAP, TABLESPACE_MAP_OLD,
DEBUG1) == 0)
ereport(LOG,
- (errmsg("ignoring file \"%s\"
because no file \"%s\" exists",
- TABLESPACE_MAP,
BACKUP_LABEL_FILE),
+ (errmsg("ignoring file \"%s\"
because backup recovery was not requested",
+ TABLESPACE_MAP),
errdetail("File \"%s\" was
renamed to \"%s\".",
TABLESPACE_MAP, TABLESPACE_MAP_OLD)));
else
ereport(LOG,
- (errmsg("ignoring file \"%s\"
because no file \"%s\" exists",
- TABLESPACE_MAP,
BACKUP_LABEL_FILE),
+ (errmsg("ignoring file \"%s\"
because backup recovery was not requested",
+ TABLESPACE_MAP),
errdetail("Could not rename
file \"%s\" to \"%s\": %m.",
TABLESPACE_MAP, TABLESPACE_MAP_OLD)));
}
@@ -932,7 +939,7 @@ InitWalRecovery(ControlFileData *ControlFile, bool
*wasShutdown_ptr,
* Any other state indicates that the backup somehow became
corrupted
* and we can't sensibly continue with recovery.
*/
- if (haveBackupLabel)
+ if (backupRecoveryRequired)
{
ControlFile->backupStartPoint = checkPoint.redo;
ControlFile->backupEndRequired = backupEndRequired;
@@ -942,7 +949,7 @@ InitWalRecovery(ControlFileData *ControlFile, bool *wasShutdown_ptr,
if (dbstate_at_startup != DB_IN_ARCHIVE_RECOVERY &&
dbstate_at_startup != DB_SHUTDOWNED_IN_RECOVERY)
ereport(FATAL,
- (errmsg("backup_label contains data inconsistent with control file"),
+ (errmsg("pg_control contains inconsistent data for standby backup"),
errhint("This means that the backup is corrupted and you will "
"have to use another backup for recovery.")));
ControlFile->backupEndPoint = ControlFile->minRecoveryPoint;
@@ -972,7 +979,6 @@ InitWalRecovery(ControlFileData *ControlFile, bool *wasShutdown_ptr,
missingContrecPtr = InvalidXLogRecPtr;
*wasShutdown_ptr = wasShutdown;
- *haveBackupLabel_ptr = haveBackupLabel;
*haveTblspcMap_ptr = haveTblspcMap;
}
@@ -1145,154 +1151,6 @@ validateRecoveryParameters(void)
}
}
-/*
- * read_backup_label: check to see if a backup_label file is present
- *
- * If we see a backup_label during recovery, we assume that we are recovering
- * from a backup dump file, and we therefore roll forward from the checkpoint
- * identified by the label file, NOT what pg_control says. This avoids the
- * problem that pg_control might have been archived one or more checkpoints
- * later than the start of the dump, and so if we rely on it as the start
- * point, we will fail to restore a consistent database state.
- *
- * Returns true if a backup_label was found (and fills the checkpoint
- * location and TLI into *checkPointLoc and *backupLabelTLI, respectively);
- * returns false if not. If this backup_label came from a streamed backup,
- * *backupEndRequired is set to true. If this backup_label was created during
- * recovery, *backupFromStandby is set to true.
- *
- * Also sets the global variables RedoStartLSN and RedoStartTLI with the LSN
- * and TLI read from the backup file.
- */
-static bool
-read_backup_label(XLogRecPtr *checkPointLoc, TimeLineID *backupLabelTLI,
- bool *backupEndRequired, bool *backupFromStandby)
-{
- char startxlogfilename[MAXFNAMELEN];
- TimeLineID tli_from_walseg,
- tli_from_file;
- FILE *lfp;
- char ch;
- char backuptype[20];
- char backupfrom[20];
- char backuplabel[MAXPGPATH];
- char backuptime[128];
- uint32 hi,
- lo;
-
- /* suppress possible uninitialized-variable warnings */
- *checkPointLoc = InvalidXLogRecPtr;
- *backupLabelTLI = 0;
- *backupEndRequired = false;
- *backupFromStandby = false;
-
- /*
- * See if label file is present
- */
- lfp = AllocateFile(BACKUP_LABEL_FILE, "r");
- if (!lfp)
- {
- if (errno != ENOENT)
- ereport(FATAL,
- (errcode_for_file_access(),
- errmsg("could not read file \"%s\": %m",
- BACKUP_LABEL_FILE)));
- return false; /* it's not there, all is fine */
- }
-
- /*
- * Read and parse the START WAL LOCATION and CHECKPOINT lines (this code
- * is pretty crude, but we are not expecting any variability in the file
- * format).
- */
- if (fscanf(lfp, "START WAL LOCATION: %X/%X (file %08X%16s)%c",
- &hi, &lo, &tli_from_walseg, startxlogfilename, &ch) != 5 || ch != '\n')
- ereport(FATAL,
- (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
- errmsg("invalid data in file \"%s\"", BACKUP_LABEL_FILE)));
- RedoStartLSN = ((uint64) hi) << 32 | lo;
- RedoStartTLI = tli_from_walseg;
- if (fscanf(lfp, "CHECKPOINT LOCATION: %X/%X%c",
- &hi, &lo, &ch) != 3 || ch != '\n')
- ereport(FATAL,
- (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
- errmsg("invalid data in file \"%s\"", BACKUP_LABEL_FILE)));
- *checkPointLoc = ((uint64) hi) << 32 | lo;
- *backupLabelTLI = tli_from_walseg;
-
- /*
- * BACKUP METHOD lets us know if this was a typical backup ("streamed",
- * which could mean either pg_basebackup or the pg_backup_start/stop
- * method was used) or if this label came from somewhere else (the only
- * other option today being from pg_rewind). If this was a streamed
- * backup then we know that we need to play through until we get to the
- * end of the WAL which was generated during the backup (at which point we
- * will have reached consistency and backupEndRequired will be reset to be
- * false).
- */
- if (fscanf(lfp, "BACKUP METHOD: %19s\n", backuptype) == 1)
- {
- if (strcmp(backuptype, "streamed") == 0)
- *backupEndRequired = true;
- }
-
- /*
- * BACKUP FROM lets us know if this was from a primary or a standby. If
- * it was from a standby, we'll double-check that the control file state
- * matches that of a standby.
- */
- if (fscanf(lfp, "BACKUP FROM: %19s\n", backupfrom) == 1)
- {
- if (strcmp(backupfrom, "standby") == 0)
- *backupFromStandby = true;
- }
-
- /*
- * Parse START TIME and LABEL. Those are not mandatory fields for recovery
- * but checking for their presence is useful for debugging and the next
- * sanity checks. Cope also with the fact that the result buffers have a
- * pre-allocated size, hence if the backup_label file has been generated
- * with strings longer than the maximum assumed here an incorrect parsing
- * happens. That's fine as only minor consistency checks are done
- * afterwards.
- */
- if (fscanf(lfp, "START TIME: %127[^\n]\n", backuptime) == 1)
- ereport(DEBUG1,
- (errmsg_internal("backup time %s in file \"%s\"",
- backuptime, BACKUP_LABEL_FILE)));
-
- if (fscanf(lfp, "LABEL: %1023[^\n]\n", backuplabel) == 1)
- ereport(DEBUG1,
- (errmsg_internal("backup label %s in file \"%s\"",
- backuplabel, BACKUP_LABEL_FILE)));
-
- /*
- * START TIMELINE is new as of 11. Its parsing is not mandatory, still use
- * it as a sanity check if present.
- */
- if (fscanf(lfp, "START TIMELINE: %u\n", &tli_from_file) == 1)
- {
- if (tli_from_walseg != tli_from_file)
- ereport(FATAL,
- (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
- errmsg("invalid data in file \"%s\"", BACKUP_LABEL_FILE),
- errdetail("Timeline ID parsed is %u, but expected %u.",
- tli_from_file, tli_from_walseg)));
-
- ereport(DEBUG1,
- (errmsg_internal("backup timeline %u in file \"%s\"",
- tli_from_file, BACKUP_LABEL_FILE)));
- }
-
- if (ferror(lfp) || FreeFile(lfp))
- ereport(FATAL,
- (errcode_for_file_access(),
- errmsg("could not read file \"%s\": %m",
- BACKUP_LABEL_FILE)));
-
- return true;
-}
-
/*
* read_tablespace_map: check to see if a tablespace_map file is present
*
diff --git a/src/backend/backup/basebackup.c b/src/backend/backup/basebackup.c
index b537f462197..01d09dbdd21 100644
--- a/src/backend/backup/basebackup.c
+++ b/src/backend/backup/basebackup.c
@@ -22,6 +22,7 @@
#include "backup/basebackup.h"
#include "backup/basebackup_sink.h"
#include "backup/basebackup_target.h"
+#include "catalog/pg_control.h"
#include "commands/defrem.h"
#include "common/compression.h"
#include "common/file_perm.h"
@@ -94,7 +95,7 @@ static bool verify_page_checksum(Page page, XLogRecPtr start_lsn,
BlockNumber blkno,
uint16 *expected_checksum);
static void sendFileWithContent(bbsink *sink, const char *filename,
- const char *content,
+ const char *content, int len,
backup_manifest_info *manifest);
static int64 _tarWriteHeader(bbsink *sink, const char *filename,
const char *linktarget, struct stat *statbuf,
@@ -192,10 +193,9 @@ static const struct exclude_list_item excludeFiles[] =
{RELCACHE_INIT_FILENAME, true},
/*
- * backup_label and tablespace_map should not exist in a running cluster
- * capable of doing an online backup, but exclude them just in case.
+ * tablespace_map should not exist in a running cluster capable of doing
+ * an online backup, but exclude it just in case.
*/
- {BACKUP_LABEL_FILE, false},
{TABLESPACE_MAP, false},
/*
@@ -325,23 +325,15 @@ perform_base_backup(basebackup_options *opt, bbsink *sink)
if (ti->path == NULL)
{
- struct stat statbuf;
bool sendtblspclinks = true;
- char *backup_label;
bbsink_begin_archive(sink, "base.tar");
- /* In the main tar, include the backup_label first... */
- backup_label = build_backup_content(backup_state, false);
- sendFileWithContent(sink, BACKUP_LABEL_FILE,
- backup_label, &manifest);
- pfree(backup_label);
-
- /* Then the tablespace_map file, if required... */
+ /* Send the tablespace_map file, if required... */
if (opt->sendtblspcmapfile)
{
sendFileWithContent(sink, TABLESPACE_MAP,
- tablespace_map->data, &manifest);
+ tablespace_map->data, -1, &manifest);
sendtblspclinks = false;
}
@@ -349,14 +341,14 @@ perform_base_backup(basebackup_options *opt, bbsink *sink)
sendDir(sink, ".", 1, false, state.tablespaces,
sendtblspclinks, &manifest,
InvalidOid);
- /* ... and pg_control after everything else. */
- if (lstat(XLOG_CONTROL_FILE, &statbuf) != 0)
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not stat file \"%s\": %m",
- XLOG_CONTROL_FILE)));
- sendFile(sink, XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf,
- false, InvalidOid, InvalidOid, &manifest);
+ /* End the backup before sending pg_control */
+ basebackup_progress_wait_wal_archive(&state);
+ do_pg_backup_stop(backup_state, !opt->nowait);
+
+ /* Send copy of pg_control containing recovery info */
+ sendFileWithContent(sink, XLOG_CONTROL_FILE,
+ (char *)backup_state->controlFile,
+ PG_CONTROL_MAX_SAFE_SIZE, &manifest);
}
else
{
@@ -390,9 +382,6 @@ perform_base_backup(basebackup_options *opt, bbsink *sink)
}
}
- basebackup_progress_wait_wal_archive(&state);
- do_pg_backup_stop(backup_state, !opt->nowait);
-
endptr = backup_state->stoppoint;
endtli = backup_state->stoptli;
@@ -601,7 +590,7 @@ perform_base_backup(basebackup_options *opt, bbsink *sink)
* complete segment.
*/
StatusFilePath(pathbuf, walFileName, ".done");
- sendFileWithContent(sink, pathbuf, "", &manifest);
+ sendFileWithContent(sink, pathbuf, "", -1, &manifest);
}
/*
@@ -629,7 +618,7 @@ perform_base_backup(basebackup_options *opt, bbsink *sink)
/* unconditionally mark file as archived */
StatusFilePath(pathbuf, fname, ".done");
- sendFileWithContent(sink, pathbuf, "", &manifest);
+ sendFileWithContent(sink, pathbuf, "", -1, &manifest);
}
/* Properly terminate the tar file. */
@@ -1040,22 +1029,21 @@ SendBaseBackup(BaseBackupCmd *cmd)
*/
static void
sendFileWithContent(bbsink *sink, const char *filename, const char *content,
- backup_manifest_info *manifest)
+ int len, backup_manifest_info *manifest)
{
struct stat statbuf;
- int bytes_done = 0,
- len;
+ int bytes_done = 0;
pg_checksum_context checksum_ctx;
if (pg_checksum_init(&checksum_ctx, manifest->checksum_type) < 0)
elog(ERROR, "could not initialize checksum of file \"%s\"",
filename);
- len = strlen(content);
+ if (len < 0)
+ len = strlen(content);
/*
- * Construct a stat struct for the backup_label file we're injecting in
- * the tar.
+ * Construct a stat struct for the file we're injecting in the tar.
*/
/* Windows doesn't have the concept of uid and gid */
#ifdef WIN32
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index 35d738d5763..24bf34b45eb 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -384,13 +384,15 @@ BEGIN ATOMIC
END;
CREATE OR REPLACE FUNCTION
- pg_backup_start(label text, fast boolean DEFAULT false)
- RETURNS pg_lsn STRICT VOLATILE LANGUAGE internal AS 'pg_backup_start'
+ pg_backup_start(label text, fast boolean DEFAULT false, OUT lsn pg_lsn,
+ OUT timeline_id int8, OUT start timestamptz)
+ RETURNS record STRICT VOLATILE LANGUAGE internal AS 'pg_backup_start'
PARALLEL RESTRICTED;
CREATE OR REPLACE FUNCTION pg_backup_stop (
- wait_for_archive boolean DEFAULT true, OUT lsn pg_lsn,
- OUT labelfile text, OUT spcmapfile text)
+ wait_for_archive boolean DEFAULT true, OUT pg_control_file bytea,
+ OUT tablespace_map_file text, OUT lsn pg_lsn, OUT timeline_id int8,
+ OUT stop timestamptz)
RETURNS record STRICT VOLATILE LANGUAGE internal as 'pg_backup_stop'
PARALLEL RESTRICTED;
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index b9f5e1266b4..c655cb03352 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -171,8 +171,8 @@ SKIP:
# Write some files to test that they are not copied.
foreach my $filename (
- qw(backup_label tablespace_map postgresql.auto.conf.tmp
- current_logfiles.tmp global/pg_internal.init.123))
+ qw(tablespace_map postgresql.auto.conf.tmp current_logfiles.tmp
+ global/pg_internal.init.123))
{
open my $file, '>>', "$pgdata/$filename";
print $file "DONOTCOPY";
@@ -261,14 +261,13 @@ foreach my $filename (@tempRelationFiles)
"base/$postgresOid/$filename not copied");
}
-# Make sure existing backup_label was ignored.
-isnt(slurp_file("$tempdir/backup/backup_label"),
- 'DONOTCOPY', 'existing backup_label not copied');
+# Make sure existing tablespace_map was ignored.
+ok(!-f "$tempdir/backup/tablespace_map", 'tablespace_map not in backup');
rmtree("$tempdir/backup");
-# Now delete the bogus backup_label file since it will interfere with startup
-unlink("$pgdata/backup_label")
- or BAIL_OUT("unable to unlink $pgdata/backup_label");
+# Now delete the bogus tablespace_map file since it will interfere with startup
+unlink("$pgdata/tablespace_map")
+ or BAIL_OUT("unable to unlink $pgdata/tablespace_map");
$node->command_ok(
[
diff --git a/src/bin/pg_rewind/filemap.c b/src/bin/pg_rewind/filemap.c
index ecadd69dc53..213f4e71b88 100644
--- a/src/bin/pg_rewind/filemap.c
+++ b/src/bin/pg_rewind/filemap.c
@@ -139,11 +139,10 @@ static const struct exclude_list_item excludeFiles[] =
{"pg_internal.init", true}, /* defined as RELCACHE_INIT_FILENAME */
/*
- * If there is a backup_label or tablespace_map file, it indicates that a
- * recovery failed and this cluster probably can't be rewound, but exclude
- * them anyway if they are found.
+ * If there is a tablespace_map file, it indicates that a recovery failed
+ * and this cluster probably can't be rewound, but exclude it anyway if it
+ * is found.
*/
- {"backup_label", false}, /* defined as BACKUP_LABEL_FILE */
{"tablespace_map", false}, /* defined as TABLESPACE_MAP */
/*
diff --git a/src/bin/pg_rewind/pg_rewind.c b/src/bin/pg_rewind/pg_rewind.c
index bfd44a284e2..f42782e2eab 100644
--- a/src/bin/pg_rewind/pg_rewind.c
+++ b/src/bin/pg_rewind/pg_rewind.c
@@ -39,9 +39,6 @@ static void perform_rewind(filemap_t *filemap, rewind_source *source,
TimeLineID chkpttli,
XLogRecPtr chkptredo);
-static void createBackupLabel(XLogRecPtr startpoint, TimeLineID starttli,
- XLogRecPtr checkpointloc);
-
static void digestControlFile(ControlFileData *ControlFile,
const char *content,
size_t size);
static void getRestoreCommand(const char *argv0);
@@ -654,7 +651,7 @@ perform_rewind(filemap_t *filemap, rewind_source *source,
pg_log_info("creating backup label and updating control file");
/*
- * Create a backup label file, to tell the target where to begin the WAL
+ * Get recovery fields to tell the target where to begin the WAL
* replay. Normally, from the last common checkpoint between the source
* and the target. But if the source is a standby server, it's possible
* that the last common checkpoint is *after* the standby's restartpoint.
@@ -672,7 +669,6 @@ perform_rewind(filemap_t *filemap, rewind_source *source,
chkpttli = ControlFile_source.checkPointCopy.ThisTimeLineID;
chkptrec = ControlFile_source.checkPoint;
}
- createBackupLabel(chkptredo, chkpttli, chkptrec);
/*
* Update control file of target, to tell the target how far it must
@@ -722,6 +718,12 @@ perform_rewind(filemap_t *filemap, rewind_source *source,
ControlFile_new.minRecoveryPoint = endrec;
ControlFile_new.minRecoveryPointTLI = endtli;
ControlFile_new.state = DB_IN_ARCHIVE_RECOVERY;
+ ControlFile_new.backupRecoveryRequired = true;
+ ControlFile_new.backupFromStandby = true;
+ ControlFile_new.backupEndRequired = false;
+ ControlFile_new.backupCheckPoint = chkptrec;
+ ControlFile_new.backupStartPoint = chkptredo;
+ ControlFile_new.backupStartPointTLI = chkpttli;
if (!dry_run)
update_controlfile(datadir_target, &ControlFile_new, do_sync);
}
@@ -729,7 +731,10 @@ perform_rewind(filemap_t *filemap, rewind_source *source,
static void
sanityChecks(void)
{
- /* TODO Check that there's no backup_label in either cluster */
+ /*
+ * TODO Check that neither cluster has backupRecoveryRequired set in
+ * pg_control.
+ */
/* Check system_identifier match */
if (ControlFile_target.system_identifier !=
ControlFile_source.system_identifier)
@@ -951,51 +956,6 @@ findCommonAncestorTimeline(TimeLineHistoryEntry *a_history, int a_nentries,
}
}
-
-/*
- * Create a backup_label file that forces recovery to begin at the last common
- * checkpoint.
- */
-static void
-createBackupLabel(XLogRecPtr startpoint, TimeLineID starttli, XLogRecPtr checkpointloc)
-{
- XLogSegNo startsegno;
- time_t stamp_time;
- char strfbuf[128];
- char xlogfilename[MAXFNAMELEN];
- struct tm *tmp;
- char buf[1000];
- int len;
-
- XLByteToSeg(startpoint, startsegno, WalSegSz);
- XLogFileName(xlogfilename, starttli, startsegno, WalSegSz);
-
- /*
- * Construct backup label file
- */
- stamp_time = time(NULL);
- tmp = localtime(&stamp_time);
- strftime(strfbuf, sizeof(strfbuf), "%Y-%m-%d %H:%M:%S %Z", tmp);
-
- len = snprintf(buf, sizeof(buf),
- "START WAL LOCATION: %X/%X (file %s)\n"
- "CHECKPOINT LOCATION: %X/%X\n"
- "BACKUP METHOD: pg_rewind\n"
- "BACKUP FROM: standby\n"
- "START TIME: %s\n",
- /* omit LABEL: line */
- LSN_FORMAT_ARGS(startpoint), xlogfilename,
- LSN_FORMAT_ARGS(checkpointloc),
- strfbuf);
- if (len >= sizeof(buf))
- pg_fatal("backup label buffer too small"); /* shouldn't happen */
-
- /* TODO: move old file out of the way, if any. */
- open_target_file("backup_label", true); /* BACKUP_LABEL_FILE */
- write_target_range(buf, 0, len);
- close_target_file();
-}
-
/*
* Check CRC of control file
*/
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index a14126d164f..3aac6839a70 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -293,8 +293,6 @@ extern SessionBackupState get_backup_status(void);
/* File path names (all relative to $PGDATA) */
#define RECOVERY_SIGNAL_FILE "recovery.signal"
#define STANDBY_SIGNAL_FILE "standby.signal"
-#define BACKUP_LABEL_FILE "backup_label"
-#define BACKUP_LABEL_OLD "backup_label.old"
#define TABLESPACE_MAP "tablespace_map"
#define TABLESPACE_MAP_OLD "tablespace_map.old"
diff --git a/src/include/access/xlogbackup.h b/src/include/access/xlogbackup.h
index 1611358137b..b75411b7c3d 100644
--- a/src/include/access/xlogbackup.h
+++ b/src/include/access/xlogbackup.h
@@ -15,6 +15,7 @@
#define XLOG_BACKUP_H
#include "access/xlogdefs.h"
+#include "catalog/pg_control.h"
#include "pgtime.h"
/* Structure to hold backup state. */
@@ -33,9 +34,18 @@ typedef struct BackupState
XLogRecPtr stoppoint; /* backup stop WAL location */
TimeLineID stoptli; /* backup stop TLI */
pg_time_t stoptime; /* backup stop time */
+
+ /*
+ * After pg_backup_stop() returns, this field will contain a copy of
+ * pg_control that should be stored with the backup. Fields have been
+ * updated for recovery and the CRC has been recalculated. The buffer
+ * is padded to PG_CONTROL_MAX_SAFE_SIZE so that pg_control is always
+ * a consistent size but smaller (and hopefully easier to handle) than
+ * PG_CONTROL_FILE_SIZE. Bytes after sizeof(ControlFileData) are zeroed.
+ */
+ uint8_t controlFile[PG_CONTROL_MAX_SAFE_SIZE];
} BackupState;
-extern char *build_backup_content(BackupState *state,
- bool ishistoryfile);
+extern char *build_backup_content(BackupState *state);
#endif /* XLOG_BACKUP_H */
diff --git a/src/include/access/xlogrecovery.h b/src/include/access/xlogrecovery.h
index ee0bc742782..981266f7340 100644
--- a/src/include/access/xlogrecovery.h
+++ b/src/include/access/xlogrecovery.h
@@ -80,8 +80,7 @@ extern Size XLogRecoveryShmemSize(void);
extern void XLogRecoveryShmemInit(void);
extern void InitWalRecovery(ControlFileData *ControlFile,
- bool *wasShutdown_ptr, bool *haveBackupLabel_ptr,
- bool *haveTblspcMap_ptr);
+ bool *wasShutdown_ptr, bool *haveTblspcMap_ptr);
extern void PerformWalRecovery(void);
/*
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 2ae72e3b266..64bab07e056 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -146,6 +146,9 @@ typedef struct ControlFileData
* to disk, we mustn't start up until we reach X again. Zero when not
* doing archive recovery.
*
+ * backupCheckPoint is the backup start checkpoint and is set to zero after
+ * recovery is initialized.
+ *
* backupStartPoint is the redo pointer of the backup start checkpoint, if
* we are recovering from an online backup and haven't reached the end of
* backup yet. It is reset to zero when the end of backup is reached, and
@@ -160,14 +163,25 @@ typedef struct ControlFileData
* pg_control which was backed up last. It is reset to zero when the end
* of backup is reached, and we mustn't start up before that.
*
+ * backupRecoveryRequired indicates that the pg_control file was provided
+ * by a backup or pg_rewind and recovery settings need to be copied. It will
+ * be set to false when the settings have been copied.
+ *
+ * backupFromStandby indicates that the backup was taken on a standby. It is
+ * required to initialize recovery and is set to false afterwards.
+ *
* If backupEndRequired is true, we know for sure that we're restoring
* from a backup, and must see a backup-end record before we can safely
* start up.
*/
XLogRecPtr minRecoveryPoint;
TimeLineID minRecoveryPointTLI;
+ XLogRecPtr backupCheckPoint;
XLogRecPtr backupStartPoint;
+ TimeLineID backupStartPointTLI;
XLogRecPtr backupEndPoint;
+ bool backupRecoveryRequired;
+ bool backupFromStandby;
bool backupEndRequired;
/*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 06435e8b925..9dfdf79afa4 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6418,13 +6418,17 @@
prosrc => 'pg_terminate_backend' },
{ oid => '2172', descr => 'prepare for taking an online backup',
proname => 'pg_backup_start', provolatile => 'v', proparallel => 'r',
- prorettype => 'pg_lsn', proargtypes => 'text bool',
+ prorettype => 'record', proargtypes => 'text bool',
+ proallargtypes => '{text,bool,pg_lsn,int8,timestamptz}',
+ proargmodes => '{i,i,o,o,o}',
+ proargnames => '{label,fast,lsn,timeline_id,start}',
prosrc => 'pg_backup_start' },
{ oid => '2739', descr => 'finish taking an online backup',
proname => 'pg_backup_stop', provolatile => 'v', proparallel => 'r',
prorettype => 'record', proargtypes => 'bool',
- proallargtypes => '{bool,pg_lsn,text,text}', proargmodes => '{i,o,o,o}',
- proargnames => '{wait_for_archive,lsn,labelfile,spcmapfile}',
+ proallargtypes => '{bool,bytea,text,pg_lsn,int8,timestamptz}',
+ proargmodes => '{i,o,o,o,o,o}',
+ proargnames =>
'{wait_for_archive,pg_control_file,tablespace_map_file,lsn,timeline_id,stop}',
prosrc => 'pg_backup_stop' },
{ oid => '3436', descr => 'promote standby server',
proname => 'pg_promote', provolatile => 'v', prorettype => 'bool',