On 14.02.2013 17:45, Jehan-Guillaume de Rorthais wrote:
I am facing an unexpected behavior on a 9.2.2 cluster that I can
reproduce on current HEAD.

On a cluster with archive enabled but failing, after a crash of
postmaster, the checkpoint occurring before leaving the recovery mode
deletes any additional WALs, even those waiting to be archived.
> ...
> Is it expected ?

No, it's a bug. Ouch. It was introduced in 9.2, by commit 5286105800c7d5902f98f32e11b209c471c0c69c:

-  /*
-   * Normally we don't delete old XLOG files during recovery to
-   * avoid accidentally deleting a file that looks stale due to a
-   * bug or hardware issue, but in fact contains important data.
-   * During streaming recovery, however, we will eventually fill the
-   * disk if we never clean up, so we have to. That's not an issue
-   * with file-based archive recovery because in that case we
-   * restore one XLOG file at a time, on-demand, and with a
-   * different filename that can't be confused with regular XLOG
-   * files.
-   */
-   if (WalRcvInProgress() || XLogArchiveCheckDone(xlde->d_name))
+   if (RecoveryInProgress() || XLogArchiveCheckDone(xlde->d_name))
         [ delete the file ]

With that commit, we started to keep WAL segments restored from the archive in pg_xlog, so we needed to start deleting old segments during archive recovery, even when streaming replication was not active. But the above change was to broad; we started to delete old segments also during crash recovery.

The above should check InArchiveRecovery, ie. only delete old files when in archive recovery, not when in crash recovery. But there's one little complication: InArchiveRecovery is currently only valid in the startup process, so we'll need to also share it in shared memory, so that the checkpointer process can access it.

I propose the attached patch to fix it.

- Heikki
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 30d877b..dabb094 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -418,6 +418,7 @@ typedef struct XLogCtlData
 	 * recovery.  Protected by info_lck.
 	 */
 	bool		SharedRecoveryInProgress;
+	bool		SharedInArchiveRecovery;
 
 	/*
 	 * SharedHotStandbyActive indicates if we're still in crash or archive
@@ -622,6 +623,7 @@ static void XLogArchiveCleanup(const char *xlog);
 static void readRecoveryCommandFile(void);
 static void exitArchiveRecovery(TimeLineID endTLI,
 					uint32 endLogId, uint32 endLogSeg);
+static bool ArchiveRecoveryInProgress(void);
 static bool recoveryStopsHere(XLogRecord *record, bool *includeThis);
 static void recoveryPausesHere(void);
 static void SetLatestXTime(TimestampTz xtime);
@@ -3571,7 +3573,7 @@ RemoveOldXlogFiles(uint32 log, uint32 seg, XLogRecPtr endptr)
 			strspn(xlde->d_name, "0123456789ABCDEF") == 24 &&
 			strcmp(xlde->d_name + 8, lastoff + 8) <= 0)
 		{
-			if (RecoveryInProgress() || XLogArchiveCheckDone(xlde->d_name))
+			if (ArchiveRecoveryInProgress() || XLogArchiveCheckDone(xlde->d_name))
 			{
 				snprintf(path, MAXPGPATH, XLOGDIR "/%s", xlde->d_name);
 
@@ -5289,6 +5291,7 @@ XLOGShmemInit(void)
 	 */
 	XLogCtl->XLogCacheBlck = XLOGbuffers - 1;
 	XLogCtl->SharedRecoveryInProgress = true;
+	XLogCtl->SharedInArchiveRecovery = false;
 	XLogCtl->SharedHotStandbyActive = false;
 	XLogCtl->WalWriterSleeping = false;
 	XLogCtl->Insert.currpage = (XLogPageHeader) (XLogCtl->pages);
@@ -5680,6 +5683,7 @@ readRecoveryCommandFile(void)
 
 	/* Enable fetching from archive recovery area */
 	InArchiveRecovery = true;
+	XLogCtl->SharedInArchiveRecovery = true;
 
 	/*
 	 * If user specified recovery_target_timeline, validate it or compute the
@@ -5718,11 +5722,16 @@ exitArchiveRecovery(TimeLineID endTLI, uint32 endLogId, uint32 endLogSeg)
 {
 	char		recoveryPath[MAXPGPATH];
 	char		xlogpath[MAXPGPATH];
+	/* use volatile pointer to prevent code rearrangement */
+	volatile XLogCtlData *xlogctl = XLogCtl;
 
 	/*
 	 * We are no longer in archive recovery state.
 	 */
 	InArchiveRecovery = false;
+	SpinLockAcquire(&xlogctl->info_lck);
+	xlogctl->SharedInArchiveRecovery = false;
+	SpinLockRelease(&xlogctl->info_lck);
 
 	/*
 	 * Update min recovery point one last time.
@@ -7315,6 +7324,25 @@ RecoveryInProgress(void)
 }
 
 /*
+ * Are we currently in archive recovery? In the startup process, you can just
+ * check InArchiveRecovery variable instead.
+ */
+static bool
+ArchiveRecoveryInProgress()
+{
+	bool		result;
+	/* use volatile pointer to prevent code rearrangement */
+	volatile XLogCtlData *xlogctl = XLogCtl;
+
+	/* spinlock is essential on machines with weak memory ordering! */
+	SpinLockAcquire(&xlogctl->info_lck);
+	result = xlogctl->SharedInArchiveRecovery;
+	SpinLockRelease(&xlogctl->info_lck);
+
+	return result;
+}
+
+/*
  * Is HotStandby active yet? This is only important in special backends
  * since normal backends won't ever be able to connect until this returns
  * true. Postmaster knows this by way of signal, not via shared memory.
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to