On 08.12.2011 08:20, Tom Lane wrote:
I thought that removing the unreadable file would let me restart,
but after "rm 52860_fsm" and trying again to start the server,
there's a different problem:

LOG:  database system was interrupted while in recovery at 2011-12-08 00:56:11 
EST
HINT:  This probably means that some data is corrupted and you will have to use 
the last backup for recovery.
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  consistent recovery state reached at 0/5D71BA8
LOG:  redo starts at 0/5100050
WARNING:  page 18 of relation base/27666/52860 is uninitialized
CONTEXT:  xlog redo visible: rel 1663/27666/52860; blk 18
PANIC:  WAL contains references to invalid pages
CONTEXT:  xlog redo visible: rel 1663/27666/52860; blk 18
LOG:  startup process (PID 14507) was terminated by signal 6
LOG:  aborting startup due to startup process failure

Note that this isn't even the same symptom Shraibman hit, since
apparently he was failing on replaying the commit record.  How is it
that the main table file managed to have uninitialized pages?
I suspect that "redo visible" WAL processing is breaking one or more of
the fundamental WAL rules, perhaps by not generating a full-page image
when it needs to.

So this is really a whole lot worse than our behavior was in pre-FSM
days, and it needs to get fixed.

This bug was actually introduced only recently. Notice how the log says "consistent recovery state reached at 0/5D71BA8". This interacts badly with Fujii's patch I committed last week:

commit 1e616f639156b2431725f7823c999486ca46c1ea
Author: Heikki Linnakangas <heikki.linnakan...@iki.fi>
Date:   Fri Dec 2 10:49:54 2011 +0200

    During recovery, if we reach consistent state and still have entries in the
    invalid-page hash table, PANIC immediately. Immediate PANIC is much better
    than waiting for end-of-recovery, which is what we did before, because the
    end-of-recovery might not come until months later if this is a standby
    server.
...

Recovery thinks it has reached consistency early on, so it PANICs as soon as it sees a reference to a page that belongs to a table that was later dropped.

The bug here is that we consider having immediately reached consistency during crash recovery. That's a false notion: during crash recovery the database isn't consistent until *all* WAL has been replayed. We should not set reachedMinRecoveryPoint during crash recovery at all. And perhaps the flag should be renamed to reachedConsistency, to make it clear that during crash recovery it's not enough to reach the nominal minRecoveryPoint.

That was harmless until last week, because reachedMinRecoveryPoint was not used for anything unless you're doing archive recovery and hot standby was enabled, but IMO the "consistent recovery state reached" log message was misleading even then. I propose that we apply the attached patch to master and backbranches.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 9bec660..9d96044 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -562,7 +562,13 @@ static TimeLineID lastPageTLI = 0;
 static XLogRecPtr minRecoveryPoint;		/* local copy of
 										 * ControlFile->minRecoveryPoint */
 static bool updateMinRecoveryPoint = true;
-bool reachedMinRecoveryPoint = false;
+
+/*
+ * Have we reached a consistent database state? In crash recovery, we have
+ * to replay all the WAL, so reachedConsistency is never set. During archive
+ * recovery, the database is consistent once minRecoveryPoint is reached.
+ */
+bool reachedConsistency = false;
 
 static bool InRedo = false;
 
@@ -6894,9 +6900,16 @@ static void
 CheckRecoveryConsistency(void)
 {
 	/*
+	 * During crash recovery, we don't reach a consistent state until we've
+	 * replayed all the WAL.
+	 */
+	if (XLogRecPtrIsInvalid(minRecoveryPoint))
+		return;
+
+	/*
 	 * Have we passed our safe starting point?
 	 */
-	if (!reachedMinRecoveryPoint &&
+	if (!reachedConsistency &&
 		XLByteLE(minRecoveryPoint, EndRecPtr) &&
 		XLogRecPtrIsInvalid(ControlFile->backupStartPoint))
 	{
@@ -6906,7 +6919,7 @@ CheckRecoveryConsistency(void)
 		 */
 		XLogCheckInvalidPages();
 
-		reachedMinRecoveryPoint = true;
+		reachedConsistency = true;
 		ereport(LOG,
 				(errmsg("consistent recovery state reached at %X/%X",
 						EndRecPtr.xlogid, EndRecPtr.xrecoff)));
@@ -6919,7 +6932,7 @@ CheckRecoveryConsistency(void)
 	 */
 	if (standbyState == STANDBY_SNAPSHOT_READY &&
 		!LocalHotStandbyActive &&
-		reachedMinRecoveryPoint &&
+		reachedConsistency &&
 		IsUnderPostmaster)
 	{
 		/* use volatile pointer to prevent code rearrangement */
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 350d434..b280434 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -85,7 +85,7 @@ log_invalid_page(RelFileNode node, ForkNumber forkno, BlockNumber blkno,
 	 * linger in the hash table until the end of recovery and PANIC there,
 	 * which might come only much later if this is a standby server.
 	 */
-	if (reachedMinRecoveryPoint)
+	if (reachedConsistency)
 	{
 		report_invalid_page(WARNING, node, forkno, blkno, present);
 		elog(PANIC, "WAL contains references to invalid pages");
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 1fb56cd..1b31414 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -190,7 +190,7 @@ typedef enum
 
 extern XLogRecPtr XactLastRecEnd;
 
-extern bool reachedMinRecoveryPoint;
+extern bool reachedConsistency;
 
 /* these variables are GUC parameters related to XLOG */
 extern int	CheckPointSegments;
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to