On 08.12.2011 08:20, Tom Lane wrote:
I thought that removing the unreadable file would let me restart,
but after "rm 52860_fsm" and trying again to start the server,
there's a different problem:
LOG: database system was interrupted while in recovery at 2011-12-08 00:56:11
EST
HINT: This probably means that some data is corrupted and you will have to use
the last backup for recovery.
LOG: database system was not properly shut down; automatic recovery in progress
LOG: consistent recovery state reached at 0/5D71BA8
LOG: redo starts at 0/5100050
WARNING: page 18 of relation base/27666/52860 is uninitialized
CONTEXT: xlog redo visible: rel 1663/27666/52860; blk 18
PANIC: WAL contains references to invalid pages
CONTEXT: xlog redo visible: rel 1663/27666/52860; blk 18
LOG: startup process (PID 14507) was terminated by signal 6
LOG: aborting startup due to startup process failure
Note that this isn't even the same symptom Shraibman hit, since
apparently he was failing on replaying the commit record. How is it
that the main table file managed to have uninitialized pages?
I suspect that "redo visible" WAL processing is breaking one or more of
the fundamental WAL rules, perhaps by not generating a full-page image
when it needs to.
So this is really a whole lot worse than our behavior was in pre-FSM
days, and it needs to get fixed.
This bug was actually introduced only recently. Notice how the log says
"consistent recovery state reached at 0/5D71BA8". This interacts badly
with Fujii's patch I committed last week:
commit 1e616f639156b2431725f7823c999486ca46c1ea
Author: Heikki Linnakangas <heikki.linnakan...@iki.fi>
Date: Fri Dec 2 10:49:54 2011 +0200
During recovery, if we reach consistent state and still have entries in the
invalid-page hash table, PANIC immediately. Immediate PANIC is much better
than waiting for end-of-recovery, which is what we did before, because the
end-of-recovery might not come until months later if this is a standby
server.
...
Recovery thinks it has reached consistency early on, so it PANICs as
soon as it sees a reference to a page that belongs to a table that was
later dropped.
The bug here is that we consider having immediately reached consistency
during crash recovery. That's a false notion: during crash recovery the
database isn't consistent until *all* WAL has been replayed. We should
not set reachedMinRecoveryPoint during crash recovery at all. And
perhaps the flag should be renamed to reachedConsistency, to make it
clear that during crash recovery it's not enough to reach the nominal
minRecoveryPoint.
That was harmless until last week, because reachedMinRecoveryPoint was
not used for anything unless you're doing archive recovery and hot
standby was enabled, but IMO the "consistent recovery state reached" log
message was misleading even then. I propose that we apply the attached
patch to master and backbranches.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 9bec660..9d96044 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -562,7 +562,13 @@ static TimeLineID lastPageTLI = 0;
static XLogRecPtr minRecoveryPoint; /* local copy of
* ControlFile->minRecoveryPoint */
static bool updateMinRecoveryPoint = true;
-bool reachedMinRecoveryPoint = false;
+
+/*
+ * Have we reached a consistent database state? In crash recovery, we have
+ * to replay all the WAL, so reachedConsistency is never set. During archive
+ * recovery, the database is consistent once minRecoveryPoint is reached.
+ */
+bool reachedConsistency = false;
static bool InRedo = false;
@@ -6894,9 +6900,16 @@ static void
CheckRecoveryConsistency(void)
{
/*
+ * During crash recovery, we don't reach a consistent state until we've
+ * replayed all the WAL.
+ */
+ if (XLogRecPtrIsInvalid(minRecoveryPoint))
+ return;
+
+ /*
* Have we passed our safe starting point?
*/
- if (!reachedMinRecoveryPoint &&
+ if (!reachedConsistency &&
XLByteLE(minRecoveryPoint, EndRecPtr) &&
XLogRecPtrIsInvalid(ControlFile->backupStartPoint))
{
@@ -6906,7 +6919,7 @@ CheckRecoveryConsistency(void)
*/
XLogCheckInvalidPages();
- reachedMinRecoveryPoint = true;
+ reachedConsistency = true;
ereport(LOG,
(errmsg("consistent recovery state reached at %X/%X",
EndRecPtr.xlogid, EndRecPtr.xrecoff)));
@@ -6919,7 +6932,7 @@ CheckRecoveryConsistency(void)
*/
if (standbyState == STANDBY_SNAPSHOT_READY &&
!LocalHotStandbyActive &&
- reachedMinRecoveryPoint &&
+ reachedConsistency &&
IsUnderPostmaster)
{
/* use volatile pointer to prevent code rearrangement */
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 350d434..b280434 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -85,7 +85,7 @@ log_invalid_page(RelFileNode node, ForkNumber forkno, BlockNumber blkno,
* linger in the hash table until the end of recovery and PANIC there,
* which might come only much later if this is a standby server.
*/
- if (reachedMinRecoveryPoint)
+ if (reachedConsistency)
{
report_invalid_page(WARNING, node, forkno, blkno, present);
elog(PANIC, "WAL contains references to invalid pages");
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 1fb56cd..1b31414 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -190,7 +190,7 @@ typedef enum
extern XLogRecPtr XactLastRecEnd;
-extern bool reachedMinRecoveryPoint;
+extern bool reachedConsistency;
/* these variables are GUC parameters related to XLOG */
extern int CheckPointSegments;
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers