On Thursday, June 07, 2012 03:58:24 PM Andres Freund wrote:
> Hi,
> 
> On Thursday, June 07, 2012 12:44:08 PM Valentine Gogichashvili wrote:
> > I have the situation again, one of 3 slaves was slow to play all the WAL
> > files and being about 10GB late it crashed with the same error again.
> > 
> > I collected DEBUG4 output in this time:
> > https://docs.google.com/open?id=0B2NMMrfiBQcLZjNDbU0xQ3lvWms
> 
> Ok, I stared at this some time and I think I see what the problem is. Some
> log excerpts that lead my reasoning:
> ...
> after that we start adding all currently running xids from the snapshot to
> the KnownAssigned machinery. They are already recorded though, so we fail
> in KnownAssignedXidsAdd with the OPs error.
> 
> The simplest fix for that seems to be to simply reset the KnownAssignedXids
> state in the above branch. Any arguments against that?
A patch implementing that is attached. Unfortunately not really tested yet 
because its kinda hard to hit that code-path.

Valentine, can you test that patch?

Andres
-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 91c6b4195233c5dfeb794b983c97cef61d966e1b Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Thu, 7 Jun 2012 18:38:32 +0200
Subject: [PATCH] Fix a bug in the assembly of recovery snapshots in Hot
 Standby

We previously failed if we read a non-overflowed snapshot after starting to
incrementally assemble a snapshot after reading an overflowed one. Code added
in 10b7c686e52a6d1bb10194ebf9331ef06f044d46 added a fallback in that case to
simply build a completely new snapshot if we have the necessary information but
forgot to cleanup the partial incremental one.

Add and use a KnownAssignedXidsReset function for that.
---
 src/backend/storage/ipc/procarray.c |   28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 26469c4..25bfefc 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -160,6 +160,7 @@ static int KnownAssignedXidsGetAndSetXmin(TransactionId *xarray,
 							   TransactionId xmax);
 static TransactionId KnownAssignedXidsGetOldestXmin(void);
 static void KnownAssignedXidsDisplay(int trace_level);
+static void KnownAssignedXidsReset(void);
 
 /*
  * Report shared-memory space needed by CreateSharedProcArray.
@@ -526,6 +527,11 @@ ProcArrayApplyRecoveryInfo(RunningTransactions running)
 		 */
 		if (!running->subxid_overflow || running->xcnt == 0)
 		{
+			/*
+			 * we already may have collected assigned xids, we need to throw
+			 * that knowledge away to apply the recovery snapshot.
+			 */
+			KnownAssignedXidsReset();
 			standbyState = STANDBY_INITIALIZED;
 		}
 		else
@@ -569,7 +575,8 @@ ProcArrayApplyRecoveryInfo(RunningTransactions running)
 	 * xids to subtrans. If RunningXacts is overflowed then we don't have
 	 * enough information to correctly update subtrans anyway.
 	 */
-	Assert(procArray->numKnownAssignedXids == 0);
+	if(procArray->numKnownAssignedXids != 0)
+		elog(ERROR, "the KnownAssignedXids machinery cannot be initialized when applying a full recovery snapshot");
 
 	/*
 	 * Allocate a temporary array to avoid modifying the array passed as
@@ -3340,3 +3347,22 @@ KnownAssignedXidsDisplay(int trace_level)
 
 	pfree(buf.data);
 }
+
+/*
+ * KnownAssignedXidsReset
+ *		Resets KnownAssignedXids to be empty
+ */
+static void
+KnownAssignedXidsReset(void)
+{
+	/* use volatile pointer to prevent code rearrangement */
+	volatile ProcArrayStruct *pArray = procArray;
+
+	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+
+	pArray->numKnownAssignedXids = 0;
+	pArray->tailKnownAssignedXids = 0;
+	pArray->headKnownAssignedXids = 0;
+
+	LWLockRelease(ProcArrayLock);
+}
-- 
1.7.10.rc3.3.g19a6c.dirty

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Reply via email to