On Thursday, June 07, 2012 03:58:24 PM Andres Freund wrote:
> Hi,
>
> On Thursday, June 07, 2012 12:44:08 PM Valentine Gogichashvili wrote:
> > I have the situation again, one of 3 slaves was slow to play all the WAL
> > files and being about 10GB late it crashed with the same error again.
> >
> > I collected DEBUG4 output in this time:
> > https://docs.google.com/open?id=0B2NMMrfiBQcLZjNDbU0xQ3lvWms
>
> Ok, I stared at this some time and I think I see what the problem is. Some
> log excerpts that lead my reasoning:
> ...
> after that we start adding all currently running xids from the snapshot to
> the KnownAssigned machinery. They are already recorded though, so we fail
> in KnownAssignedXidsAdd with the OPs error.
>
> The simplest fix for that seems to be to simply reset the KnownAssignedXids
> state in the above branch. Any arguments against that?

A patch implementing that is attached. Unfortunately it is not really tested
yet because it's kinda hard to hit that code-path.

Valentine, can you test that patch?

Andres

--
Andres Freund		http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From 91c6b4195233c5dfeb794b983c97cef61d966e1b Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Thu, 7 Jun 2012 18:38:32 +0200
Subject: [PATCH] Fix a bug in the assembly of recovery snapshots in Hot Standby

We previously failed if we read a non-overflowed snapshot after starting to
incrementally assemble a snapshot after reading an overflowed one. Code added
in 10b7c686e52a6d1bb10194ebf9331ef06f044d46 added a fallback in that case to
simply build a completely new snapshot if we have the necessary information
but forgot to cleanup the partial incremental one.

Add and use a KnownAssignedXidsReset function for that.
---
 src/backend/storage/ipc/procarray.c |   28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 26469c4..25bfefc 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -160,6 +160,7 @@ static int KnownAssignedXidsGetAndSetXmin(TransactionId *xarray,
 							   TransactionId xmax);
 static TransactionId KnownAssignedXidsGetOldestXmin(void);
 static void KnownAssignedXidsDisplay(int trace_level);
+static void KnownAssignedXidsReset(void);
 
 /*
  * Report shared-memory space needed by CreateSharedProcArray.
@@ -526,6 +527,11 @@ ProcArrayApplyRecoveryInfo(RunningTransactions running)
 		 */
 		if (!running->subxid_overflow || running->xcnt == 0)
 		{
+			/*
+			 * we already may have collected assigned xids, we need to throw
+			 * that knowledge away to apply the recovery snapshot.
+			 */
+			KnownAssignedXidsReset();
 			standbyState = STANDBY_INITIALIZED;
 		}
 		else
@@ -569,7 +575,8 @@ ProcArrayApplyRecoveryInfo(RunningTransactions running)
 	 * xids to subtrans. If RunningXacts is overflowed then we don't have
 	 * enough information to correctly update subtrans anyway.
 	 */
-	Assert(procArray->numKnownAssignedXids == 0);
+	if(procArray->numKnownAssignedXids != 0)
+		elog(ERROR, "the KnownAssignedXids machinery cannot be initialized when applying a full recovery snapshot");
 
 	/*
 	 * Allocate a temporary array to avoid modifying the array passed as
@@ -3340,3 +3347,22 @@ KnownAssignedXidsDisplay(int trace_level)
 
 	pfree(buf.data);
 }
+
+/*
+ * KnownAssignedXidsReset
+ *		Resets KnownAssignedXids to be empty
+ */
+static void
+KnownAssignedXidsReset(void)
+{
+	/* use volatile pointer to prevent code rearrangement */
+	volatile ProcArrayStruct *pArray = procArray;
+
+	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+
+	pArray->numKnownAssignedXids = 0;
+	pArray->tailKnownAssignedXids = 0;
+	pArray->headKnownAssignedXids = 0;
+
+	LWLockRelease(ProcArrayLock);
+}
-- 
1.7.10.rc3.3.g19a6c.dirty
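
Since the real code path is hard to hit, here is a minimal standalone sketch of the failure mode and of what the reset buys us. It is a toy model, not PostgreSQL code: ToyKnownAssignedXids, toy_add, toy_reset, toy_apply_snapshot and toy_selfcheck are hypothetical names, there is no locking or shared memory, and the ordering check uses a plain integer comparison instead of wraparound-aware xid comparison. It only assumes that adding an xid which is already recorded gets rejected, as described in the analysis quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define MAX_KNOWN_XIDS 64

/* Toy stand-in for the three KnownAssignedXids fields the patch resets. */
typedef struct
{
    TransactionId xids[MAX_KNOWN_XIDS];
    int           numKnownAssignedXids;
    int           tailKnownAssignedXids;
    int           headKnownAssignedXids;
} ToyKnownAssignedXids;

/*
 * Append one xid. Assumption for this sketch: an xid that does not come
 * after the newest stored one (for example a duplicate) is rejected.
 */
static bool
toy_add(ToyKnownAssignedXids *k, TransactionId xid)
{
    if (k->headKnownAssignedXids >= MAX_KNOWN_XIDS)
        return false;           /* toy array is full */

    if (k->numKnownAssignedXids > 0 &&
        xid <= k->xids[k->headKnownAssignedXids - 1])
        return false;           /* already recorded or out of order */

    k->xids[k->headKnownAssignedXids++] = xid;
    k->numKnownAssignedXids++;
    return true;
}

/* Mirrors the field resets in the patch, minus the lock. */
static void
toy_reset(ToyKnownAssignedXids *k)
{
    k->numKnownAssignedXids = 0;
    k->tailKnownAssignedXids = 0;
    k->headKnownAssignedXids = 0;
}

/* Apply a full (non-overflowed) snapshot; false if any add is rejected. */
static bool
toy_apply_snapshot(ToyKnownAssignedXids *k, const TransactionId *xids,
                   int nxids, bool reset_first)
{
    if (reset_first)
        toy_reset(k);

    for (int i = 0; i < nxids; i++)
    {
        if (!toy_add(k, xids[i]))
            return false;
    }
    return true;
}

/* Exercises both paths: stale partial state fails, reset state succeeds. */
void
toy_selfcheck(void)
{
    const TransactionId snapshot[] = {100, 101, 105};
    ToyKnownAssignedXids k = {0};

    /* xids collected incrementally after an earlier overflowed snapshot */
    assert(toy_add(&k, 100));
    assert(toy_add(&k, 101));

    /* without the reset the snapshot collides with what is already there */
    assert(!toy_apply_snapshot(&k, snapshot, 3, false));

    /* with the reset the same snapshot applies cleanly */
    assert(toy_apply_snapshot(&k, snapshot, 3, true));
    assert(k.numKnownAssignedXids == 3);
}
```

Calling toy_selfcheck() from a main() runs both cases. The real function additionally takes ProcArrayLock exclusively around the three assignments, which the toy version skips because nothing else can see its state.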