Hi all,

On 2012-11-27 19:52:09 +0000, tar...@gmail.com wrote:
> This happens only if wal_level=hot_standby.
>
>
> Here are the steps to reproduce this issue.

Oh. Fucking. Wow.

I think this tiny comment just helped me find the bug. After previously
having looked for it without success for some time I just recognized the
problem while working on something unrelated.

Here it goes:

void
CreateCheckPoint(int flags)
{
...
        /*
         * Get the other info we need for the checkpoint record.
         */
        LWLockAcquire(XidGenLock, LW_SHARED);
        checkPoint.nextXid = ShmemVariableCache->nextXid;
        checkPoint.oldestXid = ShmemVariableCache->oldestXid;
        checkPoint.oldestXidDB = ShmemVariableCache->oldestXidDB;
        LWLockRelease(XidGenLock);

        /* Increase XID epoch if we've wrapped around since last checkpoint */
        checkPoint.nextXidEpoch = ControlFile->checkPointCopy.nextXidEpoch;
        if (checkPoint.nextXid < ControlFile->checkPointCopy.nextXid)
                checkPoint.nextXidEpoch++;
// i.e. compute the epoch based on the *current* nextXid.

...
// do all the writing, take some time
        CheckPointGuts(checkPoint.redo, flags);
...
         * Update checkPoint.nextXid since we have a later value
         */
        if (!shutdown && XLogStandbyInfoActive())
                LogStandbySnapshot(&checkPoint.nextXid);
...
        /* Update shared-memory copy of checkpoint XID/epoch */
        {
                /* use volatile pointer to prevent code rearrangement */
                volatile XLogCtlData *xlogctl = XLogCtl;

                SpinLockAcquire(&xlogctl->info_lck);
                xlogctl->ckptXidEpoch = checkPoint.nextXidEpoch;
                xlogctl->ckptXid = checkPoint.nextXid;
                SpinLockRelease(&xlogctl->info_lck);
        }
...
}

Notice the end of the comment above/about LogStandbySnapshot. It updates
nextXid! But it does *not* recheck whether we have had a wraparound +
so it does not increase the epoch counter. Although we might have had a
wraparound inbetween.

Which in turn means the the ShmemVariableCache->nextXid <
xlogctl->ckptXid computation in GetNextXidAndEpoch doesn't return
correct results anymore because it can't recognize that were in a new
epoch now.

Trivial patch attached.

I am not sure that I understand why its interesting to have a newer
->nextXid in the checkpoint, but I don't see anything that would
make it dangerous besides this.

This looks like it also explains #6291 and the slighly different issue
described in CAAZKuFbB7UR3NXV1pkZFRXy=6V1QBq_OeHJWJNTLpKBpH=q...@mail.gmail.com,
as the issue is just as present on standbys as it is on primaries.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
>From bf1ff10acf7dbfd9a6046c5fcb4bb816485a1e7b Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Sat, 1 Dec 2012 23:51:34 +0100
Subject: [PATCH] Fix xid epoch calculation with wal_level=hot_standby

When wal_level=hot_standby CreateCheckPoint also logs running
transactions. During the computation of those it recomputes
CheckPoint->nextXid. Unfortunately it does so after it has been used to compute
CheckPoint->nextXidEpoch.

Move code around slightly to fix the issue.

This fixes #6291, #7710 and the slighly different issue described in
CAAZKuFbB7UR3NXV1pkZFRXy=6V1QBq_OeHJWJNTLpKBpH=q...@mail.gmail.com, as the
issue is just as present on standbys as it is on primaries.

Bug found by Daniel Farina and Tarvi Pillessaar
---
 src/backend/access/transam/xlog.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 8b19976..4760081 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7072,11 +7072,6 @@ CreateCheckPoint(int flags)
 	checkPoint.oldestXidDB = ShmemVariableCache->oldestXidDB;
 	LWLockRelease(XidGenLock);
 
-	/* Increase XID epoch if we've wrapped around since last checkpoint */
-	checkPoint.nextXidEpoch = ControlFile->checkPointCopy.nextXidEpoch;
-	if (checkPoint.nextXid < ControlFile->checkPointCopy.nextXid)
-		checkPoint.nextXidEpoch++;
-
 	LWLockAcquire(OidGenLock, LW_SHARED);
 	checkPoint.nextOid = ShmemVariableCache->nextOid;
 	if (!shutdown)
@@ -7115,6 +7110,14 @@ CreateCheckPoint(int flags)
 	START_CRIT_SECTION();
 
 	/*
+	 * Increase XID epoch if we've wrapped around since last checkpoint, do
+	 * this after LogStandbySnapshot which updates nextXid.
+	 */
+	checkPoint.nextXidEpoch = ControlFile->checkPointCopy.nextXidEpoch;
+	if (checkPoint.nextXid < ControlFile->checkPointCopy.nextXid)
+		checkPoint.nextXidEpoch++;
+
+	/*
 	 * Now insert the checkpoint record into XLOG.
 	 */
 	rdata.data = (char *) (&checkPoint);
-- 
1.7.12.289.g0ce9864.dirty

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Reply via email to