The locking in smgr_redo_commit and smgr_redo_abort doesn't look right. First of all, smgr_redo_abort is not holding XidGenLock and ProcArrayLock while modifying ShmemVariableCache->nextXid and ShmemVariableCache->latestCompletedXid, respectively, like smgr_redo_commit is. Attached patch fixes that.
But I think there's a little race condition in there anyway, as they remove the finished xids from known-assigned-xids list by calling ExpireTreeKnownAssignedTransactionIds, and ShmemVariableCache->latestCompletedXid is updated only after that. We're not holding ProcArrayLock between those two steps, like we do during normal transaction commit. I'm not sure what kind of issues that can lead to, but it seems like it can lead to broken snapshots if someone calls GetSnapshotData() between those steps. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
>From 9d6ef0a3eb901b1d2182b3f2efd5c75d52e88cba Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas <hei...@enterprisedb.com> Date: Sun, 27 Sep 2009 14:51:43 +0300 Subject: [PATCH] Make locking in smgr_redo_abort reflect that in smgr_redo_commit. --- src/backend/access/transam/xact.c | 22 ++++++++++++++++------ 1 files changed, 16 insertions(+), 6 deletions(-) diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c index a696857..92a7c43 100644 --- a/src/backend/access/transam/xact.c +++ b/src/backend/access/transam/xact.c @@ -4502,10 +4502,7 @@ xact_redo_commit(xl_xact_commit *xlrec, TransactionId xid, /* * Release locks, if any. We do this for both two phase and normal * one phase transactions. In effect we are ignoring the prepare - * phase and just going straight to lock release. This explains - * why the twophase_postcommit_standby_callbacks[] do not invoke - * a special routine to handle locks - that is performed here - * instead. + * phase and just going straight to lock release. */ RelationReleaseRecoveryLockTree(xid, xlrec->nsubxacts, sub_xids); } @@ -4593,12 +4590,25 @@ xact_redo_abort(xl_xact_abort *xlrec, TransactionId xid) } /* Make sure nextXid is beyond any XID mentioned in the record */ + /* We don't expect anyone else to modify nextXid, hence we + * don't need to hold a lock while checking this. We still acquire + * the lock to modify it, though. + */ if (TransactionIdFollowsOrEquals(max_xid, ShmemVariableCache->nextXid)) { + LWLockAcquire(XidGenLock, LW_EXCLUSIVE); ShmemVariableCache->nextXid = max_xid; - ShmemVariableCache->latestCompletedXid = ShmemVariableCache->nextXid; - TransactionIdAdvance(ShmemVariableCache->nextXid); + LWLockRelease(XidGenLock); + } + + /* Same here, don't use lock to test, but need one to modify */ + if (TransactionIdFollowsOrEquals(max_xid, + ShmemVariableCache->latestCompletedXid)) + { + LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE); + ShmemVariableCache->latestCompletedXid = max_xid; + LWLockRelease(ProcArrayLock); } /* Make sure files supposed to be dropped are dropped */ -- 1.6.3.3
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers