On Sun, 2009-09-27 at 14:59 +0300, Heikki Linnakangas wrote: > The locking in smgr_redo_commit and smgr_redo_abort doesn't look right. > First of all, smgr_redo_abort is not holding XidGenLock and > ProcArrayLock while modifying ShmemVariableCache->nextXid and > ShmemVariableCache->latestCompletedXid, respectively, like > smgr_redo_commit is. Attached patch fixes that. > > But I think there's a little race condition in there anyway, as they > remove the finished xids from known-assigned-xids list by calling > ExpireTreeKnownAssignedTransactionIds, and > ShmemVariableCache->latestCompletedXid is updated only after that. We're > not holding ProcArrayLock between those two steps, like we do during > normal transaction commit. I'm not sure what kind of issues that can > lead to, but it seems like it can lead to broken snapshots if someone > calls GetSnapshotData() between those steps.
I confess I didn't know what you were talking about when you wrote this and was expecting you to have a coffee and retract. I realise now you meant xact_redo_commit() rather than smgr_ and it makes sense at last. I've just committed about half of your patch exactly, but not all. I've avoided making the changes to ShmemVariableCache->latestCompletedXid directly and moved the update to ExpireTreeKnownAssignedTransactionIds() which already had access to max_xid while holding ProcArrayLock. Hopefully that resolves your comment about race condition. Also, I noticed that you removed the line TransactionIdAdvance(ShmemVariableCache->nextXid); in xact_redo_abort(). That looks like an error to me, since this follows neither the comment nor how it is coded in xact_redo_commit(). Let me know if there was some other reason for that line removal and I'll make the change again. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers