On Mon, 2009-08-17 at 11:19 +0300, Heikki Linnakangas wrote: > I think there's a race condition in the way LogCurrentRunningXacts() is > called at the end of checkpoint. This can happen in the master: > > 1. Checkpoint starts > 2. Transaction 123 begins, and does some updates > 3. Checkpoint ends. LogCurrentRunningXacts() is called. > 4. LogCurrentRunningXacts() gets the list of currently running > transactions by calling GetCurrentTransactionData(). > 5. Transaction 123 ends, writing commit record to WAL > 6. LogCurrentRunningXacts() writes the list of running XIDs to WAL. This > includes XID 123, since that was still running at step 4. > > When that is replayed, ProcArrayUpdateTransactions() will zap the > unobserved xids array with the list that includes XID 123, even though > we already saw a commit record for it.
That's not a race condition, but it does make the code more complex. The issue has been long understood. I don't think it's acceptable to take and hold both ProcArray and WALInsertLock. Those are now the two most heavily contended locks on the system. We have evidence that there are burst delays associated with various operations on just one of those locks, let alone two. If you're still doubtful, the problem I've been working on recently is the point that I overlooked the initial state of the lock table in my earlier patch. GetRunningTransactionData() also needs to have initial lock data. There is no way in hell that I could personally condone holding ProcArrayLock, WALInsertLock and all of the LockMgrLock partitions at same time. So we just have to eat the complexity. (No doubt someone will disagree with my strong language here, but please take it as an indication of exactly how bad an idea holding multiple locks will be). Slight timing issues are not too bad really. We just have to be careful to assume that there is a mismatch in the data and must have code to handle that. Anyway, I've been working on this problem for some time and continue to do so. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers