On Thu, Aug 20, 2009 at 1:55 AM, Heikki Linnakangas<heikki.linnakan...@enterprisedb.com> wrote:
>>>> When that is replayed, ProcArrayUpdateTransactions() will zap the
>>>> unobserved xids array with the list that includes XID 123, even though
>>>> we already saw a commit record for it.
>>
>> I looked at this a little more. I'm wondering if we can fix this by
>> making ProcArrayUpdateRecoveryTransactions() smarter. Can we just
>> refuse to add an Xid to the UnobservedXids() array if in fact we've
>> already observed it? (Not sure how to check that.)
>
> There's also the opposite problem: If a transaction starts (and writes a
> WAL record) between LogCurrentRunningXacts() and XLogInsert(), it is not
> present in the RunningXacts record. When the standby replays the
> RunningXacts record, it removes the XID of that transaction from the
> array, even though it's still running.

Yep, Simon appears to have contemplated that problem - see comments in
ProcArrayUpdateRecoveryTransactions().

>> Fixing this on the
>> master would seem to require acquiring the WALInsertLock before
>> calling GetRunningTransactionData() and holding it until we finish
>> writing that data to WAL, which I suspect someone's going to complain
>> about...
>
> Yeah, it's hard to get that locking right without risking deadlock. As
> the patch stands, we only write a RunningXacts record once per
> checkpoint, so it's not performance critical, but we must avoid deadlocks.
>
> If there's a way, I would prefer a solution where the RunningXacts
> snapshot represents the situation the moment it appears in WAL, not some
> moment before it. It makes the logic easier to understand.

I think this is going to be difficult. At a preliminary look, it seems
to require taking a sledgehammer to the abstraction layer encapsulated
by XLogInsert(). It's also going to require holding both
ProcArrayLock and WALInsertLock simultaneously. I'm not sure where the
risk of deadlock comes in - we just have to define a rule (or maintain
the existing rule) about which order to acquire those two locks in.
But I'm guessing the existing rule is along the lines of "Don't do
that, or Tom Lane will reject your patch and all you'll get is this
stupid T-shirt."

http://img199.yfrog.com/i/b9w.jpg/

...Robert
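
For anyone following along, the two races discussed above can be reproduced in a toy model. This is only a sketch, not PostgreSQL code: the `Master` class, the `replay()` function and the record names are all made up for illustration, and it assumes the standby simply overwrites its tracked set when it replays a RunningXacts record. It compares a snapshot taken some moment before it is logged with one that is taken and logged with nothing in between.

```python
# Toy model only. Master, replay() and the record kinds are hypothetical
# names for illustration; none of this is PostgreSQL's actual API.

class Master:
    def __init__(self):
        self.running = set()   # xids currently running on the master
        self.wal = []          # ordered list of (kind, payload) records

    def start(self, xid):
        self.running.add(xid)
        self.wal.append(("start", xid))

    def commit(self, xid):
        self.running.discard(xid)
        self.wal.append(("commit", xid))

    def snapshot(self):
        # Copy of the running set at this instant.
        return frozenset(self.running)

    def log_running_xacts(self, snap):
        self.wal.append(("running_xacts", snap))


def replay(wal):
    """Standby side: return the set of xids believed to be running."""
    tracked = set()
    for kind, payload in wal:
        if kind == "start":
            tracked.add(payload)
        elif kind == "commit":
            tracked.discard(payload)
        elif kind == "running_xacts":
            # Assumption: the record replaces whatever was tracked so far.
            tracked = set(payload)
    return tracked


def racy():
    """Snapshot first, log later; other records slip in between."""
    m = Master()
    m.start(123)
    snap = m.snapshot()        # sees {123}
    m.commit(123)              # 123 commits after the snapshot
    m.start(124)               # 124 starts after the snapshot
    m.log_running_xacts(snap)  # stale snapshot reaches WAL last
    return m.running, replay(m.wal)


def atomic():
    """Snapshot and log with no intervening records."""
    m = Master()
    m.start(123)
    m.commit(123)
    m.start(124)
    m.log_running_xacts(m.snapshot())
    return m.running, replay(m.wal)


if __name__ == "__main__":
    print("racy:  ", racy())
    print("atomic:", atomic())
```

In the racy case the standby ends up believing 123 is running (although its commit was already replayed) and forgets 124 (although it is still running). In the atomic case the standby's view matches the master's. On the real master, closing that window is what would require holding both locks across the snapshot and the insert.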