On Dec 20, 2010, at 18:54, Robert Haas wrote:
> On Mon, Dec 20, 2010 at 12:49 PM, Florian Pflug <f...@phlo.org> wrote:
>> For me, this is another very good reason to explore this further. Plus, it
>> improves the ratio of grotty-ness vs. number-of-problems-solved ;-)
>
> By all means, look into it further. I fear the boat is filling up
> with water, but if you manage to come up with a workable solution I'll
> be as happy as anyone, promise!
I'll try to create a detailed proposal. To do that, however, I'll require some guidance on what's acceptable and what's not. Here's a summary of the preceding discussion:

To deal with aborted transactions correctly, we need to track the last locker of a particular tuple that actually committed. If we also want to fix the bug that causes a row lock to be lost upon doing lock;savepoint;update;restore, that "latest committed locker" will sometimes need to be a set, since it'll need to store the outer transaction's xid as well as the latest actually committed locker.

As long as no transaction aborts are involved, the tuple's xmax contains all the information we need. If a transaction updates, deletes or locks a row, the previous xmax is overwritten. If that transaction later aborts, we cannot decide whether the row had previously been locked or not.

These ideas have come up:

A) Transactions that merely lock a row could put the previous locker's xid (if >= GlobalXmin) *and* their own xid into a multi-xid, and store that in xmax. For shared locks, this merely means cleaning out the existing multi-xid a bit less aggressively. There's no risk of bloat there, since we only need to keep one committed xid, not all of them. For exclusive locks, we currently never create a multi-xid. That would change: we'd need to create one if we find a previous locker with an xid >= GlobalXmin. This doesn't solve the UPDATE and DELETE cases. For SELECT-FOR-SHARE this is probably the best option, since it comes very close to what we do currently.

B) A transaction that UPDATEs or DELETEs a tuple could create an intermediate lock-only tuple which would contain the necessary information about previous lock holders. We'd only need to do that if there actually is a previous lock holder with an xid >= GlobalXmin. We could then choose whether to do the same for SELECT-FOR-UPDATE, or whether we'd prefer to go with (A).

C) The ctid field is only necessary for updated tuples.
We could thus overlay it with a field which stores the last committed locker after a DELETE. UPDATEs could be handled either as in (B), or by storing the information in the ctid overlay of the *new* tuple. SELECT-FOR-UPDATE could again either use the ctid overlay as well, or use (A).

D) We could add a new tuple header field xlatest. To support binary upgrade, we'd also need to be able to read tuples without that field. We could then either create a new tuple version upon the first lock request to such a tuple (which would then include the new header), or simply raise a serialization error if a serializable transaction tried to update a tuple that lacks the field and whose xmax was aborted and >= GlobalXmin.

I have the nagging feeling that (D) will meet quite some resistance. (C) wasn't too well received either, though I wonder if that'd change if the grotty-ness was hidden behind an API, much like the xvac/cmin/cmax overlay is. (B) seems like a lot of overhead, but is maybe cleaner. More research is needed, though, to check how it'd interact with HOT and how to get the locking right. (A) is IMHO the best solution for SELECT-FOR-SHARE, since it's very close to what we do today.

Any comments? Especially of the "don't you dare" kind?

best regards,
Florian Pflug

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
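To make option (A) a bit more concrete, here is a toy model in C. It is not PostgreSQL code and not a patch: the status table, `LockerSet`, `lock_tuple()`, `latest_committed()` and the other names are all hypothetical, xids are small integers compared with a plain `>=` (no wraparound handling), and "latest committed" is decided by a made-up commit sequence number. It only illustrates the idea that keeping the latest committed locker (if >= GlobalXmin) next to the current locker(s) lets us still see the committed lock after the newest locker aborts.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model of option (A); NOT PostgreSQL source, all names hypothetical.
 * Xids are small integers indexing a fake status table that stands in for
 * the commit log, and are compared with plain >= (no wraparound handling).
 */
typedef uint32_t Xid;

#define INVALID_XID 0
#define MAX_XID     64
#define MAX_MEMBERS 8

typedef enum { XACT_IN_PROGRESS, XACT_COMMITTED, XACT_ABORTED } XactStatus;

typedef struct {
    XactStatus status;     /* zero-initialized, i.e. XACT_IN_PROGRESS */
    uint32_t   commit_seq; /* commit order; only meaningful if committed */
} XactInfo;

static XactInfo xacts[MAX_XID];
static uint32_t next_commit_seq = 1;

static void xact_commit(Xid xid)
{
    xacts[xid].status = XACT_COMMITTED;
    xacts[xid].commit_seq = next_commit_seq++;
}

static void xact_abort(Xid xid)
{
    xacts[xid].status = XACT_ABORTED;
}

/* Stand-in for the multi-xid that option (A) would store in xmax. */
typedef struct {
    Xid members[MAX_MEMBERS];
    int nmembers;
} LockerSet;

static void add_member(LockerSet *s, Xid xid)
{
    assert(s->nmembers < MAX_MEMBERS);
    s->members[s->nmembers++] = xid;
}

/* The member that committed last, or INVALID_XID if none has committed. */
static Xid latest_committed(const LockerSet *s)
{
    Xid best = INVALID_XID;

    for (int i = 0; i < s->nmembers; i++) {
        Xid x = s->members[i];
        if (xacts[x].status == XACT_COMMITTED &&
            (best == INVALID_XID ||
             xacts[x].commit_seq > xacts[best].commit_seq))
            best = x;
    }
    return best;
}

/*
 * my_xid locks the tuple. Still-running lockers are kept (shared locks),
 * aborted ones are dropped, and of the committed ones only the latest is
 * kept, and only if it is >= global_xmin. The set therefore never holds
 * more than one committed xid.
 */
static void lock_tuple(LockerSet *s, Xid my_xid, Xid global_xmin)
{
    LockerSet next = { .nmembers = 0 };
    Xid keep = latest_committed(s);

    if (keep != INVALID_XID && keep >= global_xmin)
        add_member(&next, keep);
    for (int i = 0; i < s->nmembers; i++) {
        Xid x = s->members[i];
        if (x != my_xid && xacts[x].status == XACT_IN_PROGRESS)
            add_member(&next, x);
    }
    add_member(&next, my_xid);
    *s = next;
}
```

For example, if xid 2 locks the row and commits, and xid 3 then locks it and aborts, `latest_committed()` still returns 2. With a single overwritten xmax, only xid 3 would have been recorded, and after its abort nothing would show that the row had been locked by a committed transaction. Once GlobalXmin moves past 2, the next `lock_tuple()` call drops it, which is why the set stays small.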