> Heikki Linnakangas wrote:
> On 14.02.2011 20:10, Kevin Grittner wrote:
>> Promotion of the lock granularity on the prior tuple is where we
>> have problems. If the two tuple versions are in separate pages
>> then the second UPDATE could miss the conflict. My first thought
>> was to fix that by requiring promotion of a predicate lock on a
>> tuple to jump straight to the relation level if nextVersionOfRow
>> is set for the lock target and it points to a tuple in a different
>> page. But that doesn't cover a situation where we have a heap
>> tuple predicate lock which gets promoted to page granularity
>> before the tuple is updated. To handle that we would need to say
>> that an UPDATE to a tuple on a page which is predicate locked by
>> the transaction would need to be promoted to relation granularity
>> if the new version of the tuple wasn't on the same page as the old
>> version.
>
> Yeah, promoting the original lock on the UPDATE was my first
> thought too.
>
> Another idea is to duplicate the original predicate lock on the
> first update, so that the original reader holds a lock on both row
> versions. I think that would ultimately be simpler as we wouldn't
> need the next-prior chains anymore.
>
> For example, suppose that transaction X is holding a predicate lock
> on tuple A. Transaction Y updates tuple A, creating a new tuple B.
> Transaction Y sees that X holds a lock on tuple A (or the page
> containing A), so it acquires a new predicate lock on tuple B on
> behalf of X.
>
> If the updater aborts, the lock on the new tuple needs to be
> cleaned up, so that it doesn't get confused with later tuple that's
> stored in the same physical location. We could store the xmin of
> the tuple in the predicate lock to check for that. Whenever you
> check for conflict, if the xmin of the lock doesn't match the xmin
> on the tuple, you know that the lock belonged to an old dead tuple
> stored in the same location, and can be simply removed as the tuple
> doesn't exist anymore.
>
>> That said, the above is about eliminating false negatives from
>> some corner cases which escaped notice until now. I don't think
>> the changes described above will do anything to prevent the
>> problems reported by YAMAMOTO Takashi.
>
> Agreed, it's a separate issue. Although if we change the way we
> handle the read-update-update problem, the other issue might go
> away too.
>
>> Unless I'm missing something, it sounds like tuple IDs are being
>> changed or reused while predicate locks are held on the tuples.
>> That's probably not going to be overwhelmingly hard to fix if we
>> can identify how that can happen. I tried to cover HOT issues, but
>> it seems likely I missed something.
>
> Storing the xmin of the original tuple would probably help with
> that too. But it would be nice to understand and be able to
> reproduce the issue first.

I haven't been able to produce a test case yet, but I'm getting clear
enough about the issue that I think I can see my way to a good fix.
Even if I have a fix first, I'll continue to try to create a test to
show the pre-fix failure (and post-fix success), to include in the
regression suite. This is the sort of thing which is hard enough to
hit that a regression could slip in because of some change and make
it out in a release unnoticed if we aren't specifically testing for
it.
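
First, to make sure I'm reading the duplication idea the way you
intend it, here is a rough sketch of the bookkeeping as I picture it.
This is purely illustrative; the types and function names below are
invented for the example and are not the real predicate.c structures
or API:

/* Toy model only -- these are not the real predicate.c structures. */
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

typedef struct ToyTupleId       /* stand-in for an ItemPointer */
{
    uint32_t    page;
    uint16_t    offset;
} ToyTupleId;

typedef struct ToyPredicateLock
{
    bool        valid;
    TransactionId reader;       /* the transaction which read the tuple */
    ToyTupleId  tuple;          /* which tuple version is locked */
    TransactionId tupleXmin;    /* xmin of that version, to detect reuse */
} ToyPredicateLock;

#define MAX_LOCKS 64
static ToyPredicateLock locks[MAX_LOCKS];

static void
toy_lock_tuple(TransactionId reader, ToyTupleId tid, TransactionId xmin)
{
    for (int i = 0; i < MAX_LOCKS; i++)
    {
        if (!locks[i].valid)
        {
            locks[i] = (ToyPredicateLock) {true, reader, tid, xmin};
            return;
        }
    }
}

/*
 * Transaction Y has updated the tuple at oldTid, creating a new version
 * at newTid whose xmin is newXmin.  Every reader holding a lock on the
 * old version also gets a lock, acquired on its behalf, on the new
 * version, so a later update of the new version still shows up as a
 * rw-conflict with that reader.
 */
static void
toy_duplicate_locks_on_update(ToyTupleId oldTid, ToyTupleId newTid,
                              TransactionId newXmin)
{
    for (int i = 0; i < MAX_LOCKS; i++)
    {
        if (locks[i].valid &&
            locks[i].tuple.page == oldTid.page &&
            locks[i].tuple.offset == oldTid.offset)
            toy_lock_tuple(locks[i].reader, newTid, newXmin);
    }
}

/*
 * A writer is about to modify the tuple at tid, whose xmin is tupleXmin.
 * A lock stamped with a different xmin belonged to a dead tuple whose ID
 * was reused (for example because the updater aborted), so it is simply
 * dropped; a surviving lock held by another transaction is a rw-conflict.
 */
static bool
toy_check_for_conflict(ToyTupleId tid, TransactionId tupleXmin,
                       TransactionId writer)
{
    bool        conflict = false;

    for (int i = 0; i < MAX_LOCKS; i++)
    {
        if (!locks[i].valid ||
            locks[i].tuple.page != tid.page ||
            locks[i].tuple.offset != tid.offset)
            continue;
        if (locks[i].tupleXmin != tupleXmin)
            locks[i].valid = false;     /* stale: tuple ID was reused */
        else if (locks[i].reader != writer)
            conflict = true;
    }
    return conflict;
}

The detail that matters is that each lock acquired on behalf of X
carries the xmin of the tuple version it covers, so a later writer
can tell a live lock from one left behind by a dead tuple whose ID
has been reused.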

It's pretty clear what the issue is: we clean up the predicate locks
for a transaction when the transaction which read the data becomes
irrelevant, but a heap tuple predicate locked by a transaction may be
updated or deleted, and thereby become eligible for cleanup, before
the transaction becomes irrelevant. This can lead to cycles in the
target links if a tuple ID is reused before the transaction is
cleaned up.

Since we may need to detect rw-conflicts with the reading transaction
based on later versions of a row, even after the tuple it read has
been updated, the old tuple removed, and its ID reused, Heikki's
suggestion of duplicating the predicate lock in these cases goes
beyond being a simple way to avoid the particular symptoms reported:
it is actually necessary for correct behavior.

One adjustment I'm looking at: I think the tuple xmin can go in the
lock target structure rather than in each lock, because once a tuple
ID has been reused there can be no transaction left which could still
update the old tuple that had that ID. Any predicate locks attached
to a target carrying the old xmin can therefore be released and the
target reused with the new xmin.

This does, however, raise the question of what happens when the tuple
lock has been promoted to a page lock and a tuple on that page is
non-HOT updated. (Clearly this isn't an issue if the locks on a heap
relation for a transaction have been promoted all the way to relation
granularity, but it's equally clear we don't want to jump to that
granularity when we can avoid it.) It seems to me that, in addition
to what Heikki suggested, whenever a tuple on a predicate-locked page
is updated we need to create a predicate lock on the new tuple or its
(new) page for every transaction holding a predicate lock on the old
page. We won't know whether the updated tuple is one that was read,
or by which of the transactions holding page locks, but we need to
act as though it was read by all of them to prevent false negatives.
Covering this just at the tuple level doesn't seem adequate. (A
concrete sketch of both pieces is at the end of this message.)

It looks like these changes will be very localized and unlikely to
break other things. We're talking about corner cases which haven't
surfaced in the hundreds of existing SSI regression tests, and for
which it is hard to create a reproducible test case. HOT updates
tended to expose the issues because of the more aggressive cleanup
done by HOT pruning; that's going to be important to exploit in
creating a reproducible test case.

I'm proceeding on this basis. Any comments welcome.
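
Here is that sketch. As before, the types and function names are
invented for illustration; this isn't the actual predicate.c code,
just the shape of the bookkeeping I have in mind, with the xmin moved
into the target and a fan-out from page locks on a non-HOT update:

/* Toy model only; invented structures, not the real predicate.c code. */
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint32_t BlockNumber;
typedef uint16_t OffsetNumber;

typedef enum { TARGET_TUPLE, TARGET_PAGE } ToyGranularity;

/* One lock target plus the readers holding predicate locks on it. */
typedef struct ToyTarget
{
    bool        valid;
    ToyGranularity granularity;
    BlockNumber page;
    OffsetNumber offset;        /* only meaningful for tuple targets */
    TransactionId xmin;         /* xmin of the locked tuple version */
    int         nholders;
    TransactionId holders[8];
} ToyTarget;

#define MAX_TARGETS 32
static ToyTarget targets[MAX_TARGETS];

static void
toy_add_holder(ToyTarget *t, TransactionId reader)
{
    for (int i = 0; i < t->nholders; i++)
        if (t->holders[i] == reader)
            return;             /* reader already covers this target */
    if (t->nholders < 8)
        t->holders[t->nholders++] = reader;
}

/*
 * Find or create the tuple target for (page, offset).  A target found
 * with a different xmin belonged to a dead tuple whose ID was reused:
 * its locks are released and the target recycled with the new xmin.
 */
static ToyTarget *
toy_get_tuple_target(BlockNumber page, OffsetNumber offset,
                     TransactionId xmin)
{
    ToyTarget  *freeslot = NULL;

    for (int i = 0; i < MAX_TARGETS; i++)
    {
        ToyTarget  *t = &targets[i];

        if (!t->valid)
        {
            if (freeslot == NULL)
                freeslot = t;
        }
        else if (t->granularity == TARGET_TUPLE &&
                 t->page == page && t->offset == offset)
        {
            if (t->xmin != xmin)
            {
                t->nholders = 0;    /* stale locks: release them */
                t->xmin = xmin;     /* reuse target for the new tuple */
            }
            return t;
        }
    }
    if (freeslot == NULL)
        return NULL;                /* out of slots in this toy model */
    *freeslot = (ToyTarget) {true, TARGET_TUPLE, page, offset, xmin, 0, {0}};
    return freeslot;
}

/*
 * Non-HOT UPDATE moving a row from (oldPage, oldOffset) to a new version
 * with xmin newXmin at (newPage, newOffset): every reader holding a tuple
 * lock on the old version *or* a page lock on the old page gets a lock on
 * the new version too, since we can't tell which page locker read the row.
 */
static void
toy_on_non_hot_update(BlockNumber oldPage, OffsetNumber oldOffset,
                      BlockNumber newPage, OffsetNumber newOffset,
                      TransactionId newXmin)
{
    ToyTarget  *newTarget = toy_get_tuple_target(newPage, newOffset, newXmin);

    if (newTarget == NULL)
        return;
    for (int i = 0; i < MAX_TARGETS; i++)
    {
        ToyTarget  *t = &targets[i];

        if (!t->valid || t->page != oldPage)
            continue;
        if (t->granularity == TARGET_PAGE ||
            (t->granularity == TARGET_TUPLE && t->offset == oldOffset))
        {
            for (int j = 0; j < t->nholders; j++)
                toy_add_holder(newTarget, t->holders[j]);
        }
    }
}

toy_get_tuple_target() is what would let us drop the next-prior
chains: a mismatched xmin is proof that the old tuple is gone, so the
locks attached to the old target can simply be released.
toy_on_non_hot_update() is the page-granularity fan-out described
above.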

-Kevin