On 14.02.2011 20:10, Kevin Grittner wrote:
> Promotion of the lock granularity on the prior tuple is where we
> have problems. If the two tuple versions are in separate pages then
> the second UPDATE could miss the conflict. My first thought was to
> fix that by requiring promotion of a predicate lock on a tuple to
> jump straight to the relation level if nextVersionOfRow is set for
> the lock target and it points to a tuple in a different page. But
> that doesn't cover a situation where we have a heap tuple predicate
> lock which gets promoted to page granularity before the tuple is
> updated. To handle that we would need to say that an UPDATE to a
> tuple on a page which is predicate locked by the transaction would
> need to be promoted to relation granularity if the new version of
> the tuple wasn't on the same page as the old version.
Yeah, promoting the original lock on the UPDATE was my first thought too.
Another idea is to duplicate the original predicate lock on the first
update, so that the original reader holds a lock on both row versions. I
think that would ultimately be simpler as we wouldn't need the
next-prior chains anymore.
For example, suppose that transaction X is holding a predicate lock on
tuple A. Transaction Y updates tuple A, creating a new tuple B.
Transaction Y sees that X holds a lock on tuple A (or the page
containing A), so it acquires a new predicate lock on tuple B on behalf
of X.
If the updater aborts, the lock on the new tuple needs to be cleaned up,
so that it doesn't get confused with a later tuple that's stored in the
same physical location. We could store the xmin of the tuple in the
predicate lock to check for that. Whenever you check for conflict, if
the xmin of the lock doesn't match the xmin on the tuple, you know that
the lock belonged to an old dead tuple stored in the same location, and
can be simply removed as the tuple doesn't exist anymore.
> That said, the above is about eliminating false negatives from some
> corner cases which escaped notice until now. I don't think the
> changes described above will do anything to prevent the problems
> reported by YAMAMOTO Takashi.
Agreed, it's a separate issue. Although if we change the way we handle
the read-update-update problem, the other issue might go away too.
> Unless I'm missing something, it
> sounds like tuple IDs are being changed or reused while predicate
> locks are held on the tuples. That's probably not going to be
> overwhelmingly hard to fix if we can identify how that can happen.
I tried to cover HOT issues, but it seems likely I missed something.
Storing the xmin of the original tuple would probably help with that
too. But it would be nice to understand and be able to reproduce the
issue first.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com