> Heikki Linnakangas wrote:
> On 14.02.2011 20:10, Kevin Grittner wrote:
>> Promotion of the lock granularity on the prior tuple is where we
>> have problems. If the two tuple versions are in separate pages
>> then the second UPDATE could miss the conflict. My first thought
>> was to fix that by requiring promotion of a predicate lock on a
>> tuple to jump straight to the relation level if nextVersionOfRow
>> is set for the lock target and it points to a tuple in a different
>> page. But that doesn't cover a situation where we have a heap
>> tuple predicate lock which gets promoted to page granularity
>> before the tuple is updated. To handle that we would need to say
>> that an UPDATE to a tuple on a page which is predicate locked by
>> the transaction would need to be promoted to relation granularity
>> if the new version of the tuple wasn't on the same page as the old
>> version.
>
> Yeah, promoting the original lock on the UPDATE was my first
> thought too.
>
> Another idea is to duplicate the original predicate lock on the
> first update, so that the original reader holds a lock on both row
> versions. I think that would ultimately be simpler as we wouldn't
> need the next-prior chains anymore.
>
> For example, suppose that transaction X is holding a predicate lock
> on tuple A. Transaction Y updates tuple A, creating a new tuple B.
> Transaction Y sees that X holds a lock on tuple A (or the page
> containing A), so it acquires a new predicate lock on tuple B on
> behalf of X.
>
> If the updater aborts, the lock on the new tuple needs to be
> cleaned up, so that it doesn't get confused with later tuple that's
> stored in the same physical location. We could store the xmin of
> the tuple in the predicate lock to check for that. Whenever you
> check for conflict, if the xmin of the lock doesn't match the xmin
> on the tuple, you know that the lock belonged to an old dead tuple
> stored in the same location, and can be simply removed as the tuple
> doesn't exist anymore.
>
>> That said, the above is about eliminating false negatives from
>> some corner cases which escaped notice until now. I don't think
>> the changes described above will do anything to prevent the
>> problems reported by YAMAMOTO Takashi.
>
> Agreed, it's a separate issue. Although if we change the way we
> handle the read-update-update problem, the other issue might go
> away too.
>
>> Unless I'm missing something, it sounds like tuple IDs are being
>> changed or reused while predicate locks are held on the tuples.
>> That's probably not going to be overwhelmingly hard to fix if we
>> can identify how that can happen. I tried to cover HOT issues, but
>> it seems likely I missed something.
>
> Storing the xmin of the original tuple would probably help with
> that too. But it would be nice to understand and be able to
> reproduce the issue first.

I haven't been able to produce a test case yet, but I'm getting clear
enough about the issue that I think I can see my way to a good fix.
Even if I have a fix first, I'll continue to try to create a test to
show the pre-fix failure (and post-fix success), to include in the
regression suite. This is the sort of thing which is hard enough to
hit that a regression could slip in because of some change and make
it out in a release unnoticed if we aren't specifically testing for
it.
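
First, to make sure I'm reading the duplication idea the way you
intend it, here is a rough sketch of the bookkeeping as I picture it.
This is purely illustrative; the types and function names below are
invented for the example and are not the real predicate.c structures
or API:

/* Toy model only -- these are not the real predicate.c structures. */
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

typedef struct ToyTupleId       /* stand-in for an ItemPointer */
{
    uint32_t    page;
    uint16_t    offset;
} ToyTupleId;

typedef struct ToyPredicateLock
{
    bool        valid;
    TransactionId reader;       /* the transaction which read the tuple */
    ToyTupleId  tuple;          /* which tuple version is locked */
    TransactionId tupleXmin;    /* xmin of that version, to detect reuse */
} ToyPredicateLock;

#define MAX_LOCKS 64
static ToyPredicateLock locks[MAX_LOCKS];

static void
toy_lock_tuple(TransactionId reader, ToyTupleId tid, TransactionId xmin)
{
    for (int i = 0; i < MAX_LOCKS; i++)
    {
        if (!locks[i].valid)
        {
            locks[i] = (ToyPredicateLock) {true, reader, tid, xmin};
            return;
        }
    }
}

/*
 * Transaction Y has updated the tuple at oldTid, creating a new version
 * at newTid whose xmin is newXmin.  Every reader holding a lock on the
 * old version also gets a lock, acquired on its behalf, on the new
 * version, so a later update of the new version still shows up as a
 * rw-conflict with that reader.
 */
static void
toy_duplicate_locks_on_update(ToyTupleId oldTid, ToyTupleId newTid,
                              TransactionId newXmin)
{
    for (int i = 0; i < MAX_LOCKS; i++)
    {
        if (locks[i].valid &&
            locks[i].tuple.page == oldTid.page &&
            locks[i].tuple.offset == oldTid.offset)
            toy_lock_tuple(locks[i].reader, newTid, newXmin);
    }
}

/*
 * A writer is about to modify the tuple at tid, whose xmin is tupleXmin.
 * A lock stamped with a different xmin belonged to a dead tuple whose ID
 * was reused (for example because the updater aborted), so it is simply
 * dropped; a surviving lock held by another transaction is a rw-conflict.
 */
static bool
toy_check_for_conflict(ToyTupleId tid, TransactionId tupleXmin,
                       TransactionId writer)
{
    bool        conflict = false;

    for (int i = 0; i < MAX_LOCKS; i++)
    {
        if (!locks[i].valid ||
            locks[i].tuple.page != tid.page ||
            locks[i].tuple.offset != tid.offset)
            continue;
        if (locks[i].tupleXmin != tupleXmin)
            locks[i].valid = false;     /* stale: tuple ID was reused */
        else if (locks[i].reader != writer)
            conflict = true;
    }
    return conflict;
}

The detail that matters is that each lock acquired on behalf of X
carries the xmin of the tuple version it covers, so a later writer
can tell a live lock from one left behind by a dead tuple whose ID
has been reused.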

It's pretty clear what the issue is: we clean up the predicate locks
for a transaction when the transaction which read the data becomes
irrelevant, but a heap tuple predicate locked by a transaction may be
updated or deleted, and thereby become eligible for cleanup, before
the transaction becomes irrelevant. This can lead to cycles in the
target links if a tuple ID is reused before the transaction is
cleaned up.

Since we may need to detect rw-conflicts with the reading transaction
based on later versions of a row, even after the tuple it read has
been updated, the old tuple removed, and its ID reused, Heikki's
suggestion of duplicating the predicate lock in these cases goes
beyond being a simple way to avoid the particular symptoms reported:
it is actually necessary for correct behavior.

One adjustment I'm looking at: I think the tuple xmin can go in the
lock target structure rather than in each lock, because once a tuple
ID has been reused there can be no transaction left which could still
update the old tuple that had that ID. Any predicate locks attached
to a target carrying the old xmin can therefore be released and the
target reused with the new xmin.

This does, however, raise the question of what happens when the tuple
lock has been promoted to a page lock and a tuple on that page is
non-HOT updated. (Clearly this isn't an issue if the locks on a heap
relation for a transaction have been promoted all the way to relation
granularity, but it's equally clear we don't want to jump to that
granularity when we can avoid it.) It seems to me that, in addition
to what Heikki suggested, whenever a tuple on a predicate-locked page
is updated we need to create a predicate lock on the new tuple or its
(new) page for every transaction holding a predicate lock on the old
page. We won't know whether the updated tuple is one that was read,
or by which of the transactions holding page locks, but we need to
act as though it was read by all of them to prevent false negatives.
Covering this just at the tuple level doesn't seem adequate. (A
concrete sketch of both pieces is at the end of this message.)

It looks like these changes will be very localized and unlikely to
break other things. We're talking about corner cases which haven't
surfaced in the hundreds of existing SSI regression tests, and for
which it is hard to create a reproducible test case. HOT updates
tended to expose the issues because of the more aggressive cleanup
done by HOT pruning; that's going to be important to exploit in
creating a reproducible test case.

I'm proceeding on this basis. Any comments welcome.
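
Here is that sketch. As before, the types and function names are
invented for illustration; this isn't the actual predicate.c code,
just the shape of the bookkeeping I have in mind, with the xmin moved
into the target and a fan-out from page locks on a non-HOT update:

/* Toy model only; invented structures, not the real predicate.c code. */
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint32_t BlockNumber;
typedef uint16_t OffsetNumber;

typedef enum { TARGET_TUPLE, TARGET_PAGE } ToyGranularity;

/* One lock target plus the readers holding predicate locks on it. */
typedef struct ToyTarget
{
    bool        valid;
    ToyGranularity granularity;
    BlockNumber page;
    OffsetNumber offset;        /* only meaningful for tuple targets */
    TransactionId xmin;         /* xmin of the locked tuple version */
    int         nholders;
    TransactionId holders[8];
} ToyTarget;

#define MAX_TARGETS 32
static ToyTarget targets[MAX_TARGETS];

static void
toy_add_holder(ToyTarget *t, TransactionId reader)
{
    for (int i = 0; i < t->nholders; i++)
        if (t->holders[i] == reader)
            return;             /* reader already covers this target */
    if (t->nholders < 8)
        t->holders[t->nholders++] = reader;
}

/*
 * Find or create the tuple target for (page, offset).  A target found
 * with a different xmin belonged to a dead tuple whose ID was reused:
 * its locks are released and the target recycled with the new xmin.
 */
static ToyTarget *
toy_get_tuple_target(BlockNumber page, OffsetNumber offset,
                     TransactionId xmin)
{
    ToyTarget  *freeslot = NULL;

    for (int i = 0; i < MAX_TARGETS; i++)
    {
        ToyTarget  *t = &targets[i];

        if (!t->valid)
        {
            if (freeslot == NULL)
                freeslot = t;
        }
        else if (t->granularity == TARGET_TUPLE &&
                 t->page == page && t->offset == offset)
        {
            if (t->xmin != xmin)
            {
                t->nholders = 0;    /* stale locks: release them */
                t->xmin = xmin;     /* reuse target for the new tuple */
            }
            return t;
        }
    }
    if (freeslot == NULL)
        return NULL;                /* out of slots in this toy model */
    *freeslot = (ToyTarget) {true, TARGET_TUPLE, page, offset, xmin, 0, {0}};
    return freeslot;
}

/*
 * Non-HOT UPDATE moving a row from (oldPage, oldOffset) to a new version
 * with xmin newXmin at (newPage, newOffset): every reader holding a tuple
 * lock on the old version *or* a page lock on the old page gets a lock on
 * the new version too, since we can't tell which page locker read the row.
 */
static void
toy_on_non_hot_update(BlockNumber oldPage, OffsetNumber oldOffset,
                      BlockNumber newPage, OffsetNumber newOffset,
                      TransactionId newXmin)
{
    ToyTarget  *newTarget = toy_get_tuple_target(newPage, newOffset, newXmin);

    if (newTarget == NULL)
        return;
    for (int i = 0; i < MAX_TARGETS; i++)
    {
        ToyTarget  *t = &targets[i];

        if (!t->valid || t->page != oldPage)
            continue;
        if (t->granularity == TARGET_PAGE ||
            (t->granularity == TARGET_TUPLE && t->offset == oldOffset))
        {
            for (int j = 0; j < t->nholders; j++)
                toy_add_holder(newTarget, t->holders[j]);
        }
    }
}

toy_get_tuple_target() is what would let us drop the next-prior
chains: a mismatched xmin is proof that the old tuple is gone, so the
locks attached to the old target can simply be released.
toy_on_non_hot_update() is the page-granularity fan-out described
above.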

-Kevin