On 12/20/2013 06:06 AM, Peter Geoghegan wrote:
> On Wed, Dec 18, 2013 at 8:39 PM, Peter Geoghegan <p...@heroku.com> wrote:
>> Empirically, retrying because ExecInsertIndexTuples() returns some
>> recheckIndexes occurs infrequently, so maybe that makes all of this
>> okay. Or maybe it happens infrequently *because* we don't give up on
>> insertion when it looks like the current iteration is futile. Maybe
>> just inserting into every unique index, and then blocking on an xid
>> within ExecCheckIndexConstraints(), works out fairly well and performs
>> reasonably in all common cases. It's pretty damn subtle, though, and I
>> worry about worst-case performance and basic correctness issues, for
>> these reasons.
>
> I realized that it's possible to create the problem that I predicted
> some time ago with "promise tuples" [1], which are similar in some
> regards to what Heikki has here. At the time, Robert seemed to agree
> that this was a concern [2].
>
> I have a very simple testcase attached, much simpler than previous
> testcases, that reproduces deadlock for the patch
> exclusion_insert_on_dup.2013_12_12.patch.gz at scale=1 frequently, and
> occasionally at scale=10 (for tiny, single-statement transactions).
> With scale=100, I can't get it to deadlock on my laptop (60 clients in
> all cases), at least not in a reasonable time period. With the patch
> btreelock_insert_on_dup.2013_12_12.patch.gz, it will never deadlock,
> even at scale=1, simply because value locks are not held on to across
> row locking. This is why I characterized the locking as
> "opportunistic" on several occasions in months past.
>
> The testcase is actually much simpler than the one I describe in [1],
> and much simpler than all previous testcases, as there is only one
> unique index, though the problem is essentially the same. It comes
> down to old "value locks" held across retries - with "exclusion_...",
> we can't *stop* locking things from previous locking attempts (where a
> locking attempt is btree insertion with the UNIQUE_CHECK_PARTIAL
> flag), because dirty snapshots still see
> inserted-then-deleted-in-other-xact tuples. This deadlocking seems
> unprincipled and unjustified, which is a concern that I had all along,
> and a concern that Heikki seemed to share more recently [3]. This is
> why I felt strongly all along that value locks ought to be cheap to
> both acquire and _release_, and it's unfortunate that so much time was
> wasted on tangential issues, though I do accept some responsibility
> for that.

Hmm. If I understand the problem correctly, it's that as soon as another
backend sees the tuple you've inserted and calls XactLockTableWait(), it
will not stop waiting even if you later decide to kill the
already-inserted tuple.
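
That failure mode isn't really Postgres-specific, so to make it
concrete, here is a toy model in plain C with pthreads (emphatically
not PostgreSQL code): a flag stands in for each session's leftover
"value lock", a condition variable stands in for the xid wait, and all
names are invented for illustration.

/*
 * Toy model of the deadlock.  Each "session" publishes a value lock --
 * standing in for the inserted-then-killed tuple that dirty snapshots
 * still see -- and then, on conflict, waits for the other session's
 * lock to go away.  If both publish before either finishes checking,
 * neither lock ever goes away and both sleep forever.
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  wakeup = PTHREAD_COND_INITIALIZER;
static int value_lock[2];           /* leftover lock from a prior attempt */

static void *
session(void *arg)
{
    int         me = *(int *) arg;
    int         other = 1 - me;

    /* "Insert": publish our value lock. */
    pthread_mutex_lock(&mutex);
    value_lock[me] = 1;
    pthread_cond_broadcast(&wakeup);
    pthread_mutex_unlock(&mutex);

    /*
     * "Check": on conflict, wait for the other session's lock to clear,
     * while still advertising our own from the previous attempt.  This
     * is the problematic part.
     */
    pthread_mutex_lock(&mutex);
    while (value_lock[other])
        pthread_cond_wait(&wakeup, &mutex);

    /* "Commit": release our lock and wake anyone waiting on it. */
    value_lock[me] = 0;
    pthread_cond_broadcast(&wakeup);
    pthread_mutex_unlock(&mutex);
    return NULL;
}

int
main(void)
{
    pthread_t   threads[2];
    int         ids[] = {0, 1};

    for (int i = 0; i < 2; i++)
        pthread_create(&threads[i], NULL, session, &ids[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(&threads[i], NULL);    /* may never return */
    puts("no deadlock this time");
    return 0;
}

Whether it hangs depends on the interleaving - both sessions have to
publish their locks before either one finishes checking - which matches
the observation that the real testcase only deadlocks some of the time.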

One approach to fixing that would be to release and immediately
re-acquire the transaction lock when you kill an already-inserted
tuple, and to teach the callers of XactLockTableWait() to re-check
whether the tuple is still alive. I'm just waving hands here, but the
general idea is to somehow wake up others when you kill the tuple.
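
In very rough terms, and strictly as a sketch of the idea rather than
anything like a patch: XactLockTableInsert(), XactLockTableDelete() and
XactLockTableWait() are the existing lmgr entry points, but the two
promise-tuple helpers below are invented for illustration, and
XactLockTableWait() itself would need the same treatment as its
callers, since as written it re-sleeps for as long as the xid is still
in progress.

/* Inserter side: cycle our transaction lock when killing our own tuple. */
static void
kill_inserted_tuple(Relation rel, ItemPointer tid, TransactionId myxid)
{
    kill_promise_tuple(rel, tid);   /* invented: mark our tuple dead */
    XactLockTableDelete(myxid);     /* momentarily release our xact lock,
                                     * waking anyone blocked on it ... */
    XactLockTableInsert(myxid);     /* ... then immediately re-take it */
}

/* Waiter side: don't assume that waking up means the xact ended. */
static void
wait_for_tuple_or_xact(Relation rel, ItemPointer tid, TransactionId xwait)
{
    while (TransactionIdIsInProgress(xwait))
    {
        XactLockTableWait(xwait);   /* returns on commit/abort, or on the
                                     * release/re-acquire dance above */
        if (!promise_tuple_is_alive(rel, tid))  /* invented liveness check */
            break;                  /* the tuple was killed: stop waiting */
    }
}

The fiddly part is presumably how the release/re-acquire dance behaves
in the lock queue under contention, but it shows roughly where the
hooks would go.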

We could use that facility to also let others proceed when you delete a
tuple in the same transaction that inserted it. That's a corner case,
not worth much on its own, but I think it would fall out of the above
machinery for free, and it would be an easier way to test that
machinery than inducing deadlocks with ON DUPLICATE.

- Heikki