On 01/10/2014 05:36 AM, Peter Geoghegan wrote:
While I realize that everyone is busy, I'm concerned about the lack of
discussion here. It's been 6 full days since I posted my benchmark,
which I expected to quickly clear some things up, or at least garner
interest, and yet no one has commented here since.

Nah, that's nothing. I have a patch in the January commitfest that was already posted for the previous commitfest. It received zero review back then, and still has no reviewer signed up, let alone anyone actually reviewing it. And arguably it's a bug fix!

http://www.postgresql.org/message-id/5285071b.1040...@vmware.com

Wink wink, if you're looking for patches to review... ;-)

  The alternative exclusion patch still deadlocks in an unprincipled
fashion, when simple, idiomatic usage encounters contention. Heikki
intends to produce a revision that fixes the problem, though having
considered it carefully myself, I don't know what mechanism he has in
mind, and frankly I'm skeptical.

Here's an updated patch. Hope it works now... This is based on an older version, and doesn't include any fixes from your latest btreelock_insert_on_dup.v7.2014_01_07.patch. Please check the common parts, and copy over any relevant changes.

The fix for the deadlocking issue consists of a few parts. First, there's a new heavy-weight lock type, a speculative insertion lock, which works somewhat similarly to XactLockTableWait(), but is only held for the duration of a single speculative insertion. When a backend is about to begin a speculative insertion, it first acquires the speculative insertion lock. When it's done with the insertion, meaning it has either cancelled it by killing the already-inserted tuple or decided that it's going to go ahead with it, the lock is released.

The speculative insertion lock is keyed by Xid and token. The lock can be taken many times in the same transaction, and the token's purpose is to distinguish which insertion is currently in progress. The token is simply a backend-local counter, incremented each time the lock is taken.

In addition to the heavy-weight lock, there are new fields in PGPROC to indicate which tuple the backend is currently inserting. When the tuple is inserted, the backend fills in the relation's relfilenode and item pointer in MyProc->specInsert* fields, while still holding the buffer lock. The current speculative insertion token is also stored there.
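To make the inserter-side protocol concrete, here's a minimal self-contained sketch of the steps described above. All names here (SpecInsertSlot, the begin/advertise/finish functions) are illustrative stand-ins for the patch's PGPROC fields and lock manager calls, not actual PostgreSQL identifiers:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model of a backend's advertisement slot: which tuple it
 * is currently speculatively inserting (cf. MyProc->specInsert* in the
 * description above). */
typedef struct SpecInsertSlot
{
    uint32_t xid;           /* inserting transaction */
    uint32_t relfilenode;   /* relation inserted into; 0 = no spec insert */
    uint32_t blkno;         /* item pointer: block number ... */
    uint16_t offnum;        /* ... and offset */
    uint32_t token;         /* which insertion within this transaction */
} SpecInsertSlot;

/* Backend-local counter, incremented each time the speculative
 * insertion lock is taken, so the (xid, token) lock key is distinct for
 * each insertion even within a single transaction. */
static uint32_t spec_token_counter = 0;

static uint32_t
begin_speculative_insert(SpecInsertSlot *my_slot, uint32_t xid)
{
    uint32_t token = ++spec_token_counter;

    /* The real patch would acquire the heavy-weight lock tagged
     * (xid, token) here. */
    my_slot->xid = xid;
    my_slot->token = token;
    return token;
}

/* Called while still holding the buffer lock, once the tuple has been
 * physically inserted. */
static void
advertise_inserted_tuple(SpecInsertSlot *my_slot, uint32_t relfilenode,
                         uint32_t blkno, uint16_t offnum)
{
    my_slot->relfilenode = relfilenode;
    my_slot->blkno = blkno;
    my_slot->offnum = offnum;
}

/* Done, either confirmed or killed: clear the advertisement; the real
 * patch would release the (xid, token) lock here. */
static void
finish_speculative_insert(SpecInsertSlot *my_slot)
{
    my_slot->relfilenode = 0;
}
```

The point of the backend-local counter is visible here: two consecutive insertions by the same transaction get different tokens, so a waiter blocks only for the one insertion in progress, not the whole transaction.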

With that mechanism, when another backend sees a tuple whose xmin is still in progress, it can check whether the insertion is a speculative one. To do that, it scans the proc array and finds the backend with the given xid, then checks that the relfilenode and item pointer in that backend's PGPROC slot match the tuple, and makes note of the token the backend has advertised.

HeapTupleSatisfiesDirty() does the proc array check, and returns the token in the snapshot, alongside snapshot->xmin. The caller can then use that information in place of XactLockTableWait().
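The reader-side check can be sketched like this. Again, the struct and function names are hypothetical stand-ins for the proc array walk the patch performs inside HeapTupleSatisfiesDirty(), reduced to a plain array scan for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative model of one backend's PGPROC advertisement. */
typedef struct ProcSlot
{
    uint32_t xid;
    uint32_t relfilenode;   /* 0 = not speculatively inserting */
    uint32_t blkno;
    uint16_t offnum;
    uint32_t token;
} ProcSlot;

/* Given a tuple whose xmin is still in progress, scan the (modeled)
 * proc array for that xid and check whether the backend's advertised
 * relfilenode and item pointer match the tuple.  On a match, return the
 * advertised token, which the caller can use to wait on the
 * (xid, token) speculative insertion lock instead of doing a full
 * XactLockTableWait(). */
static bool
tuple_is_speculative(const ProcSlot *procs, int nprocs,
                     uint32_t xmin, uint32_t relfilenode,
                     uint32_t blkno, uint16_t offnum,
                     uint32_t *token_out)
{
    for (int i = 0; i < nprocs; i++)
    {
        if (procs[i].xid == xmin &&
            procs[i].relfilenode == relfilenode &&
            procs[i].blkno == blkno &&
            procs[i].offnum == offnum)
        {
            *token_out = procs[i].token;
            return true;
        }
    }
    return false;
}
```

If no matching advertisement is found, the insertion is not speculative (or has already finished), and the waiter falls back to the ordinary XactLockTableWait() path.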


There would be other ways to skin the cat, but this seemed like the quickest to implement. One more straightforward approach would be to use the tuple's TID directly in the speculative insertion lock's key, instead of Xid+token, but then the inserter would have to grab the heavy-weight lock while holding the buffer lock, which seems dangerous. Another alternative would be to store token in the heap tuple header, instead of PGPROC; a tuple that's still being speculatively inserted has no xmax, so it could be placed in that field. Or ctid.

More importantly, I have to question
whether we should continue to pursue that alternative approach, given
what we now know about its performance characteristics.

Yes.

It could be
improved, but not by terribly much, particularly for the case where
there is plenty of update contention, which was shown in [1] to be
approximately 2-3 times slower than extended page locking (*and* it's
already looking for would-be duplicates *first*). I'm trying to be as
fair as possible, and yet the difference is huge.

*shrug*. I'm not too concerned about performance during contention. But let's see how this fixed version performs. Could you repeat the tests you did with this?

Any guesses what the bottleneck is? At a quick glance at a profile of a pgbench run with this patch, I didn't see anything out of the ordinary, so I'm guessing it's lock contention somewhere.

- Heikki

Attachment: speculative-insertions-2014_01_10.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers