On 03/18/2015 06:23 PM, Robert Haas wrote:
> On Tue, Mar 17, 2015 at 6:27 PM, Peter Geoghegan <p...@heroku.com> wrote:
>> On Tue, Mar 17, 2015 at 12:11 PM, Heikki Linnakangas <hlinn...@iki.fi> wrote:
>>> I'm still not sure the way the speculative locking works is the best
>>> approach. Instead of clearing xmin on super-deletion, a new flag on the heap
>>> tuple seems more straightforward. And we could put the speculative insertion
>>> token in t_ctid, instead of stashing it in the PGPROC array. That would
>>> again seem more straightforward.

>> I see the appeal of that approach. What concerns me about that
>> approach is that it makes it the problem of the inserter to confirm
>> its insertion, even though overwhelmingly, speculative insertions
>> succeed (the race window is small due to the pre-check). The current
>> speculative token is legitimately a very short-lived thing, a thing
>> that I hesitate to write to disk at all. So our optimistic approach to
>> inserting speculatively loses its optimism, which I don't like, if you
>> know what I mean.

Modifying a shared buffer is not the same as writing to disk.

AFAICS, there is no need to go and clear the tag after the insert has completed.

Here's what I had in mind: the inserter tags the tuple with the speculative insertion token, by storing the token in the t_ctid field. If the inserter needs to super-delete the tuple, it sets xmax like in a regular deletion, but also sets another flag to indicate that it was a super-deletion.

When another backend inserts, and notices that it has a potential conflict with the first tuple, it tries to acquire a hw-lock on the token. In most cases, the inserter has long since completed the insertion, and the acquisition succeeds immediately; you have to check, though, because the token is not cleared on a completed insertion. It would be slightly faster for the second backend if the inserter actually removed the token after the insertion has completed: it wouldn't need to check the lock if the tuple was not tagged in the first place. But that would be just a tiny optimization, for the rare case that there is a conflict, and surely not worth it.
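As a toy illustration of that flow (not the patch's actual code; the function names and the single in-progress slot are simplifications invented here, where the real thing would use the heavyweight lock manager), the conflicting backend only blocks while the inserter still holds the token lock, even though the tuple stays tagged afterwards:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy single-backend model of speculative-token locking. */
static uint32_t token_counter = 0;  /* per-backend token generator */
static uint32_t locked_token = 0;   /* 0 = no token lock currently held */

/* Inserter: tag the tuple with a fresh token and take the token lock.
 * The returned value is what would be stashed in t_ctid. */
static uint32_t begin_speculative_insert(void)
{
    locked_token = ++token_counter;
    return locked_token;
}

/* Inserter: release the token lock. Note that the tuple's tag is NOT
 * cleared here; only the lock goes away. */
static void finish_speculative_insert(void)
{
    locked_token = 0;
}

/* Conflicting backend: would acquiring the hw-lock on this token block? */
static bool would_wait_on(uint32_t tuple_token)
{
    return tuple_token == locked_token;
}
```

While the insertion is in flight, a conflicting backend waits on the token; once the inserter finishes, the lock is free and the acquisition succeeds immediately, which is why leaving the tag in place costs only a single lock-manager probe in the conflict case.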

Hmm, I just realized a little fly in that ointment: if the inserter inserts enough tuples to wrap around the token counter (4 billion?) in a single transaction, another backend that inserts could confuse a new speculative insertion with one that completed a long time ago. The risk seems small enough that we could just accept it. I don't think anything particularly bad would happen on a false positive here. Or we could also use all 48 bits available in the t_ctid to push it truly in the realm of ain't-gonna-happen. Or we could use (relfilenode, block, token) as the lock tag for the SpeculativeInsertionLock.
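A sketch of the last idea, with invented field names: widening the lock tag from the bare token to (relfilenode, block, token) means a recycled token value can no longer be confused with a speculative insertion on a different page or relation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lock tag for SpeculativeInsertionLock; the struct and its
 * field names are invented for illustration. */
typedef struct
{
    uint32_t relfilenode;   /* which relation */
    uint32_t block;         /* which heap page */
    uint32_t token;         /* the speculative insertion token */
} SpecInsertLockTag;

static bool spec_tags_match(SpecInsertLockTag a, SpecInsertLockTag b)
{
    return a.relfilenode == b.relfilenode
        && a.block == b.block
        && a.token == b.token;
}
```

Even if the 32-bit token counter wraps and produces a value that was used long ago, the old tuple sits on a different block, so the tags differ and there is no false conflict.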

Regarding the physical layout: we can use a magic OffsetNumber value above MaxOffsetNumber to indicate that the t_ctid field stores a token rather than a regular ctid value, and another magic t_ctid value to indicate that a tuple has been super-deleted. The token and the super-deletion flag are quite ephemeral; they are not needed after the inserting transaction has completed, so it's nice not to consume the valuable infomask bits for these things. Those states are conveniently not possible on an updated tuple, when we would need the t_ctid field for its current purpose.
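To make that layout concrete, here is a self-contained sketch using simplified stand-ins for ItemPointerData and invented magic offset values (the real definitions and constants would live in PostgreSQL's headers, and the exact magic values are an assumption):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for PostgreSQL's on-disk types. */
typedef uint16_t OffsetNumber;
typedef uint32_t BlockNumber;

typedef struct
{
    BlockNumber ip_blkid;   /* normally a block number; reused for the token */
    OffsetNumber ip_posid;  /* normally a line-pointer offset */
} ItemPointerData;

/* Invented magic offsets, chosen above any valid MaxOffsetNumber so they
 * can never collide with a real line-pointer offset. */
#define SPEC_TOKEN_OFFSET    ((OffsetNumber) 0xfffe)
#define SUPER_DELETED_OFFSET ((OffsetNumber) 0xfffd)

/* Tag a tuple's t_ctid with the speculative insertion token. */
static void set_speculative_token(ItemPointerData *ctid, uint32_t token)
{
    ctid->ip_blkid = token;     /* token rides in the block-number field */
    ctid->ip_posid = SPEC_TOKEN_OFFSET;
}

static bool is_speculative(const ItemPointerData *ctid)
{
    return ctid->ip_posid == SPEC_TOKEN_OFFSET;
}

static uint32_t get_speculative_token(const ItemPointerData *ctid)
{
    assert(is_speculative(ctid));
    return ctid->ip_blkid;
}
```

Because the magic offset sits above any legal offset, a reader can distinguish "token" from "regular ctid" with a single comparison, and no infomask bit is spent on either state.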

>>> If I'm reading this correctly, if there are multiple indexes that match the
>>> unique index inference specification, we pick the cheapest one. Isn't that
>>> unstable? Two backends running the same INSERT ON CONFLICT statement might
>>> pick different indexes, and the decision might change over time as the table
>>> is analyzed. I think we should have a more robust rule. Could we easily just
>>> use all matching indexes?

>> Robert feels pretty strongly that there should be a costing aspect to
>> this. Initially I was skeptical of this, but just did what he said all
>> the same. But I've since come around to his view entirely, because we
>> now support partial unique indexes.

> I think Heikki's concern is something different, although I am not
> altogether up to speed on this and may be confused.  The issue is:
> suppose that process A and process B are both furiously upserting into
> the same table with the same key column but, because they have
> different costing parameters, process A consistently chooses index X
> and process B consistently chooses index Y.  In that situation, will
> we deadlock, livelock, error out, bloat, or get any other undesirable
> behavior, or is that A-OK?

Right, that's what I had in mind.

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers