On Sat, Aug 20, 2016 at 1:05 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> On Thu, Aug 18, 2016 at 8:24 AM, Claudio Freire <klaussfre...@gmail.com> wrote:
>>
>> A couple of points make me uneasy about this patch, yet I can think of
>> no better alternative, so I seek feedback:
>>
>> - introducing a different format for inner index tuples makes for an
>>   invasive patch and quite difficult-to-reason code (it's easy to forget
>>   whether a page is leaf or inner, and that now causes assertion
>>   failures or segfaults)
>> - the ascent-descent to check for uniqueness when there are large
>>   dead tuple runs could have locking subtleties that escape me. Perhaps
>>   there are better ways to solve this.
>
> I have tried to study this part of your patch and it seems a somewhat
> non-trivial and risky part of the proposal.
>
> + } else {
> +     /*
> +      * We found the actual first item on another block, so we have to
> +      * perform a two-step search - first half, until our write-locked
> +      * buffer, then another starting from our write-locked buffer.
> +      */
> +     LockBuffer(buf, BUFFER_LOCK_UNLOCK);
> +     LockBuffer(buf, BT_WRITE);
> +
> +     buf = _bt_moveright(rel, buf, natts, itup_scankey, &(itup->t_tid),
> +                         false, true, stack, BT_WRITE, NULL);
> +
> +     first_offset = _bt_binsrch(rel, buf, natts, itup_scankey, NULL,
> +                                false, NULL);
> +
> +     xwait = _bt_check_unique(rel, itup, heapRel, nbuf, buf, first_offset,
> +                              itup_scankey, checkUnique, &is_unique,
> +                              &speculativeToken);
> +
> +     _bt_relbuf(rel, nbuf);
> + }
>
> The idea of the uniqueness check is that we hold the write lock on the
> buffer/page on which we are trying to operate (we do take read locks on
> the consecutive pages during this check). Here, in your changed
> proposal, you have two buffers (one to which the search led you and one
> buffer previous in the chain), and before checking uniqueness, you seem
> to hold a write lock on one of the buffers and a read lock on the other.
> This seems to change the way the uniqueness check has worked till now;
> can you explain how this works? (Can we safely assume that all searches
> for the uniqueness check will lead to the same buffer first?)
All uniqueness checks will find the same "nbuf" (read-locked), which is
the first one that matches the key. They could have different "buf"
buffers (the write-locked ones), but since read locks cannot be held
while a write lock is held, I believe it doesn't matter.

All uniqueness checks will try to read-lock nbuf unless the uniqueness
check for that insertion is already done. When they do, they will block
until the other insertion is finished, and when that happens, either
the insert happened, or it returned an xid to wait for and retry. If it
succeeded, the check attempting the read-lock will see the inserted
tuple, return the xid of the earlier transaction, and wait on it. If it
failed, no insert happened because there's a visible tuple there, so
the read-lock will succeed, find the visible tuple, and that insert
will fail too.

In essence, I believe it is still the case that only one insert can
succeed at a time, and all the others will retry. The difference with
the patch is that the contention will be between a read lock and a
write lock, not two write locks, so there's a little less contention
(some more work can be done in parallel).

> With this new mechanism, do we have two types of search interfaces,
> where one would work for keys (as now) and the other for key-ctid, or
> will it be just a single interface which works both ways? I think
> either way there are pros and cons.

The patch doesn't add another interface, but the idea is to add one.
The implementation will probably fall on a common function that can do
both, but some index AMs can't implement the key-ctid operation (hash
indexes), so there will have to be separate interfaces to properly
represent the separate capabilities (besides, other implementations may
well use completely different paths for the two operations).
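The serialization argument above can be sketched with a toy model (plain
Python threads, not PostgreSQL code; the `page` set and `page_lock` are
stand-ins for the first-matching leaf page and its buffer lock): since
every inserter of a given key must take the lock on the same page, the
check-then-insert is serialized there, one insert wins, and the losers
see the winner's tuple and fail the uniqueness check.

```python
import threading

page_lock = threading.Lock()   # stands in for the buffer lock on "nbuf"
page = set()                   # visible tuples on the (simulated) leaf page
results = []

def insert(key):
    # Every inserter of a given key meets at the same first-matching page,
    # so the duplicate check and the insert happen atomically per key.
    with page_lock:
        if key in page:
            results.append("duplicate")   # loser: sees the winner's tuple
        else:
            page.add(key)
            results.append("inserted")    # winner: unique insert succeeds

threads = [threading.Thread(target=insert, args=("k1",)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))  # ['duplicate', 'inserted'] - exactly one insert wins
```

This deliberately ignores the read-lock/write-lock asymmetry and the
xid-wait-and-retry path; it only illustrates why meeting at the same
nbuf is what makes the outcome safe.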
> Also, in the thread below you are talking about using the last bit in
> t_info. I want to bring to your notice that there is a patch of mine [1]
> in which I have used it to avoid breaking the on-disk compatibility of
> hash indexes. I am not saying that we should not plan it for some other
> use, but just keep in mind that there are other proposals to use it. We
> can decide the best way to proceed if we are aware of all the proposals
> that want to use it.
>
> [1] - https://www.postgresql.org/message-id/caa4ek1lkq_udism-z2dq6cuvjh3db5fnfnnezzbpsrjw0ha...@mail.gmail.com

I wasn't aware of that particular thread, but there's another (the WARM
one) with yet another proposed use of that bit.

IMHO, any use of the bit that doesn't leave room for further extension
of the bit space is a bad idea, because it closes the door on (easy)
further flags being added. The fact that there are already several
proposals to use that bit for different purposes only reinforces that
idea.

The idea I'm working on allows further extension, and, although it uses
more space, I believe it's a better way. But we'll discuss that when I
have the implementation, I guess.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers