On Mon, Dec 30, 2013 at 7:22 AM, Heikki Linnakangas
<hlinnakan...@vmware.com> wrote:
> Ah, I didn't remember that thread. Yeah, apparently I proposed the exact
> same design back then. Simon complained about the dead tuples being left
> behind, but I don't think that's a big issue with the design we've been
> playing around now; you only end up with dead tuples when two backends try
> to insert the same value concurrently, which shouldn't happen very often.
Right, because you check first, which also has a cost, paid in CPU cycles,
memory bandwidth, and buffer lock contention. That's as opposed to a cost
almost entirely localized to inserters into a single leaf page per unique
index, for only an instant. You're checking *all* unique indexes: you call
check_exclusion_or_unique_constraint() once per unique index (or exclusion
constraint), and specify to wait on the xact, at least until a conflict is
found. So if you're waiting on an xact, your conclusion that earlier unique
indexes had no conflicts could soon become totally obsolete. For
non-idiomatic usage, such as the usage Andres in particular cares about for
multi-master (MM) conflict resolution, I worry about the implications of
that. I'm not asserting that it's a problem, but it does seem like something
that's quite hard to reason about. Maybe Andres can comment.

>> In short, I think that my approach may be better because it doesn't
>> conflate row locking with value locking (therefore making it easier
>> to reason about, maintaining modularity),
>
> You keep saying that, but I don't understand what you mean. With your
> approach, an already-inserted heap tuple acts like a value lock, just like
> in my approach. You have the page-level locks on b-tree pages in addition
> to that, but the heap-tuple based mechanism is there too.

Yes, but that historic behavior isn't really value locking at all. It's
very much like row locking, because there is a row, not merely the uncertain
intent to try to insert a row. Provided your transaction commits and the
client's transaction doesn't delete the row, the row is definitely there.
For upsert, conflicts may well be the norm, not the exception. Value locking
is the exclusive lock on the buffer held during _bt_check_unique(). I'm
trying to safely extend that mechanism to reach consensus among unique
indexes, which to me seems like the most logical and conservative approach.
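The staleness hazard described above can be sketched with a toy model. This is a hypothetical simulation, not PostgreSQL code: the names (backend1, index_a, the sleep durations) are all invented, and each "unique index" is just a Python set guarded by a lock. The point is only that once a backend blocks waiting on another xact, nothing protects the "no conflict" conclusion it already reached about earlier indexes:

```python
import threading
import time

# Toy model: each "unique index" is a set of keys guarded by one lock.
# Everything here is invented for illustration and bears no relation to
# PostgreSQL internals.

index_a = set()          # first unique index: no conflict for our key
index_b = {"dup"}        # second unique index: already has a conflict
lock = threading.Lock()
stale = []               # records whether backend 1's early check went stale

def backend1(key):
    # Step 1: check index_a; conclude there is no conflict there.
    with lock:
        a_clean = key not in index_a
    # Step 2: index_b has a conflict, so "wait on the other xact". While
    # we wait, nothing protects the conclusion we reached about index_a.
    time.sleep(0.2)
    with lock:
        stale.append(a_clean and key in index_a)

def backend2(key):
    time.sleep(0.1)      # arrive while backend 1 is waiting
    with lock:
        index_a.add(key) # insert the very key backend 1 checked for

t1 = threading.Thread(target=backend1, args=("dup",))
t2 = threading.Thread(target=backend2, args=("dup",))
t1.start(); t2.start()
t1.join(); t2.join()
print(stale[0])          # True: the "no conflict in index_a" result is obsolete
```

In this sketch, backend 1's answer for index_a is already wrong by the time its wait on index_b's conflicting xact ends, which is the sort of thing I mean by "hard to reason about".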
For idiomatic usage, it only makes sense for there to be a conflict on one
unique index, known ahead of time. If you don't know where the conflict
will be, then typically your DML statement is unpredictable, just like the
MySQL feature. Otherwise, for MM conflict resolution, I think it makes
sense to pick those conflicts off one at a time, dealing with exactly one
row per conflict. I mean, even with your approach, you're still not dealing
with later conflicts in later unique indexes, right? The fact that you
prevent conflicts only on previously non-conflicting unique indexes, and
not, I suppose, on later ones too, seems quite arbitrary.

> Well, killing an old promise tuple is not cheap, but it shouldn't happen
> often. In both approaches, what probably matters more is the overhead of
> the extra heavy-weight locking. But this is all speculation, until we see
> some benchmarks.

Fair enough. We'll know more when we have fixed the exclusion constraint
supporting patch, which will allow us to make a fair comparison. I'm
working on it. Although I'd suggest that having dead duplicates in indexes
where that's avoidable is a cost that isn't necessarily so easily
characterized. I especially don't like that you're currently doing the
UNIQUE_CHECK_PARTIAL deferred unique constraint thing of always inserting,
continuing on for *all* unique indexes regardless of finding a duplicate.
Whatever overhead my approach may imply around lock contention, clearly its
cost to index scans is zero.

The other thing is that if you're holding old "value locks" (i.e.
previously inserted btree tuples, from earlier unique indexes) while
resolving a value conflict, you're holding those value locks indefinitely,
pending the completion of the other guy's xact, just in case there ends up
being no conflict, which in general is unlikely.
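The dead-duplicate cost can be made concrete with a toy accounting model. Again, this is a hypothetical sketch with invented function names, nothing like the real executor; indexes are plain Python lists so that duplicate entries are visible. It just counts how many dead entries an abandoned insertion leaves behind under the two strategies, for a row whose key conflicts in its final unique index:

```python
# Toy accounting of dead index entries, comparing two insertion strategies
# for a row whose key conflicts in its final unique index. Hypothetical
# sketch only; the function names are invented, and real index entries are
# of course not Python lists.

def insert_all_then_check(indexes, key):
    """UNIQUE_CHECK_PARTIAL-style: optimistically insert into every unique
    index regardless of duplicates, then notice the conflict. If the
    insertion is abandoned, every new entry is left behind dead."""
    conflict = any(key in idx for idx in indexes)
    for idx in indexes:
        idx.append(key)                # insert everywhere, unconditionally
    return len(indexes) if conflict else 0   # dead entries left behind

def check_first(indexes, key):
    """Check every unique index up front; insert only if there is no
    conflict, so an abandoned insertion never leaves dead entries for
    index scans to step over."""
    if any(key in idx for idx in indexes):
        return 0                       # conflict found; nothing was inserted
    for idx in indexes:
        idx.append(key)
    return 0

# Three unique indexes; the conflict sits in the last one.
print(insert_all_then_check([[], [], ["k"]], "k"))  # 3 dead entries
print(check_first([[], [], ["k"]], "k"))            # 0 dead entries
```

The trade is visible even in the toy: the insert-everywhere strategy pays its cost in dead entries that every later index scan must visit, while the check-first strategy pays in up-front locking, which is the comparison the benchmarks will have to settle.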
So in extreme cases, that could be the difference between waiting all day
(on someone sitting on a lock that they very probably have no use for) and
not waiting at all.

> _bt_check_unique() is a modularity violation, agreed. Beauty is in the eye
> of the beholder, I guess, but I don't see either patch making that any
> better or worse.

Clearly the way in which you envisage releasing locks to prevent
unprincipled deadlocking implies that btree has to know more about the
heap, and maybe even that the heap has to know something about btree, or at
least about amcanunique AMs (including possible future amcanunique AMs that
may or may not be well suited to implementing this the same way).

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers