On Mon, Nov 21, 2016 at 5:42 PM, Peter Geoghegan <p...@heroku.com> wrote: >>> Or, you might want to make >>> sure that B-Tree tuple insertions still find that "first page" to >>> check, despite still generally treating heap item pointer as part of >>> the key proper (in unique indexes, it can still behave like NULL, and >>> not participate in uniqueness checking, while still essentially being >>> part of the key/sort order). >> >> It behaves like that (even though it's not coded like that) > > Why not? What do you mean? > > What I'm interested in evaluating is an implementation where an index > on (foo) has a sort order/key space that is precisely the same as an > index on (foo, ctid), with zero exceptions.
Well, the patch does the same sorting, but without explicitly adding the ctid to the scankey. When inserting into a unique index, the lookup for the insertion point proceeds as it would if the key was (foo, ctid), finding a page in the middle of the range that contain item pointers for foo. When that happens on a regular index, the insertion is done at that point and nothing else needs to be done. But when it happens on a unique index, the code first checks to see if the first item pointer for foo is on the same page (allegedly a frequent case). If so, the uniqueness check is done in a very straightforward and efficient manner. If not, however, the code retraces its steps up the tree to the first time it followed a key other than foo (that's the only way it could land at a middle page), and then follows the downlinks looking at a truncated key (just foo, ignoring ctid). Kind of what you say, but without the explicit null. > >>> It also occurs to me that our performance for updates in the event of >>> many NULL values in indexes is probably really bad, and could be >>> helped by this. You'll want to test all of this with a workload that >>> isn't very sympathetic to HOT, of course -- most of these benefits are >>> seen when HOT doesn't help. >> >> I haven't really tested with an overabundance of NULLs, I'll add that >> to the tests. I have tested low-cardinality indexes though, but only >> for correctness. > > I think we'll have to model data distribution to evaluate the idea > well. After all, if there is a uniform distribution of values anyway, > you're going to end up with many dirty leaf pages. Whereas, in the > more realistic case where updates are localized, we can hope to better > amortize the cost of inserting index tuples, and recycling index > tuples. Quite possibly >> What do you mean with "first class part"? >> >> It's not included in the scankey for a variety of reasons, not the >> least of which is not breaking the interface for current uses, and >> because the tid type is quite limited in its capabilities ATM. Also, >> the heap TID is usually there on most AM calls that care about it (ie: >> insertions have it, of course), so adding it to the scankey also >> seemed redundant. > > I mean that insertions into indexes that are low cardinality should be > largely guided by TID comparisons. ... >> If not adding to the scankey, what do you mean by "first class" then? > > Just what I said about the sort order of an index on "(foo)" now > precisely matching what we'd get for "(foo, ctid)". It is the case in both versions of the patch > There are a couple > of tricky issues with that that you'd have to look out for, like > making sure that the high key continues to hold a real TID, which at a > glance looks like something that just happens already. I'm not sure > that we preserve the item pointer TID in all cases, since the current > assumption is that a high key's TID is just filler -- maybe we can > lose that at some point. I thought long about that, and inner pages don't need a real TID in their keys, as those keys only specify key space cutoff points and not real tids, and high tids in leaf pages are always real. I haven't had any issue with that, aside from the headaches resulting from thinking about that for protracted periods of time. > You should use amcheck to specifically verify that that happens > reliably in all cases. Presumably, its use of an insertion scankey > would automatically see the use of TID as a tie-breaker with patched > Postgres amcheck verification, and so amcheck will work for this > purpose unmodified. It's totally on my roadmap. I'll have to adapt it for the new index format though, it doesn't work without modification. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers