On 02/12/2020 00:18, Peter Geoghegan wrote:
On Tue, Dec 1, 2020 at 1:50 AM Heikki Linnakangas <hlinn...@iki.fi> wrote:
On 30/11/2020 21:50, Peter Geoghegan wrote:
+} TM_IndexDelete;
+} TM_IndexStatus;
Is it really significantly faster to have two arrays? If you had just
one array, each element would be only 12 bytes long. That's not much
larger than TM_IndexDelete, which is 8 bytes.
Yeah, but the swap operations really matter here. At one point I found
that to be the case, anyway. That might no longer be true, though. It
might be that the code became less performance critical because other
parts of the design improved, resulting in the code getting called
less frequently. But if that is true then it has a lot to do with the
power-of-two bucketing that you go on to ask about -- that helped
performance a lot in certain specific cases (as I go into below).
I will add a TODO item for myself, to look into this again. You may
well be right.
+ /* First sort caller's array by TID */
+ heap_tid_shellsort(delstate->deltids, delstate->ndeltids);
Instead of sorting the array by TID, wouldn't it be enough to sort by
just the block numbers?
I don't understand. Yeah, I guess that we could do our initial sort of
the deltids array (the heap_tid_shellsort() call) just using
BlockNumber (not TID). But OTOH there might be some small locality
benefit to doing a proper TID sort at the level of each heap page. And
even if there isn't any such benefit, does it really matter?
You said above that heap_tid_shellsort() is very performance critical,
and that's why you use the two arrays approach. If it's so performance
critical that swapping 8 bytes vs 12 byte array elements makes a
difference, I would guess that comparing TID vs just the block numbers
would also make a difference.
If you really want to shave cycles, you could also store BlockNumber and
OffsetNumber in the TM_IndexDelete array, instead of ItemPointerData.
What's the difference, you might ask? ItemPointerData stores the block
number as two 16-bit integers that need to be reassembled into a 32-bit
integer in the ItemPointerGetBlockNumber() macro.
My argument here is two-pronged: If this is performance critical, you
could do these additional optimizations. If it's not, then you don't
need the two-arrays trick; one array would be simpler.
I'm not sure how performance critical this really is, or how many
instructions each of these optimizations might shave off, so of course I
might be wrong and the tradeoff you have in the patch now really is the
best. However, my intuition would be to use a single array, because
that's simpler, and do all the other optimizations instead: store the
tid in the struct as separate BlockNumber and OffsetNumber fields, and
sort only by the block number. Those optimizations are very deep in the
low level functions and won't add much to the mental effort of
understanding the algorithm at a high level.
+ * While in general the presence of promising tuples (the hint that index
+ * AMs provide) is the best information that we have to go on, it is based
+ * on simple heuristics about duplicates in indexes that are understood to
+ * have specific flaws. We should not let the most promising heap pages
+ * win or lose on the basis of _relatively_ small differences in the total
+ * number of promising tuples. Small differences between the most
+ * promising few heap pages are effectively ignored by applying this
+ * power-of-two bucketing scheme.
+ *
What are the "specific flaws"?
I just meant the obvious: the index AM doesn't actually know for sure
that there are any old versions on the leaf page that it will
ultimately be able to delete. This uncertainty needs to be managed,
including inside heapam.c. Feel free to suggest a better way of
expressing that sentiment.
Hmm, maybe: "... is the best information that we have to go on, it is
just a guess based on simple heuristics about duplicates in indexes".
I understand that this is all based on rough heuristics, but is there
any advantage to rounding/bucketing the numbers like this? Even if small
differences in the total number of promising tuples are essentially noise
that can be safely ignored, is there any harm in letting those small
differences guide the choice? Wouldn't it be essentially the same as
picking at random, or are those small differences somehow biased?
Excellent question! It actually helps enormously, especially with low
cardinality data that makes good use of the deduplication optimization
(where it is especially important to keep the costs and the benefits
in balance). This has been validated by benchmarking.
This design naturally allows the heapam.c code to take advantage of
both temporal and spatial locality. For example, let's say that you
have several indexes all on the same table, which get lots of non-HOT
UPDATEs, which are skewed. Naturally, the heap TIDs will match across
each index -- these are index entries that are needed to represent
successor versions (which are logically unchanged/version duplicate
index tuples from the point of view of nbtree/nbtdedup.c). Introducing
a degree of determinism to the order in which heap blocks are
processed naturally takes advantage of the correlated nature of the
index bloat. It naturally makes it much more likely that the
also-correlated bottom-up index deletion passes (that occur across
indexes on the same table) each process the same heap blocks close
together in time -- with obvious performance benefits.
In the extreme (but not particularly uncommon) case of non-HOT UPDATEs
with many low cardinality indexes, each heapam.c call will end up
doing *almost the same thing* across indexes. So we're making the
correlated nature of the bloat (which is currently a big problem) work
in our favor -- turning it on its head, you could say. Highly
correlated bloat is not the exception -- it's actually the norm in the
cases we're targeting here. If it wasn't highly correlated then it
would already be okay to rely on VACUUM to get around to cleaning it
later.
This power-of-two bucketing design probably also helps when there is
only one index. I could go into more detail on that, plus other
variations, but perhaps the multiple index example suffices for now. I
believe that there are a few interesting kinds of correlations here --
I bet you can think of some yourself. (Of course it's also possible
and even likely that heap block correlation won't be important at all,
but my response is "what specifically is the harm in being open to the
possibility?".)
I see. Would be good to explain that pattern with multiple indexes in
the comment.
- Heikki