On 01/11/2023 00:08, Matthias van de Meent wrote:
calling _bt_compare in _bt_moveright in many common cases. This patch,
when stacked on top of the prefix truncation patch, improves INSERT
performance by an additional 2-9%pt, with an extreme case of 45% in
the worscase index tests at [0].

The optimization is that we now recognze that our page split algorithm
all but guarantees that the HIKEY matches this page's downlink's right
separator key bytewise, excluding the data stored in the
IndexTupleData struct.

Good observation.

By caching the right separator index tuple in _bt_search, we can
compare the downlink's right separator and the HIKEY, and when they
are equal (memcmp() == 0) we don't have to call _bt_compare - the
HIKEY is known to be larger than the scan key, because our key is
smaller than the right separator, and thus transitively also smaller
than the HIKEY because it contains the same data. As _bt_compare can
call expensive user-provided functions, this can be a large
performance boon, especially when there are only a small number of
column getting compared on each page (e.g. index tuples of many 100s
of bytes, or dynamic prefix truncation is enabled).

What would be the worst case scenario for this? One situation where the memcmp() would not match is when there is a concurrent page split. I think it's OK to pessimize that case. Are there any other situations? When the memcmp() matches, I think this is almost certainly not slower than calling the datatype's comparison function.

                if (offnum < PageGetMaxOffsetNumber(page))
                {
                        ItemId  rightsepitem = PageGetItemId(page, offnum + 1);
                        IndexTuple pagerightsep = (IndexTuple) 
PageGetItem(page, rightsepitem);
                        memcpy(rsepbuf.data, pagerightsep, 
ItemIdGetLength(rightsepitem));
                        rightsep = &rsepbuf.tuple;
                }
                else if (!P_RIGHTMOST(opaque))
                {
                        /*
                         * The rightmost data tuple on inner page has P_HIKEY 
as its right
                         * separator.
                         */
                        ItemId  rightsepitem = PageGetItemId(page, P_HIKEY);
                        IndexTuple pagerightsep = (IndexTuple) 
PageGetItem(page, rightsepitem);
                        memcpy(rsepbuf.data, pagerightsep, 
ItemIdGetLength(rightsepitem));
                        rightsep = &rsepbuf.tuple;
                }

This could use a one-line comment above this, something like "Remember the right separator of the downlink we follow, to speed up the next _bt_moveright call".

Should there be an "else rightsep = NULL;" here? Is it possible that we follow the non-rightmost downlink on a higher level and rightmost downlink on next level? Concurrent page deletion?

Please update the comment above _bt_moveright to describe the new argument. Perhaps the text from README should go there, this feels like a detail specific to _bt_search and _bt_moveright.

--
Heikki Linnakangas
Neon (https://neon.tech)



Reply via email to