On 01/11/2023 00:08, Matthias van de Meent wrote:
calling _bt_compare in _bt_moveright in many common cases. This patch,
when stacked on top of the prefix truncation patch, improves INSERT
performance by an additional 2-9%pt, with an extreme case of 45% in
the worscase index tests at [0].
The optimization is that we now recognze that our page split algorithm
all but guarantees that the HIKEY matches this page's downlink's right
separator key bytewise, excluding the data stored in the
IndexTupleData struct.
Good observation.
By caching the right separator index tuple in _bt_search, we can
compare the downlink's right separator and the HIKEY, and when they
are equal (memcmp() == 0) we don't have to call _bt_compare - the
HIKEY is known to be larger than the scan key, because our key is
smaller than the right separator, and thus transitively also smaller
than the HIKEY because it contains the same data. As _bt_compare can
call expensive user-provided functions, this can be a large
performance boon, especially when there are only a small number of
column getting compared on each page (e.g. index tuples of many 100s
of bytes, or dynamic prefix truncation is enabled).
What would be the worst case scenario for this? One situation where the
memcmp() would not match is when there is a concurrent page split. I
think it's OK to pessimize that case. Are there any other situations?
When the memcmp() matches, I think this is almost certainly not slower
than calling the datatype's comparison function.
if (offnum < PageGetMaxOffsetNumber(page))
{
ItemId rightsepitem = PageGetItemId(page, offnum + 1);
IndexTuple pagerightsep = (IndexTuple)
PageGetItem(page, rightsepitem);
memcpy(rsepbuf.data, pagerightsep,
ItemIdGetLength(rightsepitem));
rightsep = &rsepbuf.tuple;
}
else if (!P_RIGHTMOST(opaque))
{
/*
* The rightmost data tuple on inner page has P_HIKEY
as its right
* separator.
*/
ItemId rightsepitem = PageGetItemId(page, P_HIKEY);
IndexTuple pagerightsep = (IndexTuple)
PageGetItem(page, rightsepitem);
memcpy(rsepbuf.data, pagerightsep,
ItemIdGetLength(rightsepitem));
rightsep = &rsepbuf.tuple;
}
This could use a one-line comment above this, something like "Remember
the right separator of the downlink we follow, to speed up the next
_bt_moveright call".
Should there be an "else rightsep = NULL;" here? Is it possible that we
follow the non-rightmost downlink on a higher level and rightmost
downlink on next level? Concurrent page deletion?
Please update the comment above _bt_moveright to describe the new
argument. Perhaps the text from README should go there, this feels like
a detail specific to _bt_search and _bt_moveright.
--
Heikki Linnakangas
Neon (https://neon.tech)