On Tue, Nov 28, 2023 at 3:52 PM Peter Geoghegan <p...@bowt.ie> wrote:
> While I still think that we need heuristics that apply speculative
> criteria to decide whether or not going to the next page directly
> (when we have a choice at all), that doesn't mean that the v7
> heuristics can't be improved on, with a little more thought. It's a
> bit tricky, since we're probably also benefiting from the same
> heuristics all the time -- probably even for this same test case.
Correction: this particular test case happens to be one where the
optimal strategy is to do *exactly* what the master branch does
currently. The master branch is unbeatable here, so the only
reasonable goal for the patch is to not lose (or to lose by no more
than a completely negligible amount).

I'm now prepared to say that this behavior is not okay -- I definitely
need to fix this. It's a bug. Because each distinct value never fits
on one leaf page (it's more like 1.5 - 2 pages, even though we're
deduplicating heavily), and because the Postgres 12 optimizations are
so effective with low-cardinality/posting-list-heavy indexes such as
this one, we're bound to lose quite often.

The only reason it doesn't happen _every single time_ we descend the
index is that the test script uses CREATE INDEX rather than retail
inserts (I tend to prefer the latter for this sort of analysis). Since
nbtsort.c isn't as clever/aggressive about suffix truncation as the
nbtsplitloc.c split strategies would have been, had we used them (had
there been retail inserts), many individual leaf pages are left with
high keys that aren't particularly good targets for the high key
precheck optimization (see Postgres 12 commit 29b64d1d).

If I wanted to produce a truly adversarial case for this issue (which
this is already close to), I'd go with the following:

1. Retail inserts that leave each leaf page full of one single value,
which will still allow each high key to make a "clean break" from the
right sibling page -- it'll have the right sibling's value. Maybe
insert 1200 - 1300 tuples per distinct index value for this.

In other words, bulk loading that results in an index that never has
to append a heap TID tiebreaker during suffix truncation, but comes
very close to needing to. Bulk loading where nbtsplitloc.c needs to
use SPLIT_MANY_DUPLICATES all the time, but never quite gets to the
point of needing a SPLIT_SINGLE_VALUE split.

2. A SAOP query with an array that has every second value in the index
as an element. Something like "WHERE arr IN (2, 4, 6, 8, ...)".

The patch will read every single leaf page, whereas master will
*reliably* read only every second leaf page. I didn't need to "trick"
the patch in a contrived way to get this bad outcome -- this scenario
is fairly realistic.

So this behavior is definitely not something that I'm prepared to
defend. As I said, it's a bug. It'll be fixed in the next revision.

--
Peter Geoghegan
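To make the claimed read counts concrete, here's a toy model (not
Postgres code; page count and layout are my own assumptions) of the
adversarial case above: N leaf pages, each packed with exactly one
distinct value, and a SAOP array containing every second value.

```python
def master_scan(array_keys):
    """Master re-descends the index once per array key, so it only
    ever reads the single leaf page holding that key's value."""
    return list(array_keys)             # pages visited, one per key

def patch_scan(array_keys):
    """The v7 patch descends once for the first key, then steps right
    through every consecutive sibling leaf page until it has passed
    the page holding the last array key."""
    return list(range(array_keys[0], array_keys[-1] + 1))

# 1000 leaf pages; page i holds only value i; query every second value.
keys = list(range(0, 1000, 2))          # "WHERE arr IN (0, 2, 4, ...)"
print(len(master_scan(keys)))           # 500 pages: every second page
print(len(patch_scan(keys)))            # 999 pages: everything in between too
```

Under these (assumed) layout parameters the model just restates the
arithmetic: master's per-key descents skip the odd-valued pages
entirely, while the sibling-walk reads the full span from first match
to last.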