On Tue, Nov 28, 2023 at 3:52 PM Peter Geoghegan <p...@bowt.ie> wrote:
> While I still think that we need heuristics that apply speculative
> criteria to decide whether or not going to the next page directly
> (when we have a choice at all), that doesn't mean that the v7
> heuristics can't be improved on, with a little more thought. It's a
> bit tricky, since we're probably also benefiting from the same
> heuristics all the time -- probably even for this same test case.
Correction: this particular test case happens to be one where the
optimal strategy is to do *exactly* what the master branch does
currently. The master branch is unbeatable here, so the only
reasonable goal for the patch is to not lose (or to lose by no more
than a completely negligible amount).

I'm now prepared to say that this behavior is not okay -- I definitely
need to fix this. It's a bug. Because each distinct value never fits
on one leaf page (it's more like 1.5 - 2 pages, even though we're
deduplicating heavily), and because the Postgres 12 optimizations are
so effective with low-cardinality/posting-list-heavy indexes such as
this one, we're bound to lose quite often.

The only reason it doesn't happen _every single time_ we descend the
index is that the test script uses CREATE INDEX rather than retail
inserts (I tend to prefer the latter for this sort of analysis). Since
nbtsort.c isn't as clever/aggressive about suffix truncation as the
nbtsplitloc.c split strategies would have been, had we used them (had
there been retail inserts), many individual leaf pages are left with
high keys that aren't particularly good targets for the high key
precheck optimization (see Postgres 12 commit 29b64d1d).

If I wanted to produce a truly adversarial case for this issue (which
this is already close to), I'd go with the following:

1. Retail inserts that leave each leaf page full of one single value,
which will still allow each high key to make a "clean break" from the
right sibling page -- it'll have the right sibling's value. Maybe
insert 1200 - 1300 tuples per distinct index value for this.

In other words, bulk loading that results in an index that never has
to append a heap TID tiebreaker during suffix truncation, but comes
very close to needing to. Bulk loading where nbtsplitloc.c needs to
use SPLIT_MANY_DUPLICATES all the time, but never quite gets to the
point of needing a SPLIT_SINGLE_VALUE split.

2. A SAOP query with an array that has every second value in the index
as an element. Something like "WHERE arr IN (2, 4, 6, 8, ...)".

The patch will read every single leaf page, whereas master will
*reliably* read only every second leaf page. I didn't need to "trick"
the patch in a contrived way to get this bad outcome -- this scenario
is fairly realistic.

So this behavior is definitely not something that I'm prepared to
defend. As I said, it's a bug. It'll be fixed in the next revision.

--
Peter Geoghegan
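To make the claimed read counts concrete, here's a toy model (not
Postgres code; page count and layout are my own assumptions) of the
adversarial case above: N leaf pages, each packed with exactly one
distinct value, and a SAOP array containing every second value.

```python
def master_scan(array_keys):
    """Master re-descends the index once per array key, so it only
    ever reads the single leaf page holding that key's value."""
    return list(array_keys)             # pages visited, one per key

def patch_scan(array_keys):
    """The v7 patch descends once for the first key, then steps right
    through every consecutive sibling leaf page until it has passed
    the page holding the last array key."""
    return list(range(array_keys[0], array_keys[-1] + 1))

# 1000 leaf pages; page i holds only value i; query every second value.
keys = list(range(0, 1000, 2))          # "WHERE arr IN (0, 2, 4, ...)"
print(len(master_scan(keys)))           # 500 pages: every second page
print(len(patch_scan(keys)))            # 999 pages: everything in between too
```

Under these (assumed) layout parameters the model just restates the
arithmetic: master's per-key descents skip the odd-valued pages
entirely, while the sibling-walk reads the full span from first match
to last.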