On Sun, Oct 15, 2023 at 1:50 PM Peter Geoghegan <p...@bowt.ie> wrote:
> Attached is v4, which applies cleanly on top of HEAD. This was needed
> due to Alexander Korotkov's commit e0b1ee17, "Skip checking of scan
> keys required for directional scan in B-tree".
>
> Unfortunately I have more or less dealt with the conflicts on HEAD by
> disabling the optimization from that commit, for the time being.
Attached is v5, which deals with the conflict with the optimization
added by Alexander Korotkov's commit e0b1ee17 sensibly: the
optimization is now only disabled in cases with array scan keys. (It'd
be very hard to make it work with array scan keys, since an important
principle for my patch is that we can change search-type scan keys
right in the middle of any _bt_readpage() call.)

v5 also fixes a longstanding open item for the patch: we no longer
call _bt_preprocess_keys() with a buffer lock held, which was a bad
idea at best, and unsafe (due to the syscache lookups within
_bt_preprocess_keys) at worst. A new, minimal version of the function
(called _bt_preprocess_keys_leafbuf) is called at the same point
instead. That change, combined with the array binary search stuff
(which was added back in v2), makes the total amount of work performed
with a buffer lock held reasonable in all cases. It's even okay in
extreme or adversarial cases with many millions of array keys.

Making this _bt_preprocess_keys_leafbuf approach work has a downside:
it requires that _bt_preprocess_keys be a little less aggressive about
removing redundant scan keys, in order to meet certain assumptions
held by the new _bt_preprocess_keys_leafbuf function. Essentially,
_bt_preprocess_keys must now worry about current and future array key
values when determining redundancy among scan keys -- not just the
current array key values.

_bt_preprocess_keys knows nothing about SK_SEARCHARRAY scan keys on
HEAD, because on HEAD there is a strict 1:1 correspondence between the
number of primitive index scans and the number of array keys
(actually, the number of distinct combinations of array keys).
Obviously that's no longer the case with the patch (that's the whole
point of the patch).

It's easiest to understand how elimination of redundant quals needs to
work in v5 by way of an example.
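As an aside, the "array binary search stuff" can be modeled outside of
nbtree. This is only an illustrative Python sketch of the general idea
(advancing a sorted set of array keys to match a value read from an
index tuple during a forward scan); the function name and interface
are hypothetical, not the patch's actual code:

```python
from bisect import bisect_left

def advance_array_keys(array_keys, tuple_value):
    """Toy model: binary-search a sorted SAOP array for the first
    element >= the value just read from an index tuple.  Returns the
    new array index, or None when the array keys are exhausted."""
    idx = bisect_left(array_keys, tuple_value)
    return idx if idx < len(array_keys) else None

# With array keys (1, 3, 7), reading tuple value 4 advances past 3,
# to the element 7 (index 2)
assert advance_array_keys([1, 3, 7], 4) == 2
# Reading a value beyond the last array key exhausts the arrays
assert advance_array_keys([1, 3, 7], 9) is None
```

The point of the binary search is that advancing the arrays costs
O(log n) comparisons per advancement, which is what keeps the work
done with a buffer lock held bounded even with huge arrays.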
Consider the following query:

select count(*), two, four, twenty, hundred
from tenk1
where two in (0, 1) and four in (1, 2, 3) and two < 1;

Notice that "two" appears in the WHERE clause twice: first as an SAOP,
and then as an inequality. Right now, on HEAD, the primitive index
scan where the SAOP's scan key is "two = 0" renders "two < 1"
redundant. However, the subsequent primitive index scan where
"two = 1" does *not* render "two < 1" redundant.

This has implications for the mechanism in the patch, since the patch
will perform one big primitive index scan for all array constants,
with only a single _bt_preprocess_keys call at the start of its one
and only _bt_first call (but with multiple _bt_preprocess_keys_leafbuf
calls once we reach the leaf level).

The compromise that I've settled on in v5 is to teach
_bt_preprocess_keys to *never* treat "two < 1" as redundant with such
a query -- even though there is some squishy sense in which "two < 1"
is indeed still redundant (for the first SAOP key of value 0).

My approach is reasonably well targeted in that it mostly doesn't
affect queries that don't need it. But it will add cycles to some
badly written queries that wouldn't have had them in earlier Postgres
versions. I'm not entirely sure how much this matters, but my current
sense is that it doesn't matter all that much. This is the kind of
thing that is hard to test and poorly tested, so simplicity is even
more of a virtue than usual.

Note that the changes to _bt_preprocess_keys in v5 *don't* affect how
we determine if the scan has contradictory quals, which is generally
more important. With contradictory quals, _bt_first can avoid reading
any data from the index. OTOH, eliminating redundant quals (i.e. the
thing that v5 *does* change) merely makes evaluating index quals less
expensive, by preprocessing away unneeded scan keys.
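To make the compromise concrete, here is a toy Python model of the
rule (the names are hypothetical -- this is not the actual
_bt_preprocess_keys logic): an inequality can only be eliminated as
redundant when it is redundant against *every* array key value, not
just the array key value that happens to be current:

```python
def inequality_redundant_for_all(array_values, satisfies):
    """Toy model of v5's conservative rule: an inequality such as
    "two < 1" may be preprocessed away alongside an array key
    "two = v" only if every possible array value v already
    satisfies the inequality."""
    return all(satisfies(v) for v in array_values)

# "two in (0, 1) and two < 1": redundant for v=0 but not for v=1,
# so v5 keeps the "two < 1" scan key for the whole scan
assert inequality_redundant_for_all([0, 1], lambda v: v < 1) is False

# "two in (0,) and two < 1": redundant for every array value,
# so elimination remains safe
assert inequality_redundant_for_all([0], lambda v: v < 1) is True
```

This matches the behavior described above: one _bt_preprocess_keys
call must produce a set of scan keys that stays correct across all
array key values, since the patch no longer re-runs full
preprocessing per primitive index scan.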
In other words, while it's possible that the approach taken by v5 will
add CPU cycles in a small number of cases, it should never result in
more page accesses.

--
Peter Geoghegan
v5-0001-Enhance-nbtree-ScalarArrayOp-execution.patch
Description: Binary data