Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan

Peter Geoghegan Mon, 06 Nov 2023 15:03:44 -0800

On Mon, Nov 6, 2023 at 1:28 PM Matthias van de Meent
<boekewurm+postg...@gmail.com> wrote:
> I'm planning on reviewing this patch tomorrow, but in an initial scan
> through the patch I noticed there's little information about how the
> array keys state machine works in this new design. Do you have a more
> toplevel description of the full state machine used in the new design?


This is an excellent question. You're entirely right: there isn't
enough information about the design of the state machine.

In v1 of the patch, from all the way back in July, the "state machine"
advanced in the hackiest way possible: via repeated "incremental"
advancement (using logic from the function that we call
_bt_advance_array_keys() on HEAD) in a loop -- we just kept doing that
until the function I'm now calling _bt_tuple_before_array_skeys()
eventually reported that the array keys were now sufficiently
advanced. v2 greatly improved matters by totally overhauling
_bt_advance_array_keys(): it was taught to use binary searches to
advance the array keys, with limited remaining use of "incremental"
array key advancement.

However, version 2 (and all later versions to date) have somewhat
wonky state machine transitions, in one important respect: calls to
the new _bt_advance_array_keys() won't always advance the array keys
to the maximum extent possible (possible while still getting correct
behavior, that is). There were still various complicated scenarios
involving multiple "required" array keys (SK_BT_REQFWD + SK_BT_REQBKWD
scan keys that use BTEqualStrategyNumber), where one single call to
_bt_advance_array_keys() would advance the array keys to a point that
was still < caller's tuple. AFAICT this didn't cause wrong answers to
queries (that would require failing to find a set of exactly matching
array keys where a matching set exists), but it was kludgey. It was
sloppy in roughly the same way as the approach in my v1 prototype was
sloppy (just to a lesser degree).

I should be able to post v6 later this week. My current plan is to
commit the other nbtree patch first (the backwards scan "boundary
cases" one from the ongoing CF) -- since I saw your review earlier
today. I think that you should probably wait for this v6 before
starting your review. The upcoming version will have simple
preconditions and postconditions for the function that advances the
array key state machine (the new _bt_advance_array_keys). These are
enforced by assertions at the start and end of the function. So the
rules for the state machine become crystal clear and fairly easy to
keep in your head (e.g., tuple must be >= required array keys on entry
and <= required array keys on exit, the array keys must always either
advance by one increment or be completely exhausted for the top-level
scan in the current scan direction).

Unsurprisingly, I found that adding and enforcing these invariants led
to a simpler and more general design within _bt_advance_array_keys.
That code is still the most complicated part of the patch, but it's
much less of a bag of tricks. Another reason for you to hold off for a
few more days.

-- 
Peter Geoghegan

Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan

Reply via email to