On Mon, Jan 22, 2024 at 4:13 PM Matthias van de Meent <boekewurm+postg...@gmail.com> wrote: > On Fri, 19 Jan 2024 at 23:42, Peter Geoghegan <p...@bowt.ie> wrote: > And this is where I disagree with your (percieved) implicit argument > that this should be and always stay this way.
I never said that, and didn't intend to imply it. As I outlined to you back in November, my general philosophy around APIs (such as the index AM API) is that ambiguity about the limits and extent of an abstract interface isn't necessarily a bad thing [1]. It can actually be a good thing! Ever hear of Hyrum's Law? Abstractions are very often quite leaky. APIs like the index AM API inevitably make trade-offs. Trade-offs are almost always political, in one way or another. Litigating every possible question up-front requires knowing ~everything before you really get started. This mostly ends up being a waste of time, since many theoretical contentious trade-offs just won't matter one bit in the long run, for one reason or another (not necessarily because there is only ever one consumer of an API, for all sorts of reasons). You don't have to agree with me. That's just what my experience indicates works best on average, and in the long run. I cannot justify it any further than that. > By keeping that expectation alive, > this becomes a self-fulfilling prophecy, and I really don't like such > restrictive self-fulfilling prophecies. It's nice that we have index > AM feature flags, but if we only effectively allow one implementation > for this set of flags by ignoring potential other users when changing > behavior, then what is the API good for (apart from abstraction > layering, which is nice)? I explicitly made it clear that I don't mean that. > I think that may be right, but I could very well have built a > btree-lite that doesn't have the historical baggage of having to deal > with pages from before v12 (version 4) and some improvements that > haven't made it to core yet. Let me know if you ever do that. Let me know what the problems you encounter are. I'm quite happy to revise my position on this in light of new information. I change my mind all the time! > Something like the following, maybe? > > """ > Compatibility: The planner will now generate paths with array scan > keys in any column for ordered indexes, rather than on only a prefix > of the index columns. The planner still expects fully ordered data > from those indexes. > Historically, the btree AM couldn't output correctly ordered data for > suffix array scan keys, which was "fixed" by prohibiting the planner > from generating array scan keys without an equality prefix scan key up > to that attribute. In this commit, the issue has been fixed, and the > planner restriction has thus been removed as the only in-core IndexAM > that reports amcanorder now supports array scan keys on any column > regardless of what prefix scan keys it has. > """ Even if I put something like this in the commit message, I doubt that Bruce would pick it up in anything like this form (I have my doubts about it happening no matter what wording is used, actually). I could include something less verbose, mentioning a theoretical risk to out-of-core amcanorder routines that coevolved with nbtree, inherited the same SAOP limitations, and then never got the same set of fixes. > patch-a is a trivial and clean implementation of mergesort, which > tends to reduce the total number of compare operations if both arrays > are of similar size and value range. It writes data directly back into > the main array on non-assertion builds, and with assertions it reuses > your binary search join on scratch space for algorithm validation. I'll think about this some more. > patch-b is an improved version of patch-a with reduced time complexity > in some cases: It applies exponential search to reduce work done where > there are large runs of unmatched values in either array, and does > that while keeping O(n+m) worst-case complexity, now getting a > best-case O(log(n)) complexity. This patch seems massively over-engineered, though. Over 200 new lines of code, all for a case that is only used when queries are written with obviously-contradictory quals? It's just not worth it. > > The recursive call to _bt_advance_array_keys is needed to deal with > > unsatisfied-and-required inequalities that were not detected by an > > initial call to _bt_check_compare, following a second retry call to > > _bt_check_compare. We know that this recursive call to > > _bt_advance_array_keys won't cannot recurse again because we've > > already identified which specific inequality scan key it is that's a > > still-unsatisfied inequality, following an initial round of array key > > advancement. > > So, it's a case of this? > > scankey: a > 1 AND a = ANY (1 (=current), 2, 3)) AND b < 4 AND b = ANY > (2 (=current), 3, 4) > tuple: a=2, b=4 I don't understand why your example starts with "scankey: a > 1" and uses redundant/contradictory scan keys (for both "a" and "b"). For a forward scan the > scan key won't be seen as required by _bt_advance_array_keys, which means that it cannot be relevant to the branch containing the recursive call to _bt_advance_array_keys. (The later branch that calls _bt_check_compare is another matter, but that doesn't call _bt_advance_array_keys recursively at all -- that's not what we're discussing.) I also don't get why you've added all of this tricky redundancy to the example you've proposed. That seems to make the example a lot more complicated, without any apparent advantage. This particular piece of code has nothing to do with redundant/contradictory scan keys. > We first match the 'a' inequality key, then find an array-key mismatch > breaking out, then move the array keys forward to a=2 (of ANY > (1,2,3)), b=4 (of ANY(2, 3, 4)); and now we have to recheck the later > b < 4 inequality key, as that required inequality key wasn't checked > because the earlier arraykey did not match in _bt_check_compare, > causing it to stop at the a=1 condition as opposed to check the b < 4 > condition? I think that that's pretty close, yes. Obviously, _bt_check_compare() is going to give up upon finding the most significant now-unsatisfied scan key (which must be a required scan key in cases relevant to the code in question) -- at that point we need to advance the array keys. If we go on to successfully advance all required arrays to values that all match the corresponding values from caller's tuple, we might still have to consider inequalities (that are required in the current scan direction). It might still turn out that the relevant second call to _bt_check_compare() (the first one inside _bt_advance_array_keys) still sets continuescan=false -- despite what we were able to do with the scan's array keys. At that point, the only possible explanation is that there is a required inequality that still isn't satisfied. We should use "beyond_end_advance" advancement to advance the array keys "incrementally" a second time (in a recursive call "against the same original tuple"). This can only happen when the value of each of two required scan keys (some equality scan key plus some later inequality scan key) both cease to be satisfied at the same time, *within the same tuple*. In practice this just doesn't happen very often. It could in theory be avoided altogether (without any behavioral change) by forcing _bt_check_compare to always assess every scan key, rather than giving up upon finding any unsatisfied scan key (though that would be very inefficient). It'll likely be much easier to see what I mean by considering a real example. See my test case for this, which I don't currently plan to commit: https://github.com/petergeoghegan/postgres/blob/saop-dynamic-skip-v10.0/src/test/regress/sql/dynamic_saop_advancement.sql#L3600 I think that only this test case is the only one that'll actually break when you just rip out the recursive _bt_advance_array_keys call. And it still won't give incorrect answers when it breaks -- it just accesses a single extra leaf page. As I went into a bit already, upthread, this recursive call business is a good example of making the implementation more complicated, in order to preserve interface simplicity and generality. I think that that's the right trade-off here, despite being kinda awkward. > If so, then this visual explanation helped me understand the point and > why it can't repeat more than once much better than all that text. > Maybe this can be integrated? An example seems like it'd be helpful as a code comment. Can you come up with a simplified one? > It was exactly why I started asking about the use of pointer > additions: _bt_merge_arrays is new code and I can't think of any other > case of this style being used in new code, except maybe when > surrounding code has this style. The reason I first highlighted > _bt_update_keys_with_arraykeys is because it had a clear case of 2 > different styles in the same function. Meh. [1] https://postgr.es/m/cah2-wzmwn2_es_4rwy90drzc-nw-opononr6pwmqy+oouvv...@mail.gmail.com -- Peter Geoghegan