Attached test case shows a wrong answer bug affecting Postgres 17 and the master branch/18 devel. My nbtree SAOP commit (commit 5bf748b8) is to blame. The conditions under which the bug can produce incorrect behavior are rather narrow (more on those details below).
Attached fix addresses the issue by consistently resetting the scan's so->scanBehind flag (which might still be set to true from the previous page's high key) at the start of _bt_advance_array_keys. In particular, this will now happen during sktrig_required=false calls to _bt_advance_array_keys (and not just during sktrig_required=true calls). The problem really does boil down to a failure to reset so->scanBehind consistently -- it never made any sense to not always do that at the top of _bt_advance_array_keys like this. Here's the full, complicated explanation for why it is that not resetting so->scanBehind consistently leads to wrong answers when this test case is run against an unpatched server: The index scan must visit two leaf pages (with or without the fix in place). The first leaf page has all index tuples returned as expected. It also has a truncated leaf page high key, with truncated attributes that correspond to a required scan key (namely the scan key for "inequal_one_ten_range <= 10"). We set so->scanBehind at this point, which is needed to make it safe to continue onto the second leaf page (the alternative is to start another primitive index scan, but that would be inefficient). Next we arrive on the second leaf page. We have so->scanBehind set from before (from the high key for the first leaf page). The non-required scan lower-order key (namely the "nonrequired_equal_one_ten_range in (1, 2)" scan key) isn't satisfied right away, and so _bt_check_compare must perform a sktrig_required=false call to _bt_advance_array_keys. Unpatched servers won't reset so->scanBehind (i.e. set it 'false') at the top of _bt_advance_array_keys at this point, which leads to trouble later on (later on during the same call to _bt_advance_array_keys). Since _bt_advance_array_keys successfully finds a matching array element for the non-required scankey, it'll need a recheck call to _bt_check_compare. But, since so->scanBehind wasn't reset at the start of our call, it'll be allowed to suppress the code path where _bt_check_compare makes _bt_advance_array_keys return 'true'. It'll return 'false' instead, a bit later on -- which is wrong (ultimately _bt_checkkeys returns false to _bt_readpage as a result, which is wrong). The "suppress returning true" behavior was only ever intended to be used when advancing the scan's arrays using a page high key -- but this isn't the page high key (it's the second page's first non-pivot tuple, not the first page's high key/finaltup). It doesn't matter if we return false instead of true for the page high key in this way, since it isn't really supposed to be returned to the scan anyway. As I said, we need to forget what we saw when we looked at the last page's high key (the second page is a whole other page). -- Peter Geoghegan
v1-0001-Fix-failure-to-reset-nbtree-scanBehind-flag.patch
Description: Binary data
scanbehind-bug.sql
Description: Binary data