Bug in nbtree SAOP scans with non-required arrays, truncated high key

Peter Geoghegan Wed, 18 Dec 2024 12:21:22 -0800

Attached test case shows a wrong answer bug affecting Postgres 17 and
the master branch/18 devel. My nbtree SAOP commit (commit 5bf748b8) is
to blame. The conditions under which the bug can produce incorrect
behavior are rather narrow (more on those details below).


Attached fix addresses the issue by consistently resetting the scan's
so->scanBehind flag (which might still be set to true from the
previous page's high key) at the start of _bt_advance_array_keys. In
particular, this will now happen during sktrig_required=false calls to
_bt_advance_array_keys (and not just during sktrig_required=true
calls). The problem really does boil down to a failure to reset
so->scanBehind consistently -- it never made any sense to not always
do that at the top of _bt_advance_array_keys like this.

Here's the full, complicated explanation for why it is that not
resetting so->scanBehind consistently leads to wrong answers when this
test case is run against an unpatched server:

The index scan must visit two leaf pages (with or without the fix in place).

The first leaf page has all index tuples returned as expected. It also
has a truncated leaf page high key, with truncated attributes that
correspond to a required scan key (namely the scan key for
"inequal_one_ten_range <= 10"). We set so->scanBehind at this point,
which is needed to make it safe to continue onto the second leaf page
(the alternative is to start another primitive index scan, but that
would be inefficient).

Next we arrive on the second leaf page. We have so->scanBehind set
from before (from the high key for the first leaf page). The
non-required scan lower-order key (namely the
"nonrequired_equal_one_ten_range in (1, 2)" scan key) isn't satisfied
right away, and so _bt_check_compare must perform a
sktrig_required=false call to _bt_advance_array_keys. Unpatched
servers won't reset so->scanBehind (i.e. set it 'false') at the top of
_bt_advance_array_keys at this point, which leads to trouble later on
(later on during the same call to _bt_advance_array_keys).

Since _bt_advance_array_keys successfully finds a matching array
element for the non-required scankey, it'll need a recheck call to
_bt_check_compare. But, since so->scanBehind wasn't reset at the start
of our call, it'll be allowed to suppress the code path where
_bt_check_compare makes _bt_advance_array_keys return 'true'. It'll
return 'false' instead, a bit later on -- which is wrong (ultimately
_bt_checkkeys returns false to _bt_readpage as a result, which is
wrong).

The "suppress returning true" behavior was only ever intended to be
used when advancing the scan's arrays using a page high key -- but
this isn't the page high key (it's the second page's first non-pivot
tuple, not the first page's high key/finaltup). It doesn't matter if
we return false instead of true for the page high key in this way,
since it isn't really supposed to be returned to the scan anyway. As I
said, we need to forget what we saw when we looked at the last page's
high key (the second page is a whole other page).

-- 
Peter Geoghegan

v1-0001-Fix-failure-to-reset-nbtree-scanBehind-flag.patch
Description: Binary data

scanbehind-bug.sql
Description: Binary data

Bug in nbtree SAOP scans with non-required arrays, truncated high key

Reply via email to