On Thu, Nov 7, 2024 at 2:19 PM Peter Geoghegan <p...@bowt.ie> wrote: > It would be weird to depend on the > currPos state from the last _bt_readpage call, even after seizing the > scan for the next page in line. If the scan has already finished > (according to moreLeft/moreRight), then why even try to seize the scan > once again? Why not just end the scan instead (including its call to > _bt_parallel_done)?
The problem with what I said here was that v1 didn't go far enough in this direction. v1 would still needlessly seize the scan whenever _bt_steppage's blkno was involved. This was still correct, I think -- but it wasn't clear. Attached revision v2 pushes the fix further in this direction. It explicitly documents that parallel _bt_readnextpage callers are allowed to use their so->currPos state (including blkno/nextPage/prevPage) to help _bt_readnextpage to decide whether it can end the scan right away. However, parallel index scan callers still won't be allowed to use this same so->currPos state to decide what page to go to next -- _bt_readnextpage really will need to seize the scan to get an authoritative blkno to make that part safe (actually, one parallel scan _bt_readnextpage caller still seizes the scan for itself, which is the one caller where _bt_readnextpage fully trusts caller's blkno + lastcurrblkno). I think that this approach is a win for readability. It makes the parallel case closer to the serial case, which is generally a good thing: it allows _bt_steppage to not care at all about whether or not the scan is a parallel scan. More importantly, it makes the exact extent to which parallel scans rely on their so->currPos state a great deal clearer. That was the underlying problem here all along: confusion about using the so->currPos state to end the scan vs. using the so->currPos state to decide which leaf page to visit next. These are two different things, and only the first one is generally safe. One problem that my v2 approach created (at least when I first started work on it) was that the new _bt_readnextpage loop code was ill-prepared for a blkno that's P_NONE when we seize the scan -- the part of the loop where that's checked for came first, before the scan is seized. I could have fixed that new problem by adding an extra defensive is-blkno-P_NONE code to _bt_readnextpage. But that's not what I settled on for v2. It makes more sense to just stop allowing the parallel scan's btps_nextScanPage to become P_NONE in the first place. The fact that we ever that never made much sense to me -- why wouldn't we just stop the scan as soon as we reach the point where the "next block" is P_NONE, which is only possible on the rightmost and leftmost page? So that's what we do. It's simpler all around. -- Peter Geoghegan
v2-0001-Fix-confusion-with-nbtree-parallel-scan-currPos-s.patch
Description: Binary data