On Wed, Nov 3, 2021 at 7:33 PM Peter Geoghegan <p...@bowt.ie> wrote: > But what about index-only scans, which GiST also supports? I think > that the rules are different there, even though index-only scans use > an MVCC snapshot.
(Reviving this old thread after 3 years) I was reminded of this old thread during today's discussion of a tangentially related TID-recycle-safety bug that affects bitmap index scans that use the visibility map as an optimization [1]. Turns out I was right to be concerned. This GiST bug is causally unrelated to that other bug, so I thought it would be helpful to move discussion of the GiST bug to this old thread. Attached is a refined version of a test case I posted earlier on [2], decisively proving that GiST index-only scans are in fact subtly broken. Right now it fails, showing a wrong answer to a query. The patch adds an isolationtest test case to btree_gist, based on a test case of Andres'. Offhand, I think that the only way to fix this is to bring GiST in line with nbtree: we ought to teach GiST VACUUM to start acquiring cleanup locks (previously known as super-exclusive locks), and then teach GiST index-only scans to hold onto a leaf page buffer pin as the visibility map (or the heap proper) is accessed for the TIDs to be returned from the leaf page. Arguably, GiST isn't obeying the current contract for amgettuple index AMs at all right now (whether or not it violates the contract as written is open to interpretation, I suppose, but either way the current behavior is wrong). We probably shouldn't hold onto a buffer pin during plain GiST index-only scans, though -- plain GiST index scans *aren't* broken, and so we should change as little as possible there. More concretely, we should probably copy more nbtree scan related code into GiST to deal with all this: we could copy nbtree's _bt_drop_lock_and_maybe_pin into GiST to fix this bug, while avoiding changing the performance characteristics of GiST plain index scans. This will also entail adding a new buffer field to GiST's GISTScanOpaqueData struct -- something similar to nbtree's BTScanOpaqueData.currPos.buf field (it'll go next to the current GISTScanOpaqueData.curBlkno field, just like the nbtree equivalent goes next to its own currPage blkno field). Long term, code like nbtree's _bt_drop_lock_and_maybe_pin should be generalized and removed from every individual index AM, nbtree included -- I think that the concepts generalize to every index AM that supports amgettuple (the race condition in question has essentially nothing to do with individual index AM requirements). I've discussed this kind of approach with Tomas Vondra (CC'd) recently. That's not going to be possible within the scope of a backpatchable fix, though. [1] https://postgr.es/m/873c33c5-ef9e-41f6-80b2-2f5e11869...@garret.ru [2] https://postgr.es/m/CAH2-Wzm6gBqc99iEKO6540ynwpjOqWESt5yjg-bHbt0hc8DPsw%40mail.gmail.com -- Peter Geoghegan
v2-0001-isolationtester-showing-broken-index-only-scans-w.patch
Description: Binary data