On Wed, 5 Mar 2025 at 19:19, Matthias van de Meent <boekewurm+postg...@gmail.com> wrote:
>
> On Wed, 5 Mar 2025 at 10:04, Heikki Linnakangas <hlinn...@iki.fi> wrote:
> >
> > On 28/02/2025 03:53, Peter Geoghegan wrote:
> > > On Sat, Feb 8, 2025 at 8:47 AM Michail Nikolaev
> > > <michail.nikol...@gmail.com> wrote:
> > >> Just some commit messages + few cleanups.
> > >
> > > I'm worried about this:
> > >
> > > +These longer pin lifetimes can cause buffer exhaustion with messages like "no
> > > +unpinned buffers available" when the index has many pages that have similar
> > > +ordering; but future work can figure out how to best work that out.
> > >
> > > I think that we should have some kind of upper bound on the number of
> > > pins that can be acquired at any one time, in order to completely
> > > avoid these problems. Solving that problem will probably require GiST
> > > expertise that I don't have right now.
> >
> > +1. With no limit, it seems pretty easy to hold thousands of buffer pins
> > with this.
> >
> > The index can set IndexScanDesc->xs_recheck to indicate that the quals
> > must be rechecked. Perhaps we should have a similar flag to indicate
> > that the visibility must be rechecked.
Added as xs_visrecheck in 0001.

> > Here's a completely different line of attack: Instead of holding buffer
> > pins for longer, what if we checked the visibility map earlier? We could
> > check the visibility map already when we construct the
> > GISTSearchHeapItem, and set a flag in IndexScanDesc to tell
> > IndexOnlyNext() that we have already done that. IndexOnlyNext() would
> > have three cases:
>
> I don't like integrating a heap-specific thing like VM_ALL_VISIBLE()
> to indexes, but given that IOS code already uses that exact code my
> dislike is not to the point of a -1. I'd like it better if we had a
> TableAM API for higher-level visibility checks (e.g.
> table_tids_could_be_invisible?()) which gives us those responses
> instead; dropping the requirement to maintain VM in pg's preferred
> format to support efficient IOS.

Here's a patchset that uses that approach. Naming of functions, types,
fields and arguments is TBD.

The patch works and passes the new VACUUM-conflict tests, though I suspect
the SP-GIST tests have bugs: an intermediate version of my 0003 patch did
not make them fail, even though it did not hold a pin on (all) sorted
items' data while that data was being checked for visibility and/or
returned from the scan.

Patch 0001 contains the important changes, while 0002/0003 use this new API
to make GIST's and SP-GIST's IOS work correctly when a concurrent VACUUM is
or was running. 0004 is the existing patch with tests (v8-0001).

> With VM-checking in the index, we would potentially have another
> benefit: By checking all tids on the page at once, we can deduplicate
> and reduce the VM lookups. The gains might not be all that impressive,
> but could be significant in certain hot cases.

That is also included here, but the performance impact hasn't been tested
or validated.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)
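Before the patches themselves, a minimal sketch of the calling convention the
proposed API expects from an index AM: collect the heap TIDs buffered from one
leaf page, run a single batched check, and map the results back through
idxoffnum because the table AM is free to reorder the check array. Type and
function names follow the 0001 patch below; the helper itself
(example_vischeck_buffered_tids) and its arguments are illustrative only, the
real callers are in the 0002/0003 patches.

/*
 * Illustrative only: batch-check visibility of the TIDs buffered from one
 * leaf page through the proposed table_index_vischeck_tuples() API.
 */
static void
example_vischeck_buffered_tids(Relation heapRel, Buffer *vmbuf,
                               ItemPointerData *tids, uint8 *results,
                               int ntids)
{
    TM_IndexVisibilityCheckOp op;

    op.nchecktids = ntids;
    op.checktids = palloc_array(TM_VisCheck, ntids);
    op.vmbuf = vmbuf;           /* VM buffer pin is reused across calls */

    for (int i = 0; i < ntids; i++)
    {
        op.checktids[i].tid = tids[i];
        op.checktids[i].idxoffnum = i;  /* remember the caller's slot */
        op.checktids[i].vischeckresult = TMVC_Unchecked;
    }

    /* one call resolves every TID in the batch */
    table_index_vischeck_tuples(heapRel, &op);

    /* the table AM may have sorted checktids; map results back via idxoffnum */
    for (int i = 0; i < ntids; i++)
        results[op.checktids[i].idxoffnum] = (uint8) op.checktids[i].vischeckresult;

    pfree(op.checktids);
}

Since heapam's implementation sorts the batch by TID and only re-probes the
visibility map when the block number changes, a page's worth of TIDs should
normally cost only a handful of VM lookups.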
From e1d0580fa7680dcdb7e03665daa0a02d657e9257 Mon Sep 17 00:00:00 2001 From: Matthias van de Meent <boekewurm+postgres@gmail.com> Date: Fri, 7 Mar 2025 17:39:23 +0100 Subject: [PATCH v9 1/4] IOS/TableAM: Support AM-supplied fast visibility checking Previously, we assumed VM_ALL_VISIBLE is universal across all AMs. This is probably not the case, so we introduce a new table method called "table_index_vischeck_tuples" which allows anyone to ask the AM whether a tuple is definitely visible to everyone or might be invisible to someone. The API is intended to replace direct calls to VM_ALL_VISIBLE and as such doesn't include "definitely dead to everyone", which would be too expensive for the Heap AM (and would require additional work in indexes to manage). A future commit will use this inside GIST and SP-GIST to fix a race condition between IOS and VACUUM, which causes a bug with tuple visibility. --- src/include/access/heapam.h | 2 + src/include/access/relscan.h | 5 ++ src/include/access/tableam.h | 57 ++++++++++++++++++ src/backend/access/heap/heapam.c | 63 +++++++++++++++++++ src/backend/access/heap/heapam_handler.c | 1 + src/backend/access/index/indexam.c | 6 ++ src/backend/access/table/tableamapi.c | 1 + src/backend/executor/nodeIndexonlyscan.c | 77 +++++++++++++++--------- src/backend/utils/adt/selfuncs.c | 76 ++++++++++++++--------- 9 files changed, 231 insertions(+), 57 deletions(-) diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h index 1640d9c32f7..a820f150509 100644 --- a/src/include/access/heapam.h +++ b/src/include/access/heapam.h @@ -378,6 +378,8 @@ extern void simple_heap_update(Relation relation, ItemPointer otid, extern TransactionId heap_index_delete_tuples(Relation rel, TM_IndexDeleteOp *delstate); +extern void heap_index_vischeck_tuples(Relation rel, + TM_IndexVisibilityCheckOp *checkop); /* in heap/pruneheap.c */ struct GlobalVisState; diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h index dc6e0184284..759c9dd164e 100644 --- a/src/include/access/relscan.h +++ b/src/include/access/relscan.h @@ -26,6 +26,9 @@ struct ParallelTableScanDescData; +enum TMVC_Result; + + /* * Generic descriptor for table scans. This is the base-class for table scans, * which needs to be embedded in the scans of individual AMs. @@ -168,6 +171,8 @@ typedef struct IndexScanDescData bool xs_recheck; /* T means scan keys must be rechecked */ + int xs_visrecheck; /* TM_VisCheckResult from tableam.h */ + /* * When fetching with an ordering operator, the values of the ORDER BY * expressions of the last returned tuple, according to the index. 
If diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h index 131c050c15f..8570b9589a6 100644 --- a/src/include/access/tableam.h +++ b/src/include/access/tableam.h @@ -255,6 +255,33 @@ typedef struct TM_IndexDeleteOp TM_IndexStatus *status; } TM_IndexDeleteOp; +/* + * Result of a visibility check on a TID, as used by index-only scans + */ +typedef enum TMVC_Result +{ + TMVC_Unchecked, + TMVC_MaybeInvisible, + TMVC_AllVisible, +} TMVC_Result; + +typedef struct TM_VisCheck +{ + ItemPointerData tid; + OffsetNumber idxoffnum; + TMVC_Result vischeckresult; +} TM_VisCheck; + +/* + * Batched visibility-check request; see table_index_vischeck_tuples() + */ +typedef struct TM_IndexVisibilityCheckOp +{ + int nchecktids; + Buffer *vmbuf; + TM_VisCheck *checktids; +} TM_IndexVisibilityCheckOp; + /* "options" flag bits for table_tuple_insert */ /* TABLE_INSERT_SKIP_WAL was 0x0001; RelationNeedsWAL() now governs */ #define TABLE_INSERT_SKIP_FSM 0x0002 @@ -501,6 +528,10 @@ typedef struct TableAmRoutine TransactionId (*index_delete_tuples) (Relation rel, TM_IndexDeleteOp *delstate); + /* see table_index_vischeck_tuples() */ + void (*index_vischeck_tuples) (Relation rel, + TM_IndexVisibilityCheckOp *checkop); + /* ------------------------------------------------------------------------ * Manipulations of physical tuples. @@ -1364,6 +1395,32 @@ table_index_delete_tuples(Relation rel, TM_IndexDeleteOp *delstate) return rel->rd_tableam->index_delete_tuples(rel, delstate); } +static inline void +table_index_vischeck_tuples(Relation rel, TM_IndexVisibilityCheckOp *checkop) +{ + return rel->rd_tableam->index_vischeck_tuples(rel, checkop); +} + +static inline TMVC_Result +table_index_vischeck_tuple(Relation rel, Buffer *vmbuffer, ItemPointer tid) +{ + TM_IndexVisibilityCheckOp checkOp; + TM_VisCheck op; + + op.idxoffnum = 0; + op.tid = *tid; + op.vischeckresult = TMVC_Unchecked; + checkOp.checktids = &op; + checkOp.nchecktids = 1; + checkOp.vmbuf = vmbuffer; + + rel->rd_tableam->index_vischeck_tuples(rel, &checkOp); + + Assert(op.vischeckresult != TMVC_Unchecked); + + return op.vischeckresult; +} + /* ---------------------------------------------------------------------------- * Functions for manipulations of physical tuples. diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index fa7935a0ed3..cd6264544bd 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -101,6 +101,7 @@ static bool ConditionalMultiXactIdWait(MultiXactId multi, MultiXactStatus status uint16 infomask, Relation rel, int *remaining); static void index_delete_sort(TM_IndexDeleteOp *delstate); static int bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate); +static int heap_cmp_index_vischeck(const void *a, const void *b); static XLogRecPtr log_heap_new_cid(Relation relation, HeapTuple tup); static HeapTuple ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_required, bool *copy); @@ -8692,6 +8693,68 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate) return nblocksfavorable; } +/* + * heapam implementation of tableam's index_vischeck_tuples interface. + * + * This helper function is called by index AMs during index-only scans, + * to do VM-based visibility checks on individual tuples, so that the AM + * can hold the tuple in memory for e.g. reordering for extended periods of + * time without holding thousands of pins that would conflict with VACUUM. 
+ * + * It's possible for this to generate a fair amount of I/O, since we may be + * checking hundreds of tuples from a single index block, but that is + * preferred over holding thousands of pins. + */ +void +heap_index_vischeck_tuples(Relation rel, TM_IndexVisibilityCheckOp *checkop) +{ + BlockNumber prevBlk = InvalidBlockNumber; + TMVC_Result lastResult = TMVC_Unchecked; + Buffer *vmbuf = checkop->vmbuf; + TM_VisCheck *checkTids = checkop->checktids; + + if (checkop->nchecktids > 1) + qsort(checkTids, checkop->nchecktids, sizeof(TM_VisCheck), + heap_cmp_index_vischeck); + /* + * XXX: In the future we should probably reorder these operations so + * we can apply the checks in block order, rather than index order. + */ + for (int i = 0; i < checkop->nchecktids; i++) + { + TM_VisCheck *check = &checkop->checktids[i]; + ItemPointer tid = &check->tid; + BlockNumber blkno = ItemPointerGetBlockNumber(tid); + + Assert(BlockNumberIsValid(blkno)); + Assert(check->vischeckresult == TMVC_Unchecked); + + if (blkno != prevBlk) + { + if (VM_ALL_VISIBLE(rel, blkno, vmbuf)) + lastResult = TMVC_AllVisible; + else + lastResult = TMVC_MaybeInvisible; + + prevBlk = blkno; + } + + check->vischeckresult = lastResult; + } +} + +/* + * Compare TM_VisChecks for an efficient ordering. + */ +static int +heap_cmp_index_vischeck(const void *a, const void *b) +{ + const TM_VisCheck *visa = (const TM_VisCheck *) a; + const TM_VisCheck *visb = (const TM_VisCheck *) b; + return ItemPointerCompare(unconstify(ItemPointerData *, &visa->tid), + unconstify(ItemPointerData *, &visb->tid)); +} + /* * Perform XLogInsert for a heap-visible operation. 'block' is the block * being marked all-visible, and vm_buffer is the buffer containing the diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c index e78682c3cef..26e3da04eb1 100644 --- a/src/backend/access/heap/heapam_handler.c +++ b/src/backend/access/heap/heapam_handler.c @@ -2667,6 +2667,7 @@ static const TableAmRoutine heapam_methods = { .tuple_tid_valid = heapam_tuple_tid_valid, .tuple_satisfies_snapshot = heapam_tuple_satisfies_snapshot, .index_delete_tuples = heap_index_delete_tuples, + .index_vischeck_tuples = heap_index_vischeck_tuples, .relation_set_new_filelocator = heapam_relation_set_new_filelocator, .relation_nontransactional_truncate = heapam_relation_nontransactional_truncate, diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c index 8b1f555435b..8b4fadc0743 100644 --- a/src/backend/access/index/indexam.c +++ b/src/backend/access/index/indexam.c @@ -583,6 +583,12 @@ index_getnext_tid(IndexScanDesc scan, ScanDirection direction) /* XXX: we should assert that a snapshot is pushed or registered */ Assert(TransactionIdIsValid(RecentXmin)); + /* + * Reset xs_visrecheck, so we don't confuse the next tuple's visibility + * state with that of the previous. + */ + scan->xs_visrecheck = TMVC_Unchecked; + /* * The AM's amgettuple proc finds the next index entry matching the scan * keys, and puts the TID into scan->xs_heaptid. 
It should also set diff --git a/src/backend/access/table/tableamapi.c b/src/backend/access/table/tableamapi.c index 760a36fd2a1..c14d01b3bb1 100644 --- a/src/backend/access/table/tableamapi.c +++ b/src/backend/access/table/tableamapi.c @@ -61,6 +61,7 @@ GetTableAmRoutine(Oid amhandler) Assert(routine->tuple_get_latest_tid != NULL); Assert(routine->tuple_satisfies_snapshot != NULL); Assert(routine->index_delete_tuples != NULL); + Assert(routine->index_vischeck_tuples != NULL); Assert(routine->tuple_insert != NULL); diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c index e6635233155..d4dfc4f2456 100644 --- a/src/backend/executor/nodeIndexonlyscan.c +++ b/src/backend/executor/nodeIndexonlyscan.c @@ -120,6 +120,7 @@ IndexOnlyNext(IndexOnlyScanState *node) while ((tid = index_getnext_tid(scandesc, direction)) != NULL) { bool tuple_from_heap = false; + TMVC_Result vischeck = scandesc->xs_visrecheck; CHECK_FOR_INTERRUPTS(); @@ -157,36 +158,56 @@ IndexOnlyNext(IndexOnlyScanState *node) * It's worth going through this complexity to avoid needing to lock * the VM buffer, which could cause significant contention. */ - if (!VM_ALL_VISIBLE(scandesc->heapRelation, - ItemPointerGetBlockNumber(tid), - &node->ioss_VMBuffer)) - { - /* - * Rats, we have to visit the heap to check visibility. - */ - InstrCountTuples2(node, 1); - if (!index_fetch_heap(scandesc, node->ioss_TableSlot)) - continue; /* no visible tuple, try next index entry */ - - ExecClearTuple(node->ioss_TableSlot); - - /* - * Only MVCC snapshots are supported here, so there should be no - * need to keep following the HOT chain once a visible entry has - * been found. If we did want to allow that, we'd need to keep - * more state to remember not to call index_getnext_tid next time. - */ - if (scandesc->xs_heap_continue) - elog(ERROR, "non-MVCC snapshots are not supported in index-only scans"); + if (vischeck == TMVC_Unchecked) + vischeck = table_index_vischeck_tuple(scandesc->heapRelation, + &node->ioss_VMBuffer, + tid); - /* - * Note: at this point we are holding a pin on the heap page, as - * recorded in scandesc->xs_cbuf. We could release that pin now, - * but it's not clear whether it's a win to do so. The next index - * entry might require a visit to the same heap page. - */ + Assert(vischeck != TMVC_Unchecked); - tuple_from_heap = true; + switch (vischeck) + { + case TMVC_Unchecked: + elog(ERROR, "Failed to check visibility for tuple"); + /* + * In case of compilers that don't understand that elog(ERROR) + * doesn't return, and which have -Wimplicit-fallthrough: + */ + /* fallthrough */ + case TMVC_MaybeInvisible: + { + /* + * Rats, we have to visit the heap to check visibility. + */ + InstrCountTuples2(node, 1); + if (!index_fetch_heap(scandesc, node->ioss_TableSlot)) + continue; /* no visible tuple, try next index entry */ + + ExecClearTuple(node->ioss_TableSlot); + + /* + * Only MVCC snapshots are supported here, so there should be + * no need to keep following the HOT chain once a visible + * entry has been found. If we did want to allow that, we'd + * need to keep more state to remember not to call + * index_getnext_tid next time. + */ + if (scandesc->xs_heap_continue) + elog(ERROR, "non-MVCC snapshots are not supported in index-only scans"); + + /* + * Note: at this point we are holding a pin on the heap page, + * as recorded in scandesc->xs_cbuf. We could release that + * pin now, but it's not clear whether it's a win to do so. 
+ * The next index entry might require a visit to the same heap + * page. + */ + + tuple_from_heap = true; + break; + } + case TMVC_AllVisible: + break; } /* diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c index c2918c9c831..29c71762cf8 100644 --- a/src/backend/utils/adt/selfuncs.c +++ b/src/backend/utils/adt/selfuncs.c @@ -6386,44 +6386,62 @@ get_actual_variable_endpoint(Relation heapRel, while ((tid = index_getnext_tid(index_scan, indexscandir)) != NULL) { BlockNumber block = ItemPointerGetBlockNumber(tid); + TMVC_Result visres = index_scan->xs_visrecheck; - if (!VM_ALL_VISIBLE(heapRel, - block, - &vmbuffer)) + if (visres == TMVC_Unchecked) + visres = table_index_vischeck_tuple(heapRel, &vmbuffer, tid); + + Assert(visres != TMVC_Unchecked); + + switch (visres) { - /* Rats, we have to visit the heap to check visibility */ - if (!index_fetch_heap(index_scan, tableslot)) - { + case TMVC_Unchecked: + elog(ERROR, "Failed to check visibility for tuple"); /* - * No visible tuple for this index entry, so we need to - * advance to the next entry. Before doing so, count heap - * page fetches and give up if we've done too many. - * - * We don't charge a page fetch if this is the same heap page - * as the previous tuple. This is on the conservative side, - * since other recently-accessed pages are probably still in - * buffers too; but it's good enough for this heuristic. + * In case of compilers that don't understand that elog(ERROR) + * doesn't return, and which have -Wimplicit-fallthrough: */ + /* fallthrough */ + case TMVC_MaybeInvisible: + { + /* Rats, we have to visit the heap to check visibility */ + if (!index_fetch_heap(index_scan, tableslot)) + { + /* + * No visible tuple for this index entry, so we need to + * advance to the next entry. Before doing so, count heap + * page fetches and give up if we've done too many. + * + * We don't charge a page fetch if this is the same heap + * page as the previous tuple. This is on the + * conservative side, since other recently-accessed pages + * are probably still in buffers too; but it's good enough + * for this heuristic. */ #define VISITED_PAGES_LIMIT 100 - if (block != last_heap_block) - { - last_heap_block = block; - n_visited_heap_pages++; - if (n_visited_heap_pages > VISITED_PAGES_LIMIT) - break; - } + if (block != last_heap_block) + { + last_heap_block = block; + n_visited_heap_pages++; + if (n_visited_heap_pages > VISITED_PAGES_LIMIT) + break; + } - continue; /* no visible tuple, try next index entry */ - } + continue; /* no visible tuple, try next index entry */ + } - /* We don't actually need the heap tuple for anything */ - ExecClearTuple(tableslot); + /* We don't actually need the heap tuple for anything */ + ExecClearTuple(tableslot); - /* - * We don't care whether there's more than one visible tuple in - * the HOT chain; if any are visible, that's good enough. - */ + /* + * We don't care whether there's more than one visible tuple in + * the HOT chain; if any are visible, that's good enough. + */ + break; + } + case TMVC_AllVisible: + break; } /* -- 2.45.2
From ba72cc0cbb44dc6c4da64e49f2fd98b6f8a00528 Mon Sep 17 00:00:00 2001 From: Matthias van de Meent <boekewurm+postgres@gmail.com> Date: Sat, 8 Mar 2025 01:15:08 +0100 Subject: [PATCH v9 3/4] SP-GIST: Fix visibility issues in IOS Previously, SP-GIST IOS could buffer tuples from pages while VACUUM came along and cleaned up an ALL_DEAD tuple, marking the tuple's page ALL_VISIBLE again and making IOS mistakenly believe the tuple is indeed visible. With this patch, pins now conflict with SP-GIST vacuum, and we now do preliminary visibility checks to be used by IOS so that the IOS infrastructure knows to recheck the heap page even if that page is now ALL_VISIBLE. Note: For PG17 and below, this needs some adaptations to use e.g. VM_ALL_VISIBLE, and pack its fields in places that work fine on 32-bit systems, too. Idea from Heikki Linnakangas Backpatch: 17- --- src/include/access/spgist_private.h | 9 +- src/backend/access/spgist/spgscan.c | 175 ++++++++++++++++++++++++-- src/backend/access/spgist/spgvacuum.c | 2 +- 3 files changed, 172 insertions(+), 14 deletions(-) diff --git a/src/include/access/spgist_private.h b/src/include/access/spgist_private.h index cb43a278f46..63e970468c7 100644 --- a/src/include/access/spgist_private.h +++ b/src/include/access/spgist_private.h @@ -21,6 +21,7 @@ #include "storage/buf.h" #include "utils/geo_decls.h" #include "utils/relcache.h" +#include "tableam.h" typedef struct SpGistOptions @@ -175,7 +176,7 @@ typedef struct SpGistSearchItem bool isLeaf; /* SearchItem is heap item */ bool recheck; /* qual recheck is needed */ bool recheckDistances; /* distance recheck is needed */ - + uint8 visrecheck; /* IOS: TMVC_Result of contained heap tuple */ /* array with numberOfOrderBys entries */ double distances[FLEXIBLE_ARRAY_MEMBER]; } SpGistSearchItem; @@ -223,6 +224,7 @@ typedef struct SpGistScanOpaqueData /* These fields are only used in amgettuple scans: */ bool want_itup; /* are we reconstructing tuples? */ + Buffer vmbuf; /* IOS: used for table_index_vischeck_tuples */ TupleDesc reconTupDesc; /* if so, descriptor for reconstructed tuples */ int nPtrs; /* number of TIDs found on current page */ int iPtr; /* index for scanning through same */ @@ -235,6 +237,11 @@ typedef struct SpGistScanOpaqueData /* distances (for recheck) */ IndexOrderByDistance *distances[MaxIndexTuplesPerPage]; + /* support for IOS */ + int nReorderThisPage; + uint8 *visrecheck; /* IOS vis check results, counted by nPtrs */ + SpGistSearchItem **items; /* counted by nReorderThisPage */ + /* * Note: using MaxIndexTuplesPerPage above is a bit hokey since * SpGistLeafTuples aren't exactly IndexTuples; however, they are larger, diff --git a/src/backend/access/spgist/spgscan.c b/src/backend/access/spgist/spgscan.c index 53f910e9d89..3a7c0c308e6 100644 --- a/src/backend/access/spgist/spgscan.c +++ b/src/backend/access/spgist/spgscan.c @@ -30,7 +30,8 @@ typedef void (*storeRes_func) (SpGistScanOpaque so, ItemPointer heapPtr, Datum leafValue, bool isNull, SpGistLeafTuple leafTuple, bool recheck, - bool recheckDistances, double *distances); + bool recheckDistances, double *distances, + TMVC_Result visrecheck); /* * Pairing heap comparison function for the SpGistSearchItem queue. 
@@ -142,6 +143,7 @@ spgAddStartItem(SpGistScanOpaque so, bool isnull) startEntry->traversalValue = NULL; startEntry->recheck = false; startEntry->recheckDistances = false; + startEntry->visrecheck = TMVC_Unchecked; spgAddSearchItemToQueue(so, startEntry); } @@ -386,6 +388,19 @@ spgrescan(IndexScanDesc scan, ScanKey scankey, int nscankeys, if (scankey && scan->numberOfKeys > 0) memcpy(scan->keyData, scankey, scan->numberOfKeys * sizeof(ScanKeyData)); + /* prepare index-only scan requirements */ + so->nReorderThisPage = 0; + if (scan->xs_want_itup) + { + if (so->visrecheck == NULL) + so->visrecheck = palloc(MaxIndexTuplesPerPage); + + if (scan->numberOfOrderBys > 0 && so->items == NULL) + { + so->items = palloc_array(SpGistSearchItem *, MaxIndexTuplesPerPage); + } + } + /* initialize order-by data if needed */ if (orderbys && scan->numberOfOrderBys > 0) { @@ -451,6 +466,9 @@ spgendscan(IndexScanDesc scan) pfree(scan->xs_orderbynulls); } + if (BufferIsValid(so->vmbuf)) + ReleaseBuffer(so->vmbuf); + pfree(so); } @@ -500,6 +518,7 @@ spgNewHeapItem(SpGistScanOpaque so, int level, SpGistLeafTuple leafTuple, item->isLeaf = true; item->recheck = recheck; item->recheckDistances = recheckDistances; + item->visrecheck = TMVC_Unchecked; return item; } @@ -582,6 +601,14 @@ spgLeafTest(SpGistScanOpaque so, SpGistSearchItem *item, isnull, distances); + if (so->want_itup) + { + Assert(PointerIsValid(so->items)); + + so->items[so->nReorderThisPage] = heapItem; + so->nReorderThisPage++; + } + spgAddSearchItemToQueue(so, heapItem); MemoryContextSwitchTo(oldCxt); @@ -591,7 +618,7 @@ spgLeafTest(SpGistScanOpaque so, SpGistSearchItem *item, /* non-ordered scan, so report the item right away */ Assert(!recheckDistances); storeRes(so, &leafTuple->heapPtr, leafValue, isnull, - leafTuple, recheck, false, NULL); + leafTuple, recheck, false, NULL, TMVC_Unchecked); *reportedSome = true; } } @@ -804,6 +831,84 @@ spgTestLeafTuple(SpGistScanOpaque so, return SGLT_GET_NEXTOFFSET(leafTuple); } +/* populate so->visrecheck based on the currently cached tuples */ +static void +spgPopulateUnorderedVischecks(IndexScanDesc scan, SpGistScanOpaqueData *so) +{ + TM_IndexVisibilityCheckOp op; + Assert(so->nPtrs > 0); + Assert(scan->numberOfOrderBys == 0); + + op.nchecktids = so->nPtrs; + op.checktids = palloc_array(TM_VisCheck, so->nPtrs); + op.vmbuf = &so->vmbuf; + + for (int i = 0; i < op.nchecktids; i++) + { + op.checktids[i].idxoffnum = i; + op.checktids[i].vischeckresult = TMVC_Unchecked; + op.checktids[i].tid = so->heapPtrs[i]; + + Assert(ItemPointerIsValid(&op.checktids[i].tid)); + } + + table_index_vischeck_tuples(scan->heapRelation, &op); + + for (int i = 0; i < op.nchecktids; i++) + { + TM_VisCheck *check = &op.checktids[i]; + + Assert(ItemPointerEquals(&so->heapPtrs[check->idxoffnum], + &check->tid)); + Assert(check->idxoffnum < op.nchecktids); + + so->visrecheck[check->idxoffnum] = check->vischeckresult; + } + + pfree(op.checktids); +} + +/* populate visibility check results for the ordered items cached from this page */ +static void +spgPopulateOrderedVisChecks(IndexScanDesc scan, SpGistScanOpaqueData *so) +{ + TM_IndexVisibilityCheckOp op; + + Assert(so->nReorderThisPage > 0); + Assert(scan->numberOfOrderBys > 0); + Assert(PointerIsValid(so->items)); + + op.nchecktids = so->nReorderThisPage; + op.checktids = palloc_array(TM_VisCheck, so->nReorderThisPage); + op.vmbuf = &so->vmbuf; + + for (int i = 0; i < op.nchecktids; i++) + { + op.checktids[i].idxoffnum = i; + op.checktids[i].vischeckresult = TMVC_Unchecked; + op.checktids[i].tid = 
so->items[i]->heapPtr; + + Assert(ItemPointerIsValid(&so->items[i]->heapPtr)); + Assert(so->items[i]->isLeaf); + } + + table_index_vischeck_tuples(scan->heapRelation, &op); + + for (int i = 0; i < op.nchecktids; i++) + { + TM_VisCheck *check = &op.checktids[i]; + + Assert(check->idxoffnum < op.nchecktids); + Assert(ItemPointerEquals(&check->tid, + &so->items[check->idxoffnum]->heapPtr)); + + so->items[check->idxoffnum]->visrecheck = check->vischeckresult; + } + + pfree(op.checktids); + so->nReorderThisPage = 0; +} + /* * Walk the tree and report all tuples passing the scan quals to the storeRes * subroutine. @@ -812,8 +917,8 @@ spgTestLeafTuple(SpGistScanOpaque so, * next page boundary once we have reported at least one tuple. */ static void -spgWalk(Relation index, SpGistScanOpaque so, bool scanWholeIndex, - storeRes_func storeRes) +spgWalk(IndexScanDesc scan, Relation index, SpGistScanOpaque so, + bool scanWholeIndex, storeRes_func storeRes) { Buffer buffer = InvalidBuffer; bool reportedSome = false; @@ -833,9 +938,22 @@ redirect: { /* We store heap items in the queue only in case of ordered search */ Assert(so->numberOfNonNullOrderBys > 0); + /* + * If an item we found on a page is retrieved immediately after + * processing that page, we won't yet have released the page pin, + * and thus won't yet have processed the visibility data of the + * page's (now) ordered tuples. + * Do that now, so that the tuple we're about to store does have + * accurate data. + */ + if (so->want_itup && so->nReorderThisPage) + spgPopulateOrderedVisChecks(scan, so); + + Assert(!so->want_itup || item->visrecheck != TMVC_Unchecked); storeRes(so, &item->heapPtr, item->value, item->isNull, item->leafTuple, item->recheck, - item->recheckDistances, item->distances); + item->recheckDistances, item->distances, + item->visrecheck); reportedSome = true; } else @@ -852,7 +970,12 @@ redirect: } else if (blkno != BufferGetBlockNumber(buffer)) { - UnlockReleaseBuffer(buffer); + LockBuffer(buffer, BUFFER_LOCK_UNLOCK); + + if (so->nReorderThisPage > 0) + spgPopulateOrderedVisChecks(scan, so); + + ReleaseBuffer(buffer); buffer = ReadBuffer(index, blkno); LockBuffer(buffer, BUFFER_LOCK_SHARE); } @@ -920,16 +1043,37 @@ redirect: } if (buffer != InvalidBuffer) - UnlockReleaseBuffer(buffer); -} + { + /* + * If we're in an index-only scan, pre-check visibility of the tuples, + * so we can drop the pin quickly. 
+ */ + LockBuffer(buffer, BUFFER_LOCK_UNLOCK); + + if (so->want_itup) + { + if (scan->numberOfOrderBys > 0 && so->nReorderThisPage > 0) + { + spgPopulateOrderedVisChecks(scan, so); + } + if (scan->numberOfOrderBys == 0 && so->nPtrs > 0) + { + spgPopulateUnorderedVischecks(scan, so); + } + } + + ReleaseBuffer(buffer); + } +} /* storeRes subroutine for getbitmap case */ static void storeBitmap(SpGistScanOpaque so, ItemPointer heapPtr, Datum leafValue, bool isnull, SpGistLeafTuple leafTuple, bool recheck, - bool recheckDistances, double *distances) + bool recheckDistances, double *distances, + TMVC_Result visres) { Assert(!recheckDistances && !distances); tbm_add_tuples(so->tbm, heapPtr, 1, recheck); @@ -947,7 +1091,7 @@ spggetbitmap(IndexScanDesc scan, TIDBitmap *tbm) so->tbm = tbm; so->ntids = 0; - spgWalk(scan->indexRelation, so, true, storeBitmap); + spgWalk(scan, scan->indexRelation, so, true, storeBitmap); return so->ntids; } @@ -957,12 +1101,15 @@ static void storeGettuple(SpGistScanOpaque so, ItemPointer heapPtr, Datum leafValue, bool isnull, SpGistLeafTuple leafTuple, bool recheck, - bool recheckDistances, double *nonNullDistances) + bool recheckDistances, double *nonNullDistances, + TMVC_Result visres) { Assert(so->nPtrs < MaxIndexTuplesPerPage); so->heapPtrs[so->nPtrs] = *heapPtr; so->recheck[so->nPtrs] = recheck; so->recheckDistances[so->nPtrs] = recheckDistances; + if (so->want_itup) + so->visrecheck[so->nPtrs] = visres; if (so->numberOfOrderBys > 0) { @@ -1039,6 +1186,10 @@ spggettuple(IndexScanDesc scan, ScanDirection dir) scan->xs_heaptid = so->heapPtrs[so->iPtr]; scan->xs_recheck = so->recheck[so->iPtr]; scan->xs_hitup = so->reconTups[so->iPtr]; + if (so->want_itup) + scan->xs_visrecheck = so->visrecheck[so->iPtr]; + + Assert(!scan->xs_want_itup || scan->xs_visrecheck != TMVC_Unchecked); if (so->numberOfOrderBys > 0) index_store_float8_orderby_distances(scan, so->orderByTypes, @@ -1068,7 +1219,7 @@ spggettuple(IndexScanDesc scan, ScanDirection dir) } so->iPtr = so->nPtrs = 0; - spgWalk(scan->indexRelation, so, false, storeGettuple); + spgWalk(scan, scan->indexRelation, so, false, storeGettuple); if (so->nPtrs == 0) break; /* must have completed scan */ diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c index eeddacd0d52..993d4a5b662 100644 --- a/src/backend/access/spgist/spgvacuum.c +++ b/src/backend/access/spgist/spgvacuum.c @@ -629,7 +629,7 @@ spgvacuumpage(spgBulkDeleteState *bds, BlockNumber blkno) buffer = ReadBufferExtended(index, MAIN_FORKNUM, blkno, RBM_NORMAL, bds->info->strategy); - LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE); + LockBufferForCleanup(buffer); page = (Page) BufferGetPage(buffer); if (PageIsNew(page)) -- 2.45.2
From 6eb8314fb3e07595e4016ade40fe8dade7fb9c37 Mon Sep 17 00:00:00 2001 From: Matthias van de Meent <boekewurm+postgres@gmail.com> Date: Fri, 7 Mar 2025 22:55:24 +0100 Subject: [PATCH v9 2/4] GIST: Fix visibility issues in IOS Previously, GIST IOS could buffer tuples from pages while VACUUM came along and cleaned up an ALL_DEAD tuple, marking the tuple's page ALL_VISIBLE again and making IOS mistakenly believe the tuple is indeed visible. With this patch, pins now conflict with GIST vacuum, and we now do preliminary visibility checks to be used by IOS so that the IOS infrastructure knows to recheck the heap page even if that page is now ALL_VISIBLE. Note: For PG17 and below, this needs some adaptations to use e.g. VM_ALL_VISIBLE, and pack its fields in places that work fine on 32-bit systems, too. Idea from Heikki Linnakangas Backpatch: 17- --- src/include/access/gist_private.h | 27 +++++-- src/backend/access/gist/gistget.c | 104 ++++++++++++++++++++++++++- src/backend/access/gist/gistscan.c | 5 ++ src/backend/access/gist/gistvacuum.c | 6 +- 4 files changed, 131 insertions(+), 11 deletions(-) diff --git a/src/include/access/gist_private.h b/src/include/access/gist_private.h index 39404ec7cdb..4261565b5ad 100644 --- a/src/include/access/gist_private.h +++ b/src/include/access/gist_private.h @@ -22,6 +22,7 @@ #include "storage/buffile.h" #include "utils/hsearch.h" #include "access/genam.h" +#include "tableam.h" /* * Maximum number of "halves" a page can be split into in one operation. @@ -124,6 +125,8 @@ typedef struct GISTSearchHeapItem * index-only scans */ OffsetNumber offnum; /* track offset in page to mark tuple as * LP_DEAD */ + uint8 visrecheck; /* Cached visibility check result for this + * heap pointer. */ } GISTSearchHeapItem; /* Unvisited item, either index page or heap tuple */ @@ -170,12 +173,24 @@ typedef struct GISTScanOpaqueData BlockNumber curBlkno; /* current number of block */ GistNSN curPageLSN; /* pos in the WAL stream when page was read */ - /* In a non-ordered search, returnable heap items are stored here: */ - GISTSearchHeapItem pageData[BLCKSZ / sizeof(IndexTupleData)]; - OffsetNumber nPageData; /* number of valid items in array */ - OffsetNumber curPageData; /* next item to return */ - MemoryContext pageDataCxt; /* context holding the fetched tuples, for - * index-only scans */ + /* info used by Index-Only Scans */ + Buffer vmbuf; /* reusable buffer for IOS' vm lookups */ + + union { + struct { + /* In a non-ordered search, returnable heap items are stored here: */ + GISTSearchHeapItem pageData[BLCKSZ / sizeof(IndexTupleData)]; + OffsetNumber nPageData; /* number of valid items in array */ + OffsetNumber curPageData; /* next item to return */ + MemoryContext pageDataCxt; /* context holding the fetched tuples, + * for index-only scans */ + }; + struct { + /* In an ordered search, we use this as scratch space */ + GISTSearchHeapItem *sortData[BLCKSZ / sizeof(IndexTupleData)]; + OffsetNumber nsortData; /* number of items in sortData */ + }; + }; } GISTScanOpaqueData; typedef GISTScanOpaqueData *GISTScanOpaque; diff --git a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c index cc40e928e0a..bb4caa6c310 100644 --- a/src/backend/access/gist/gistget.c +++ b/src/backend/access/gist/gistget.c @@ -24,6 +24,7 @@ #include "utils/float.h" #include "utils/memutils.h" #include "utils/rel.h" +#include "access/tableam.h" /* * gistkillitems() -- set LP_DEAD state for items an indexscan caller has @@ -394,7 +395,15 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem 
*pageItem, return; } - so->nPageData = so->curPageData = 0; + if (scan->numberOfOrderBys) + { + so->nsortData = 0; + } + else + { + so->nPageData = so->curPageData = 0; + } + scan->xs_hitup = NULL; /* might point into pageDataCxt */ if (so->pageDataCxt) MemoryContextReset(so->pageDataCxt); @@ -501,7 +510,11 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem, * In an index-only scan, also fetch the data from the tuple. */ if (scan->xs_want_itup) + { item->data.heap.recontup = gistFetchTuple(giststate, r, it); + so->sortData[so->nsortData] = &item->data.heap; + so->nsortData += 1; + } } else { @@ -526,7 +539,88 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem, } } - UnlockReleaseBuffer(buffer); + /* Allow writes to the buffer */ + LockBuffer(buffer, BUFFER_LOCK_UNLOCK); + + if (scan->xs_want_itup) + { + TM_IndexVisibilityCheckOp op; + op.vmbuf = &so->vmbuf; + + if (scan->numberOfOrderBys > 0) + { + op.nchecktids = so->nsortData; + + if (op.nchecktids > 0) + { + op.checktids = palloc(op.nchecktids * sizeof(TM_VisCheck)); + + for (int off = 0; off < op.nchecktids; off++) + { + op.checktids[off].vischeckresult = TMVC_Unchecked; + op.checktids[off].tid = so->sortData[off]->heapPtr; + op.checktids[off].idxoffnum = off; + Assert(ItemPointerIsValid(&op.checktids[off].tid)); + } + } + } + else + { + op.nchecktids = so->nPageData; + + if (op.nchecktids > 0) + { + op.checktids = palloc_array(TM_VisCheck, op.nchecktids); + + for (int off = 0; off < op.nchecktids; off++) + { + op.checktids[off].vischeckresult = TMVC_Unchecked; + op.checktids[off].tid = so->pageData[off].heapPtr; + op.checktids[off].idxoffnum = off; + Assert(ItemPointerIsValid(&op.checktids[off].tid)); + } + } + } + + if (op.nchecktids > 0) + { + table_index_vischeck_tuples(scan->heapRelation, &op); + + if (scan->numberOfOrderBys > 0) + { + for (int off = 0; off < op.nchecktids; off++) + { + TM_VisCheck *check = &op.checktids[off]; + GISTSearchHeapItem *item = so->sortData[check->idxoffnum]; + + Assert(check->idxoffnum < op.nchecktids); + Assert(ItemPointerEquals(&item->heapPtr, &check->tid)); + + item->visrecheck = check->vischeckresult; + } + /* reset state */ + so->nsortData = 0; + } + else + { + for (int off = 0; off < op.nchecktids; off++) + { + TM_VisCheck *check = &op.checktids[off]; + GISTSearchHeapItem *item = &so->pageData[check->idxoffnum]; + + Assert(check->idxoffnum < op.nchecktids); + Assert(ItemPointerEquals(&item->heapPtr, &check->tid)); + + item->visrecheck = check->vischeckresult; + } + } + + pfree(op.checktids); + } + } + + /* Allow VACUUM to process the buffer again */ + ReleaseBuffer(buffer); } /* @@ -588,7 +682,10 @@ getNextNearest(IndexScanDesc scan) /* in an index-only scan, also return the reconstructed tuple. 
*/ if (scan->xs_want_itup) + { scan->xs_hitup = item->data.heap.recontup; + scan->xs_visrecheck = item->data.heap.visrecheck; + } res = true; } else @@ -673,7 +770,10 @@ gistgettuple(IndexScanDesc scan, ScanDirection dir) /* in an index-only scan, also return the reconstructed tuple */ if (scan->xs_want_itup) + { scan->xs_hitup = so->pageData[so->curPageData].recontup; + scan->xs_visrecheck = so->pageData[so->curPageData].visrecheck; + } so->curPageData++; diff --git a/src/backend/access/gist/gistscan.c b/src/backend/access/gist/gistscan.c index 700fa959d03..52f5f144ccd 100644 --- a/src/backend/access/gist/gistscan.c +++ b/src/backend/access/gist/gistscan.c @@ -347,6 +347,11 @@ void gistendscan(IndexScanDesc scan) { GISTScanOpaque so = (GISTScanOpaque) scan->opaque; + if (BufferIsValid(so->vmbuf)) + { + ReleaseBuffer(so->vmbuf); + so->vmbuf = InvalidBuffer; + } /* * freeGISTstate is enough to clean up everything made by gistbeginscan, diff --git a/src/backend/access/gist/gistvacuum.c b/src/backend/access/gist/gistvacuum.c index dd0d9d5006c..5a95b93236e 100644 --- a/src/backend/access/gist/gistvacuum.c +++ b/src/backend/access/gist/gistvacuum.c @@ -289,10 +289,10 @@ restart: info->strategy); /* - * We are not going to stay here for a long time, aggressively grab an - * exclusive lock. + * We are not going to stay here for a long time, aggressively grab a + * cleanup lock. */ - LockBuffer(buffer, GIST_EXCLUSIVE); + LockBufferForCleanup(buffer); page = (Page) BufferGetPage(buffer); if (gistPageRecyclable(page)) -- 2.45.2
From e6c6d48556c57ba9c89a53e41872957859a3ead9 Mon Sep 17 00:00:00 2001 From: nkey <michail.nikolaev@gmail.com> Date: Fri, 7 Feb 2025 21:38:51 +0100 Subject: [PATCH v9 4/4] Tests for index-only scan race condition with concurrent VACUUM in GiST/SP-GiST Add regression tests that demonstrate wrong results can occur with index-only scans in GiST and SP-GiST indexes when encountering tuples removed by a concurrent VACUUM operation. The issue occurs because these index types don't acquire the proper cleanup lock on index buffers during VACUUM, unlike btree indexes. Author: Peter Geoghegan <pg@bowt.ie>, Matthias van de Meent <boekewurm+postgres@gmail.com>, Michail Nikolaev <michail.nikolaev@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CANtu0oi0rkR%2BFsgyLXnGZ-uW2950-urApAWLhy-%2BV1WJD%3D_ZXA%40mail.gmail.com --- .../expected/index-only-scan-gist-vacuum.out | 67 +++++++++++ .../index-only-scan-spgist-vacuum.out | 67 +++++++++++ src/test/isolation/isolation_schedule | 2 + .../specs/index-only-scan-gist-vacuum.spec | 113 ++++++++++++++++++ .../specs/index-only-scan-spgist-vacuum.spec | 113 ++++++++++++++++++ 5 files changed, 362 insertions(+) create mode 100644 src/test/isolation/expected/index-only-scan-gist-vacuum.out create mode 100644 src/test/isolation/expected/index-only-scan-spgist-vacuum.out create mode 100644 src/test/isolation/specs/index-only-scan-gist-vacuum.spec create mode 100644 src/test/isolation/specs/index-only-scan-spgist-vacuum.spec diff --git a/src/test/isolation/expected/index-only-scan-gist-vacuum.out b/src/test/isolation/expected/index-only-scan-gist-vacuum.out new file mode 100644 index 00000000000..19117402f52 --- /dev/null +++ b/src/test/isolation/expected/index-only-scan-gist-vacuum.out @@ -0,0 +1,67 @@ +Parsed test spec with 2 sessions + +starting permutation: s2_mod s1_begin s1_prepare_sorted s1_fetch_1 s2_vacuum s1_fetch_all s1_commit +step s2_mod: + DELETE FROM ios_needs_cleanup_lock WHERE a != point '(1,1)'; + +step s1_begin: BEGIN; +step s1_prepare_sorted: + DECLARE foo NO SCROLL CURSOR FOR SELECT a <-> point '(0,0)' as x FROM ios_needs_cleanup_lock ORDER BY a <-> point '(0,0)'; + +step s1_fetch_1: + FETCH FROM foo; + + x +------------------ +1.4142135623730951 +(1 row) + +step s2_vacuum: VACUUM (TRUNCATE false) ios_needs_cleanup_lock; <waiting ...> +step s1_fetch_all: + SELECT pg_sleep_for(INTERVAL '50ms'); + FETCH ALL FROM foo; + +pg_sleep_for +------------ + +(1 row) + +x +- +(0 rows) + +step s2_vacuum: <... completed> +step s1_commit: COMMIT; + +starting permutation: s2_mod s1_begin s1_prepare_unsorted s1_fetch_1 s2_vacuum s1_fetch_all s1_commit +step s2_mod: + DELETE FROM ios_needs_cleanup_lock WHERE a != point '(1,1)'; + +step s1_begin: BEGIN; +step s1_prepare_unsorted: + DECLARE foo NO SCROLL CURSOR FOR SELECT a FROM ios_needs_cleanup_lock WHERE box '((-100,-100),(100,100))' @> a; + +step s1_fetch_1: + FETCH FROM foo; + +a +----- +(1,1) +(1 row) + +step s2_vacuum: VACUUM (TRUNCATE false) ios_needs_cleanup_lock; <waiting ...> +step s1_fetch_all: + SELECT pg_sleep_for(INTERVAL '50ms'); + FETCH ALL FROM foo; + +pg_sleep_for +------------ + +(1 row) + +a +- +(0 rows) + +step s2_vacuum: <... 
completed> +step s1_commit: COMMIT; diff --git a/src/test/isolation/expected/index-only-scan-spgist-vacuum.out b/src/test/isolation/expected/index-only-scan-spgist-vacuum.out new file mode 100644 index 00000000000..19117402f52 --- /dev/null +++ b/src/test/isolation/expected/index-only-scan-spgist-vacuum.out @@ -0,0 +1,67 @@ +Parsed test spec with 2 sessions + +starting permutation: s2_mod s1_begin s1_prepare_sorted s1_fetch_1 s2_vacuum s1_fetch_all s1_commit +step s2_mod: + DELETE FROM ios_needs_cleanup_lock WHERE a != point '(1,1)'; + +step s1_begin: BEGIN; +step s1_prepare_sorted: + DECLARE foo NO SCROLL CURSOR FOR SELECT a <-> point '(0,0)' as x FROM ios_needs_cleanup_lock ORDER BY a <-> point '(0,0)'; + +step s1_fetch_1: + FETCH FROM foo; + + x +------------------ +1.4142135623730951 +(1 row) + +step s2_vacuum: VACUUM (TRUNCATE false) ios_needs_cleanup_lock; <waiting ...> +step s1_fetch_all: + SELECT pg_sleep_for(INTERVAL '50ms'); + FETCH ALL FROM foo; + +pg_sleep_for +------------ + +(1 row) + +x +- +(0 rows) + +step s2_vacuum: <... completed> +step s1_commit: COMMIT; + +starting permutation: s2_mod s1_begin s1_prepare_unsorted s1_fetch_1 s2_vacuum s1_fetch_all s1_commit +step s2_mod: + DELETE FROM ios_needs_cleanup_lock WHERE a != point '(1,1)'; + +step s1_begin: BEGIN; +step s1_prepare_unsorted: + DECLARE foo NO SCROLL CURSOR FOR SELECT a FROM ios_needs_cleanup_lock WHERE box '((-100,-100),(100,100))' @> a; + +step s1_fetch_1: + FETCH FROM foo; + +a +----- +(1,1) +(1 row) + +step s2_vacuum: VACUUM (TRUNCATE false) ios_needs_cleanup_lock; <waiting ...> +step s1_fetch_all: + SELECT pg_sleep_for(INTERVAL '50ms'); + FETCH ALL FROM foo; + +pg_sleep_for +------------ + +(1 row) + +a +- +(0 rows) + +step s2_vacuum: <... completed> +step s1_commit: COMMIT; diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule index 143109aa4da..9720c9a2dc8 100644 --- a/src/test/isolation/isolation_schedule +++ b/src/test/isolation/isolation_schedule @@ -17,6 +17,8 @@ test: partial-index test: two-ids test: multiple-row-versions test: index-only-scan +test: index-only-scan-gist-vacuum +test: index-only-scan-spgist-vacuum test: predicate-lock-hot-tuple test: update-conflict-out test: deadlock-simple diff --git a/src/test/isolation/specs/index-only-scan-gist-vacuum.spec b/src/test/isolation/specs/index-only-scan-gist-vacuum.spec new file mode 100644 index 00000000000..b1688d44fa7 --- /dev/null +++ b/src/test/isolation/specs/index-only-scan-gist-vacuum.spec @@ -0,0 +1,113 @@ +# index-only-scan test showing wrong results with GiST +# +setup +{ + -- by using a low fillfactor and a wide tuple we can get multiple blocks + -- with just few rows + CREATE TABLE ios_needs_cleanup_lock (a point NOT NULL, b int not null, pad char(1024) default '') + WITH (AUTOVACUUM_ENABLED = false, FILLFACTOR = 10); + + INSERT INTO ios_needs_cleanup_lock SELECT point(g.i, g.i), g.i FROM generate_series(1, 10) g(i); + + CREATE INDEX ios_spgist_a ON ios_needs_cleanup_lock USING gist(a); +} +setup +{ + VACUUM ios_needs_cleanup_lock; +} + +teardown +{ + DROP TABLE ios_needs_cleanup_lock; +} + + +session s1 + +# Force an index-only scan, where possible: +setup { + SET enable_bitmapscan = false; + SET enable_indexonlyscan = true; + SET enable_indexscan = true; +} + +step s1_begin { BEGIN; } +step s1_commit { COMMIT; } + +step s1_prepare_sorted { + DECLARE foo NO SCROLL CURSOR FOR SELECT a <-> point '(0,0)' as x FROM ios_needs_cleanup_lock ORDER BY a <-> point '(0,0)'; +} + +step s1_prepare_unsorted { + 
DECLARE foo NO SCROLL CURSOR FOR SELECT a FROM ios_needs_cleanup_lock WHERE box '((-100,-100),(100,100))' @> a; +} + +step s1_fetch_1 { + FETCH FROM foo; +} + +step s1_fetch_all { + SELECT pg_sleep_for(INTERVAL '50ms'); + FETCH ALL FROM foo; +} + + +session s2 + +# Don't delete row 1 so we have a row for the cursor to "rest" on. +step s2_mod +{ + DELETE FROM ios_needs_cleanup_lock WHERE a != point '(1,1)'; +} + +# Disable truncation, as otherwise we'll just wait for a timeout while trying +# to acquire the lock +step s2_vacuum { VACUUM (TRUNCATE false) ios_needs_cleanup_lock; } + +permutation + # delete nearly all rows, to make issue visible + s2_mod + # create a cursor + s1_begin + s1_prepare_sorted + + # fetch one row from the cursor, that ensures the index scan portion is done + # before the vacuum in the next step + s1_fetch_1 + + # with the bug this vacuum will mark pages as all-visible that the scan in + # the next step then considers all-visible, despite all rows from those + # pages having been removed. + # Because this should block on buffer-level locks, this won't ever be + # considered "blocked" by isolation tester, and so we only have a single + # step we can work with concurrently. + s2_vacuum (*) + + # if this returns any rows, we're busted + s1_fetch_all + + s1_commit + +permutation + # delete nearly all rows, to make issue visible + s2_mod + # create a cursor + s1_begin + s1_prepare_unsorted + + # fetch one row from the cursor, that ensures the index scan portion is done + # before the vacuum in the next step + s1_fetch_1 + + # with the bug this vacuum will mark pages as all-visible that the scan in + # the next step then considers all-visible, despite all rows from those + # pages having been removed. + # Because this should block on buffer-level locks, this won't ever be + # considered "blocked" by isolation tester, and so we only have a single + # step we can work with concurrently. 
+ s2_vacuum (*) + + # if this returns any rows, we're busted + s1_fetch_all + + s1_commit diff --git a/src/test/isolation/specs/index-only-scan-spgist-vacuum.spec b/src/test/isolation/specs/index-only-scan-spgist-vacuum.spec new file mode 100644 index 00000000000..b414c5d1695 --- /dev/null +++ b/src/test/isolation/specs/index-only-scan-spgist-vacuum.spec @@ -0,0 +1,113 @@ +# index-only-scan test showing wrong results with SPGiST +# +setup +{ + -- by using a low fillfactor and a wide tuple we can get multiple blocks + -- with just few rows + CREATE TABLE ios_needs_cleanup_lock (a point NOT NULL, b int not null, pad char(1024) default '') + WITH (AUTOVACUUM_ENABLED = false, FILLFACTOR = 10); + + INSERT INTO ios_needs_cleanup_lock SELECT point(g.i, g.i), g.i FROM generate_series(1, 10) g(i); + + CREATE INDEX ios_spgist_a ON ios_needs_cleanup_lock USING spgist(a); +} +setup +{ + VACUUM ios_needs_cleanup_lock; +} + +teardown +{ + DROP TABLE ios_needs_cleanup_lock; +} + + +session s1 + +# Force an index-only scan, where possible: +setup { + SET enable_bitmapscan = false; + SET enable_indexonlyscan = true; + SET enable_indexscan = true; +} + +step s1_begin { BEGIN; } +step s1_commit { COMMIT; } + +step s1_prepare_sorted { + DECLARE foo NO SCROLL CURSOR FOR SELECT a <-> point '(0,0)' as x FROM ios_needs_cleanup_lock ORDER BY a <-> point '(0,0)'; +} + +step s1_prepare_unsorted { + DECLARE foo NO SCROLL CURSOR FOR SELECT a FROM ios_needs_cleanup_lock WHERE box '((-100,-100),(100,100))' @> a; +} + +step s1_fetch_1 { + FETCH FROM foo; +} + +step s1_fetch_all { + SELECT pg_sleep_for(INTERVAL '50ms'); + FETCH ALL FROM foo; +} + + +session s2 + +# Don't delete row 1 so we have a row for the cursor to "rest" on. +step s2_mod +{ + DELETE FROM ios_needs_cleanup_lock WHERE a != point '(1,1)'; +} + +# Disable truncation, as otherwise we'll just wait for a timeout while trying +# to acquire the lock +step s2_vacuum { VACUUM (TRUNCATE false) ios_needs_cleanup_lock; } + +permutation + # delete nearly all rows, to make issue visible + s2_mod + # create a cursor + s1_begin + s1_prepare_sorted + + # fetch one row from the cursor, that ensures the index scan portion is done + # before the vacuum in the next step + s1_fetch_1 + + # with the bug this vacuum will mark pages as all-visible that the scan in + # the next step then considers all-visible, despite all rows from those + # pages having been removed. + # Because this should block on buffer-level locks, this won't ever be + # considered "blocked" by isolation tester, and so we only have a single + # step we can work with concurrently. + s2_vacuum (*) + + # if this returns any rows, we're busted + s1_fetch_all + + s1_commit + +permutation + # delete nearly all rows, to make issue visible + s2_mod + # create a cursor + s1_begin + s1_prepare_unsorted + + # fetch one row from the cursor, that ensures the index scan portion is done + # before the vacuum in the next step + s1_fetch_1 + + # with the bug this vacuum will mark pages as all-visible that the scan in + # the next step then considers all-visible, despite all rows from those + # pages having been removed. + # Because this should block on buffer-level locks, this won't ever be + # considered "blocked" by isolation tester, and so we only have a single + # step we can work with concurrently. + s2_vacuum (*) + + # if this returns any rows, we're busted + s1_fetch_all + + s1_commit -- 2.45.2