I had already written the patch for immediate reaping, addressing the
feedback below, before I saw the emails saying everyone is happy with
using hastup in lazy_scan_[no]prune() in a preliminary patch. Let me
know if you have a strong preference for reordering; otherwise, I will
write the three subsequent patches on top of this one.
On Tue, Jan 9, 2024 at 2:00 PM Robert Haas <robertmh...@gmail.com> wrote:
>
> On Tue, Jan 9, 2024 at 11:35 AM Melanie Plageman
> <melanieplage...@gmail.com> wrote:
> > The easiest solution would be to change the name of
> > heap_page_prune_execute()'s parameter from "no_indexes" to something
> > like "validate_unused", since it is only used in assert builds for
> > validation.
>
> Right.
>
> > However, though I wish a name change was the right way to solve this
> > problem, my gut feeling is that it is not. It seems like we should
> > rely only on the WAL record itself in recovery. Right now the
> > parameter is used exclusively for validation, so it isn't so bad. But
> > what if someone uses this parameter in the future in heap_xlog_prune()
> > to decide how to modify the page?
>
> Exactly.
>
> > It seems like the right solution would be to add a flag into the prune
> > record indicating what to pass to heap_page_prune_execute(). In the
> > future, I'd like to add flags for updating the VM to each of the prune
> > and vacuum records (eliminating the separate VM update record). Thus,
> > a new flags member of the prune record could have future use. However,
> > this would add a uint8 to the record. I can go and look for some
> > padding if you think this is the right direction?
>
> I thought about this approach and it might be OK, but I don't love it,
> because it means making the WAL record bigger on production systems for
> the sake of an assertion that only fires for developers. Sure, it's
> possible that there might be another use in the future, but there might
> also not be another use in the future.
>
> How about changing if (no_indexes) to if (ndead == 0) and adding a
> comment like this:
>
> /* If there are any tuples being marked LP_DEAD, then the relation must
> have indexes, so every item being marked unused must be a heap-only
> tuple. But if there are no tuples being marked LP_DEAD, then it's
> possible that the relation has no indexes, in which case all we know is
> that the line pointer shouldn't already be LP_UNUSED. */

Ah, I like this a lot. The attached patch does this; I've added a
modified version of the comment you suggested. My only question is
whether we are losing something without this sentence (from the old
comment):

- * ... They don't need to be left in place as LP_DEAD items until VACUUM gets
- * around to doing index vacuuming.

I don't feel like it adds a lot, but it is absent from the new comment,
so I thought I would check.

> BTW:
>
> + * LP_REDIRECT, or LP_DEAD items to LP_UNUSED during pruning. We
> + * can't check much here except that, if the item is LP_NORMAL, it
> + * should have storage before it is set LP_UNUSED.
>
> Is it really helpful to check this here, or just confusing/grasping at
> straws? I mean, the requirement that LP_NORMAL items have storage is a
> general one, IIUC, not something that's specific to this situation. It
> feels like the equivalent of checking that your children don't set fire
> to the couch on Tuesdays.

Hmm. Yes. I suppose I was trying to find something to validate. Is it
worth checking that the line pointer is not already LP_UNUSED? Or is
that a bit ridiculous?

- Melanie
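P.S. To make that last question concrete, here is a rough sketch of how
heap_page_prune_execute()'s USE_ASSERT_CHECKING block could look. The
if (ndead > 0) branch is what the attached patch does; the else branch
is the hypothetical extra validation I am asking about, not something
in the patch:

    #ifdef USE_ASSERT_CHECKING
            if (ndead > 0)
            {
                    /*
                     * Some items are being marked LP_DEAD, so the relation
                     * must have indexes, and every item being marked
                     * LP_UNUSED here must be a heap-only tuple.
                     */
                    Assert(ItemIdHasStorage(lp) && ItemIdIsNormal(lp));
                    htup = (HeapTupleHeader) PageGetItem(page, lp);
                    Assert(HeapTupleHeaderIsHeapOnly(htup));
            }
            else
            {
                    /*
                     * Hypothetical extra check: whatever kind of item this
                     * was, it should at least not already be LP_UNUSED.
                     */
                    Assert(ItemIdIsUsed(lp));
            }
    #endif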
From 2d018672abe9107196a4ff30254bd63a71e5ddb5 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Tue, 9 Jan 2024 15:00:24 -0500
Subject: [PATCH v5 1/4] Set would-be dead items LP_UNUSED while pruning

If there are no indexes on a relation, items can be marked LP_UNUSED
instead of LP_DEAD while pruning. This avoids a separate invocation of
lazy_vacuum_heap_page() and saves a vacuum WAL record.

To accomplish this, pass heap_page_prune() a new parameter, no_indexes,
which indicates that dead line pointers should be set to LP_UNUSED
during pruning, allowing earlier reaping of tuples.

Because we don't update the freespace map until after dropping the lock
on the buffer, and we need the lock while we update the visibility map,
save our intent to update the freespace map in the output parameter
recordfreespace. This is not added to the LVPagePruneState because
future commits will eliminate the LVPagePruneState.

Discussion: https://postgr.es/m/CAAKRu_bgvb_k0gKOXWzNKWHt560R0smrGe3E8zewKPs8fiMKkw%40mail.gmail.com
---
 src/backend/access/heap/pruneheap.c  |  61 +++++++++--
 src/backend/access/heap/vacuumlazy.c | 150 ++++++++++-----------------
 src/include/access/heapam.h          |   1 +
 3 files changed, 107 insertions(+), 105 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 3e0a1a260e6..9e5ddc80f4a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -35,6 +35,8 @@ typedef struct
 	/* tuple visibility test, initialized for the relation */
 	GlobalVisState *vistest;
 
+	/* whether or not dead items can be set LP_UNUSED during pruning */
+	bool		no_indexes;
 	TransactionId new_prune_xid;	/* new prune hint value for page */
 	TransactionId snapshotConflictHorizon;	/* latest xid removed */
@@ -148,7 +150,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 		{
 			PruneResult presult;
 
-			heap_page_prune(relation, buffer, vistest, &presult, NULL);
+			/*
+			 * For now, pass no_indexes == false regardless of whether or not
+			 * the relation has indexes. In the future we may enable immediate
+			 * reaping for on access pruning.
+			 */
+			heap_page_prune(relation, buffer, vistest, false,
+							&presult, NULL);
 
 			/*
 			 * Report the number of tuples reclaimed to pgstats. This is
@@ -193,6 +201,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  * (see heap_prune_satisfies_vacuum and
  * HeapTupleSatisfiesVacuum).
  *
+ * no_indexes indicates whether or not dead items can be set LP_UNUSED during
+ * pruning.
+ *
  * off_loc is the offset location required by the caller to use in error
  * callback.
  *
@@ -203,6 +214,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 void
 heap_page_prune(Relation relation, Buffer buffer,
 				GlobalVisState *vistest,
+				bool no_indexes,
 				PruneResult *presult,
 				OffsetNumber *off_loc)
 {
@@ -227,6 +239,7 @@ heap_page_prune(Relation relation, Buffer buffer,
 	prstate.new_prune_xid = InvalidTransactionId;
 	prstate.rel = relation;
 	prstate.vistest = vistest;
+	prstate.no_indexes = no_indexes;
 	prstate.snapshotConflictHorizon = InvalidTransactionId;
 	prstate.nredirected = prstate.ndead = prstate.nunused = 0;
 	memset(prstate.marked, 0, sizeof(prstate.marked));
@@ -306,9 +319,9 @@ heap_page_prune(Relation relation, Buffer buffer,
 		if (off_loc)
 			*off_loc = offnum;
 
-		/* Nothing to do if slot is empty or already dead */
+		/* Nothing to do if slot is empty */
 		itemid = PageGetItemId(page, offnum);
-		if (!ItemIdIsUsed(itemid) || ItemIdIsDead(itemid))
+		if (!ItemIdIsUsed(itemid))
 			continue;
 
 		/* Process this item or chain of items */
@@ -581,7 +594,17 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
 		 * function.)
 		 */
 		if (ItemIdIsDead(lp))
+		{
+			/*
+			 * If the relation has no indexes, we can set dead line pointers
+			 * LP_UNUSED now. We don't increment ndeleted here since the LP
+			 * was already marked dead.
+			 */
+			if (unlikely(prstate->no_indexes))
+				heap_prune_record_unused(prstate, offnum);
+
 			break;
+		}
 
 		Assert(ItemIdIsNormal(lp));
 		htup = (HeapTupleHeader) PageGetItem(dp, lp);
@@ -726,7 +749,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
 		 * item. This can happen if the loop in heap_page_prune caused us to
 		 * visit the dead successor of a redirect item before visiting the
 		 * redirect item. We can clean up by setting the redirect item to
-		 * DEAD state.
+		 * DEAD state or LP_UNUSED if the table has no indexes.
 		 */
 		heap_prune_record_dead(prstate, rootoffnum);
 	}
@@ -767,6 +790,17 @@ heap_prune_record_redirect(PruneState *prstate,
 static void
 heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum)
 {
+	/*
+	 * If the relation has no indexes, we can remove dead tuples during
+	 * pruning instead of marking their line pointers dead. Set this tuple's
+	 * line pointer LP_UNUSED. We hint that tables with indexes are more
+	 * likely.
+	 */
+	if (unlikely(prstate->no_indexes))
+	{
+		heap_prune_record_unused(prstate, offnum);
+		return;
+	}
 	Assert(prstate->ndead < MaxHeapTuplesPerPage);
 	prstate->nowdead[prstate->ndead] = offnum;
 	prstate->ndead++;
@@ -903,13 +937,20 @@ heap_page_prune_execute(Buffer buffer,
 #ifdef USE_ASSERT_CHECKING
 
 			/*
-			 * Only heap-only tuples can become LP_UNUSED during pruning. They
-			 * don't need to be left in place as LP_DEAD items until VACUUM gets
-			 * around to doing index vacuuming.
+			 * If there are any items being marked LP_DEAD, then the relation
+			 * must have indexes, so every item being marked LP_UNUSED must
+			 * refer to a heap-only tuple. But if there are no items being
+			 * marked LP_DEAD, then it's possible that the relation has no
+			 * indexes, in which case all we know is that the line pointer
+			 * shouldn't already be LP_UNUSED.
 			 */
-			Assert(ItemIdHasStorage(lp) && ItemIdIsNormal(lp));
-			htup = (HeapTupleHeader) PageGetItem(page, lp);
-			Assert(HeapTupleHeaderIsHeapOnly(htup));
+			if (ndead > 0)
+			{
+				Assert(ItemIdHasStorage(lp) && ItemIdIsNormal(lp));
+				htup = (HeapTupleHeader) PageGetItem(page, lp);
+				Assert(HeapTupleHeaderIsHeapOnly(htup));
+			}
+
 #endif
 
 			ItemIdSetUnused(lp);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index abbba8947fa..e9636b2c47e 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -250,7 +250,8 @@ static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   bool sharelock, Buffer vmbuffer);
 static void lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
-							LVPagePruneState *prunestate);
+							LVPagePruneState *prunestate,
+							bool *recordfreespace);
 static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 							  Page page, bool *hastup, bool *recordfreespace);
@@ -830,8 +831,10 @@ lazy_scan_heap(LVRelState *vacrel)
 				next_fsm_block_to_vacuum = 0;
 	VacDeadItems *dead_items = vacrel->dead_items;
 	Buffer		vmbuffer = InvalidBuffer;
+	int			tuples_already_deleted;
 	bool		next_unskippable_allvis,
 				skipping_current_range;
+	bool		recordfreespace;
 	const int	initprog_index[] = {
 		PROGRESS_VACUUM_PHASE,
 		PROGRESS_VACUUM_TOTAL_HEAP_BLKS,
@@ -959,8 +962,7 @@ lazy_scan_heap(LVRelState *vacrel)
 		page = BufferGetPage(buf);
 		if (!ConditionalLockBufferForCleanup(buf))
 		{
-			bool		hastup,
-						recordfreespace;
+			bool		hastup;
 
 			LockBuffer(buf, BUFFER_LOCK_SHARE);
@@ -1010,6 +1012,8 @@ lazy_scan_heap(LVRelState *vacrel)
 			continue;
 		}
 
+		tuples_already_deleted = vacrel->tuples_deleted;
+
 		/*
 		 * Prune, freeze, and count tuples.
 		 *
@@ -1019,7 +1023,7 @@ lazy_scan_heap(LVRelState *vacrel)
 		 * were pruned some time earlier. Also considers freezing XIDs in the
 		 * tuple headers of remaining items with storage.
 		 */
-		lazy_scan_prune(vacrel, buf, blkno, page, &prunestate);
+		lazy_scan_prune(vacrel, buf, blkno, page, &prunestate, &recordfreespace);
 
 		Assert(!prunestate.all_visible || !prunestate.has_lpdead_items);
 
@@ -1027,69 +1031,6 @@ lazy_scan_heap(LVRelState *vacrel)
 		if (prunestate.hastup)
 			vacrel->nonempty_pages = blkno + 1;
 
-		if (vacrel->nindexes == 0)
-		{
-			/*
-			 * Consider the need to do page-at-a-time heap vacuuming when
-			 * using the one-pass strategy now.
-			 *
-			 * The one-pass strategy will never call lazy_vacuum(). The steps
-			 * performed here can be thought of as the one-pass equivalent of
-			 * a call to lazy_vacuum().
-			 */
-			if (prunestate.has_lpdead_items)
-			{
-				Size		freespace;
-
-				lazy_vacuum_heap_page(vacrel, blkno, buf, 0, vmbuffer);
-
-				/* Forget the LP_DEAD items that we just vacuumed */
-				dead_items->num_items = 0;
-
-				/*
-				 * Now perform FSM processing for blkno, and move on to next
-				 * page.
-				 *
-				 * Our call to lazy_vacuum_heap_page() will have considered if
-				 * it's possible to set all_visible/all_frozen independently
-				 * of lazy_scan_prune(). Note that prunestate was invalidated
-				 * by lazy_vacuum_heap_page() call.
-				 */
-				freespace = PageGetHeapFreeSpace(page);
-
-				UnlockReleaseBuffer(buf);
-				RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
-
-				/*
-				 * Periodically perform FSM vacuuming to make newly-freed
-				 * space visible on upper FSM pages. FreeSpaceMapVacuumRange()
-				 * vacuums the portion of the freespace map covering heap
-				 * pages from start to end - 1. Include the block we just
-				 * vacuumed by passing it blkno + 1. Overflow isn't an issue
-				 * because MaxBlockNumber + 1 is InvalidBlockNumber which
-				 * causes FreeSpaceMapVacuumRange() to vacuum freespace map
-				 * pages covering the remainder of the relation.
-				 */
-				if (blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
-				{
-					FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
-											blkno + 1);
-					next_fsm_block_to_vacuum = blkno + 1;
-				}
-
-				continue;
-			}
-
-			/*
-			 * There was no call to lazy_vacuum_heap_page() because pruning
-			 * didn't encounter/create any LP_DEAD items that needed to be
-			 * vacuumed. Prune state has not been invalidated, so proceed
-			 * with prunestate-driven visibility map and FSM steps (just like
-			 * the two-pass strategy).
-			 */
-			Assert(dead_items->num_items == 0);
-		}
-
 		/*
 		 * Handle setting visibility map bit based on information from the VM
 		 * (as of last lazy_scan_skip() call), and from prunestate
@@ -1200,38 +1141,45 @@ lazy_scan_heap(LVRelState *vacrel)
 
 		/*
 		 * Final steps for block: drop cleanup lock, record free space in the
-		 * FSM
+		 * FSM.
+		 *
+		 * If we will likely do index vacuuming, wait until
+		 * lazy_vacuum_heap_rel() to save free space. This doesn't just save
+		 * us some cycles; it also allows us to record any additional free
+		 * space that lazy_vacuum_heap_page() will make available in cases
+		 * where it's possible to truncate the page's line pointer array.
+		 *
+		 * Note: It's not in fact 100% certain that we really will call
+		 * lazy_vacuum_heap_rel() -- lazy_vacuum() might yet opt to skip index
+		 * vacuuming (and so must skip heap vacuuming). This is deemed okay
+		 * because it only happens in emergencies, or when there is very
+		 * little free space anyway. (Besides, we start recording free space
+		 * in the FSM once index vacuuming has been abandoned.)
 		 */
-		if (prunestate.has_lpdead_items && vacrel->do_index_vacuuming)
-		{
-			/*
-			 * Wait until lazy_vacuum_heap_rel() to save free space. This
-			 * doesn't just save us some cycles; it also allows us to record
-			 * any additional free space that lazy_vacuum_heap_page() will
-			 * make available in cases where it's possible to truncate the
-			 * page's line pointer array.
-			 *
-			 * Note: It's not in fact 100% certain that we really will call
-			 * lazy_vacuum_heap_rel() -- lazy_vacuum() might yet opt to skip
-			 * index vacuuming (and so must skip heap vacuuming). This is
-			 * deemed okay because it only happens in emergencies, or when
-			 * there is very little free space anyway. (Besides, we start
-			 * recording free space in the FSM once index vacuuming has been
-			 * abandoned.)
-			 *
-			 * Note: The one-pass (no indexes) case is only supposed to make
-			 * it this far when there were no LP_DEAD items during pruning.
-			 */
-			Assert(vacrel->nindexes > 0);
-			UnlockReleaseBuffer(buf);
-		}
-		else
+		if (recordfreespace)
 		{
 			Size		freespace = PageGetHeapFreeSpace(page);
 
 			UnlockReleaseBuffer(buf);
 			RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
 		}
+		else
+			UnlockReleaseBuffer(buf);
+
+		/*
+		 * Periodically perform FSM vacuuming to make newly-freed space
+		 * visible on upper FSM pages. This is done after vacuuming if the
+		 * table has indexes.
+		 */
+		if (vacrel->nindexes == 0 &&
+			vacrel->tuples_deleted > tuples_already_deleted &&
+			(blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES))
+		{
+			FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
+									blkno);
+			next_fsm_block_to_vacuum = blkno;
+		}
 	}
 
 	vacrel->blkno = InvalidBlockNumber;
@@ -1543,7 +1491,8 @@ lazy_scan_prune(LVRelState *vacrel,
 				Buffer buf,
 				BlockNumber blkno,
 				Page page,
-				LVPagePruneState *prunestate)
+				LVPagePruneState *prunestate,
+				bool *recordfreespace)
 {
 	Relation	rel = vacrel->rel;
 	OffsetNumber offnum,
@@ -1587,7 +1536,8 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * lpdead_items's final value can be thought of as the number of tuples
 	 * that were deleted from indexes.
 	 */
-	heap_page_prune(rel, buf, vacrel->vistest, &presult, &vacrel->offnum);
+	heap_page_prune(rel, buf, vacrel->vistest, vacrel->nindexes == 0,
+					&presult, &vacrel->offnum);
 
 	/*
 	 * Now scan the page to collect LP_DEAD items and check for tuples
@@ -1598,6 +1548,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	prunestate->all_visible = true;
 	prunestate->all_frozen = true;
 	prunestate->visibility_cutoff_xid = InvalidTransactionId;
+	*recordfreespace = false;
 
 	for (offnum = FirstOffsetNumber;
 		 offnum <= maxoff;
@@ -1918,6 +1869,15 @@ lazy_scan_prune(LVRelState *vacrel,
 	vacrel->lpdead_items += lpdead_items;
 	vacrel->live_tuples += live_tuples;
 	vacrel->recently_dead_tuples += recently_dead_tuples;
+
+	/*
+	 * If we will not do index vacuuming, either because we have no indexes,
+	 * because there is nothing to vacuum, or because do_index_vacuuming is
+	 * false, make sure we update the freespace map.
+	 */
+	if (vacrel->nindexes == 0 ||
+		!vacrel->do_index_vacuuming || lpdead_items == 0)
+		*recordfreespace = true;
 }
 
 /*
@@ -2516,7 +2476,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	bool		all_frozen;
 	LVSavedErrInfo saved_err_info;
 
-	Assert(vacrel->nindexes == 0 || vacrel->do_index_vacuuming);
+	Assert(vacrel->do_index_vacuuming);
 
 	pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 932ec0d6f2b..4a2aebedb57 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -320,6 +320,7 @@ struct GlobalVisState;
 extern void heap_page_prune_opt(Relation relation, Buffer buffer);
 extern void heap_page_prune(Relation relation, Buffer buffer,
 							struct GlobalVisState *vistest,
+							bool no_indexes,
 							PruneResult *presult,
 							OffsetNumber *off_loc);
 extern void heap_page_prune_execute(Buffer buffer,
-- 
2.37.2