On Tue, Nov 14, 2023 at 7:15 PM Andres Freund <and...@anarazel.de> wrote:
>
> Hi,
>
> On 2023-11-14 07:46:10 -0500, Melanie Plageman wrote:
> > > FreeSpaceMapVacuumRange()'s comment says:
> > >  * As above, but assume that only heap pages between start and end-1 inclusive
> > >  * have new free-space information, so update only the upper-level slots
> > >  * covering that block range.  end == InvalidBlockNumber is equivalent to
> > >  * "all the rest of the relation".
> > >
> > > So FreeSpaceMapVacuumRange(..., blkno) will not actually process the "effects"
> > > of the RecordPageWithFreeSpace() above it - which seems confusing.
> >
> > Ah, so shall I pass blkno + 1 as end?
>
> I think there's no actual overflow danger, because MaxBlockNumber + 1 is
> InvalidBlockNumber, which scans the rest of the relation (i.e. exactly the
> intended block). Perhaps worth noting?
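The wraparound claim is easy to sanity-check against the definitions in
src/include/storage/block.h. A minimal standalone sketch (constants copied
here so it compiles outside the tree; purely illustrative, not the vacuum
code itself):

    #include <assert.h>
    #include <stdint.h>

    typedef uint32_t BlockNumber;   /* PostgreSQL's BlockNumber is uint32 */

    /* mirrored from src/include/storage/block.h */
    #define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)
    #define MaxBlockNumber     ((BlockNumber) 0xFFFFFFFE)

    int
    main(void)
    {
        /*
         * Even if blkno is the largest valid block, blkno + 1 wraps to
         * InvalidBlockNumber, which FreeSpaceMapVacuumRange() treats as
         * "all the rest of the relation" - i.e. exactly the block we
         * wanted covered.
         */
        BlockNumber blkno = MaxBlockNumber;

        assert(blkno + 1 == InvalidBlockNumber);
        return 0;
    }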
Attached.

> > > =# DELETE FROM copytest_0;
> > > =# VACUUM (VERBOSE) copytest_0;
> > > ...
> > > INFO:  00000: table "copytest_0": truncated 146264 to 110934 pages
> > > ...
> > > tuples missed: 5848 dead from 89 pages not removed due to cleanup lock contention
> > > ...
> > >
> > > A bit of debugging later I figured out that this is due to the background
> > > writer. If I SIGSTOP bgwriter, the whole relation is truncated...
> >
> > That's a bit sad. But isn't that what you would expect bgwriter to do?
> > Write out dirty buffers? It doesn't know that those pages consist of
> > only dead tuples and that you will soon truncate them.
>
> I think the issue is more that it feels wrong that a pin by bgwriter blocks
> vacuum from cleaning up. I think the same happens with on-access pruning -
> where it's more likely for bgwriter to focus on those pages. Checkpointer
> likely causes the same. Even normal backends can cause this while writing
> out the page.
>
> It's not like bgwriter/checkpointer would actually have a problem if the
> page were pruned - the page is locked while being written out, and neither
> keeps pointers into the page after unlocking it again.
>
> At this point I started to wonder if we should invent a separate type of
> pin for a buffer undergoing IO. We basically have the necessary state
> already: BM_IO_IN_PROGRESS. We'd need to look at that in some places,
> e.g. in InvalidateBuffer(): instead of just BUF_STATE_GET_REFCOUNT(),
> we'd need to also look at BM_IO_IN_PROGRESS.
>
> Except of course that that'd not change anything -
> ConditionalLockBufferForCleanup() locks the buffer conditionally, *before*
> even looking at the refcount, and returns false if it can't. And writing
> out a buffer takes a content lock. Which made me realize that
>
>   "tuples missed: 5848 dead from 89 pages not removed due to cleanup lock
>   contention"
>
> is often kinda wrong - the contention is *not* cleanup lock specific. It's
> often just plain contention on the lwlock.

Do you think we should change the log message, then?

> Perhaps we should just treat IO_IN_PROGRESS buffers differently in
> lazy_scan_heap() and heap_page_prune_opt() and wait?

Hmm. This is an interesting idea. lazy_scan_heap() waiting for
checkpointer to write out a buffer could have the interesting property
of shifting who writes out the FPI (full-page image). I wonder if it
would matter.

- Melanie
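P.S. To make the "wait instead of skip" idea concrete, here is roughly the
shape I imagine it taking in lazy_scan_heap(). BufferIsIOInProgress() and
WaitForBufferIO() are hypothetical names - bufmgr.c doesn't export anything
like them today - standing in for whatever interface we'd actually add:

    bool        got_cleanup_lock;

    buf = ReadBufferExtended(vacrel->rel, MAIN_FORKNUM, blkno, RBM_NORMAL,
                             vacrel->bstrategy);

    got_cleanup_lock = ConditionalLockBufferForCleanup(buf);

    /*
     * Hypothetical: if the pin blocking us belongs to a backend that is
     * merely writing the page out (BM_IO_IN_PROGRESS is set), wait for
     * the IO to finish and retry the cleanup lock once, instead of
     * falling straight back to lazy_scan_noprune().
     */
    if (!got_cleanup_lock && BufferIsIOInProgress(buf))   /* hypothetical */
    {
        WaitForBufferIO(buf);                             /* hypothetical */
        got_cleanup_lock = ConditionalLockBufferForCleanup(buf);
    }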
From b1af221d22cfcfbc63497d45c2d0f8ab28c720ab Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Mon, 13 Nov 2023 16:39:25 -0500
Subject: [PATCH v2] Release lock on heap buffer before vacuuming FSM

When there are no indexes on a table, we vacuum each heap block after
pruning it and then update the freespace map. Periodically, we also
vacuum the freespace map. This was done while unnecessarily holding a
lock on the heap page. Release the lock before calling
FreeSpaceMapVacuumRange() and, while we're at it, ensure the range
includes the heap block we just vacuumed.

Discussion: https://postgr.es/m/CAAKRu_YiL%3D44GvGnt1dpYouDSSoV7wzxVoXs8m3p311rp-TVQQ%40mail.gmail.com
---
 src/backend/access/heap/vacuumlazy.c | 30 +++++++++++++++++-----------
 1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 6985d299b2..59f51f40e1 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1046,18 +1046,6 @@ lazy_scan_heap(LVRelState *vacrel)
                 /* Forget the LP_DEAD items that we just vacuumed */
                 dead_items->num_items = 0;
 
-                /*
-                 * Periodically perform FSM vacuuming to make newly-freed
-                 * space visible on upper FSM pages. Note we have not yet
-                 * performed FSM processing for blkno.
-                 */
-                if (blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
-                {
-                    FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
-                                            blkno);
-                    next_fsm_block_to_vacuum = blkno;
-                }
-
                 /*
                  * Now perform FSM processing for blkno, and move on to next
                  * page.
@@ -1071,6 +1059,24 @@ lazy_scan_heap(LVRelState *vacrel)
 
                 UnlockReleaseBuffer(buf);
                 RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
+
+                /*
+                 * Periodically perform FSM vacuuming to make newly-freed
+                 * space visible on upper FSM pages. FreeSpaceMapVacuumRange()
+                 * vacuums the portion of the freespace map covering heap
+                 * pages from start to end - 1. Include the block we just
+                 * vacuumed by passing it blkno + 1. Overflow isn't an issue
+                 * because MaxBlockNumber + 1 is InvalidBlockNumber which
+                 * causes FreeSpaceMapVacuumRange() to vacuum freespace map
+                 * pages covering the remainder of the relation.
+                 */
+                if (blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
+                {
+                    FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
+                                            blkno + 1);
+                    next_fsm_block_to_vacuum = blkno + 1;
+                }
+
                 continue;
             }
 
-- 
2.37.2