On Tue, Nov 14, 2023 at 7:15 PM Andres Freund <and...@anarazel.de> wrote:
>
> Hi,
>
> On 2023-11-14 07:46:10 -0500, Melanie Plageman wrote:
> > > FreeSpaceMapVacuumRange()'s comment says:
> > >  * As above, but assume that only heap pages between start and end-1 
> > > inclusive
> > >  * have new free-space information, so update only the upper-level slots
> > >  * covering that block range.  end == InvalidBlockNumber is equivalent to
> > >  * "all the rest of the relation".
> > >
> > > So FreeSpaceMapVacuumRange(..., blkno) will not actually process the 
> > > "effects"
> > > of the RecordPageWithFreeSpace() above it - which seems confusing.
> >
> > Ah, so shall I pass blkno + 1 as end?
>
> I think there's no actual overflow danger, because MaxBlockNumber + 1 is
> InvalidBlockNumber, which scans the rest of the relation (i.e. exactly the
> intended block). Perhaps worth noting?

Attached

> > > =# DELETE FROM copytest_0;
> > > =# VACUUM (VERBOSE) copytest_0;
> > > ...
> > > INFO:  00000: table "copytest_0": truncated 146264 to 110934 pages
> > > ...
> > > tuples missed: 5848 dead from 89 pages not removed due to cleanup lock 
> > > contention
> > > ...
> > >
> > > A bit of debugging later I figured out that this is due to the background
> > > writer. If I SIGSTOP bgwriter, the whole relation is truncated...
> >
> > That's a bit sad. But isn't that what you would expect bgwriter to do?
> > Write out dirty buffers? It doesn't know that those pages consist of
> > only dead tuples and that you will soon truncate them.
>
> I think the issue is more that it feels wrong that a pin by bgwriter blocks
> vacuum's cleanup. I think the same happens with on-access pruning - where
> it's more likely for bgwriter to focus on those pages. Checkpointer likely
> causes the same. Even normal backends can cause this while writing out the
> page.
>
> It's not like bgwriter/checkpointer would actually have a problem if the page
> were pruned - the page is locked while being written out and neither keeps
> pointers into the page after unlocking it again.
>
>
> At this point I started to wonder if we should invent a separate type of pin
> for a buffer undergoing IO. We basically have the necessary state already:
> BM_IO_IN_PROGRESS. We'd need to look at that in some places, e.g. in
> InvalidateBuffer(), instead of BUF_STATE_GET_REFCOUNT(), we'd need to also
> look at BM_IO_IN_PROGRESS.
>
>
> Except of course that that'd not change anything -
> ConditionalLockBufferForCleanup() locks the buffer conditionally, *before*
> even looking at the refcount, and returns false if it can't.  And writing out a
> buffer takes a content lock.  Which made me realize that
>   "tuples missed: 5848 dead from 89 pages not removed due to cleanup lock 
> contention"
>
> is often kinda wrong - the contention is *not* cleanup lock specific. It's
> often just plain contention on the lwlock.

Do you think we should change the error message?

> Perhaps we should just treat IO_IN_PROGRESS buffers differently in
> lazy_scan_heap() and heap_page_prune_opt() and wait?

Hmm. This is an interesting idea. lazy_scan_heap() waiting for
checkpointer to write out a buffer could have an interesting property
of shifting who writes out the FPI. I wonder if it would matter.

- Melanie
From b1af221d22cfcfbc63497d45c2d0f8ab28c720ab Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Mon, 13 Nov 2023 16:39:25 -0500
Subject: [PATCH v2] Release lock on heap buffer before vacuuming FSM

When there are no indexes on a table, we vacuum each heap block after
pruning it and then update the freespace map. Periodically, we also
vacuum the freespace map itself. Previously, this was done while
unnecessarily holding a lock on the heap page. Release the lock before
calling FreeSpaceMapVacuumRange() and, while we're at it, ensure the
range includes the heap block we just vacuumed.

Discussion: https://postgr.es/m/CAAKRu_YiL%3D44GvGnt1dpYouDSSoV7wzxVoXs8m3p311rp-TVQQ%40mail.gmail.com
---
 src/backend/access/heap/vacuumlazy.c | 30 +++++++++++++++++-----------
 1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 6985d299b2..59f51f40e1 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1046,18 +1046,6 @@ lazy_scan_heap(LVRelState *vacrel)
 				/* Forget the LP_DEAD items that we just vacuumed */
 				dead_items->num_items = 0;
 
-				/*
-				 * Periodically perform FSM vacuuming to make newly-freed
-				 * space visible on upper FSM pages.  Note we have not yet
-				 * performed FSM processing for blkno.
-				 */
-				if (blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
-				{
-					FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
-											blkno);
-					next_fsm_block_to_vacuum = blkno;
-				}
-
 				/*
 				 * Now perform FSM processing for blkno, and move on to next
 				 * page.
@@ -1071,6 +1059,24 @@ lazy_scan_heap(LVRelState *vacrel)
 
 				UnlockReleaseBuffer(buf);
 				RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
+
+				/*
+				 * Periodically perform FSM vacuuming to make newly-freed
+				 * space visible on upper FSM pages. FreeSpaceMapVacuumRange()
+				 * vacuums the portion of the freespace map covering heap
+				 * pages from start to end - 1. Include the block we just
+				 * vacuumed by passing it blkno + 1. Overflow isn't an issue
+				 * because MaxBlockNumber + 1 is InvalidBlockNumber which
+				 * causes FreeSpaceMapVacuumRange() to vacuum freespace map
+				 * pages covering the remainder of the relation.
+				 */
+				if (blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
+				{
+					FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
+											blkno + 1);
+					next_fsm_block_to_vacuum = blkno + 1;
+				}
+
 				continue;
 			}
 
-- 
2.37.2
