On 08/03/2024 19:34, Melanie Plageman wrote:
On Fri, Mar 08, 2024 at 06:07:33PM +0200, Heikki Linnakangas wrote:
On 08/03/2024 02:46, Melanie Plageman wrote:
On Wed, Mar 06, 2024 at 10:00:23PM -0500, Melanie Plageman wrote:
On Wed, Mar 06, 2024 at 09:55:21PM +0200, Heikki Linnakangas wrote:
However, by adding a vmbuffer to next_block_state, the callback may be
able to avoid extra VM fetches from one invocation to the next.
That's a good idea, holding separate VM buffer pins for the next-unskippable
block and the block we're processing. I adopted that approach.
Cool. It can't be avoided with streaming read vacuum, but I wonder if
there would ever be adverse effects to doing it on master? Maybe if we
are doing a lot of skipping and the VM block for the heap blocks we are
processing ends up changing each time, when we would have had the right
VM block if we had used the one from heap_vac_scan_next_block()?
Frankly, I'm in favor of just doing it now because it makes
lazy_scan_heap() less confusing.
+1
I'm pretty happy with the attached patches now. The first one fixes the
existing bug I mentioned in the other email (based on the ongoing
discussion, that might not be how we want to fix it, though).
ISTM we should still do the fix you mentioned -- seems like it has more
upsides than downsides?
From b68cb29c547de3c4acd10f31aad47b453d154666 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakan...@iki.fi>
Date: Fri, 8 Mar 2024 16:00:22 +0200
Subject: [PATCH v8 1/3] Set all_visible_according_to_vm correctly with
DISABLE_PAGE_SKIPPING
It's important for 'all_visible_according_to_vm' to correctly reflect
whether the VM bit is set or not, even when we are not trusting the VM
to skip pages, because contrary to what the comment said,
lazy_scan_prune() relies on it.
If it's incorrectly set to 'false' when the VM bit is in fact set,
lazy_scan_prune() will try to set the VM bit again and dirty the page
unnecessarily. As a result, if you used DISABLE_PAGE_SKIPPING, all
heap pages were dirtied, even if there were no changes. We would also
fail to clear any VM bits that were set incorrectly.
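For reference, the check in question looks roughly like this (an
abridged sketch of lazy_scan_prune(), not the exact code):

    if (!all_visible_according_to_vm && all_visible)
    {
        /* sets the VM bit and marks the heap buffer dirty */
        ...
    }

so a spurious 'false' makes this branch fire even though the VM bit is
already set.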
This was broken in commit 980ae17310, so backpatch to v16.
LGTM.
Committed and backpatched this.
From 47af1ca65cf55ca876869b43bff47f9d43f0750e Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakan...@iki.fi>
Date: Fri, 8 Mar 2024 17:32:19 +0200
Subject: [PATCH v8 2/3] Confine vacuum skip logic to lazy_scan_skip()
---
src/backend/access/heap/vacuumlazy.c | 256 +++++++++++++++------------
1 file changed, 141 insertions(+), 115 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ac55ebd2ae5..0aa08762015 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -204,6 +204,12 @@ typedef struct LVRelState
int64 live_tuples; /* # live tuples remaining */
int64 recently_dead_tuples; /* # dead, but not yet removable */
int64 missed_dead_tuples; /* # removable, but not removed */
Perhaps we should add a comment to the blkno member of LVRelState
indicating that it is used for error reporting and logging?
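Something like this, maybe (just a sketch):

    BlockNumber blkno;      /* current block, for error
                             * reporting/logging only */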
Well, it's already under the "/* Error reporting state */" section. I
agree this is a little confusing; the name 'blkno' doesn't convey that
it's supposed to be used just for error reporting. But it's a
pre-existing issue so I left it alone. It can be changed with a separate
patch if we come up with a good idea.
I see why you removed my treatise-level comment that was here about
unskipped skippable blocks. However, when I was trying to understand
this code, I did wish there was some comment that explained to me why we
needed all of the variables next_unskippable_block,
next_unskippable_allvis, all_visible_according_to_vm, and current_block.
The idea that we would choose not to skip a skippable block because of
kernel readahead makes sense. The part that I had trouble wrapping my
head around was that we want to also keep the visibility status of both
the beginning and ending blocks of the skippable range and then use
those to infer the visibility status of the intervening blocks without
another VM lookup if we decide not to skip them.
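Roughly, the inference boils down to something like this (a sketch, not
the exact code):

    if (next_block < next_unskippable_block)
    {
        /*
         * next_block is inside a skippable range we chose not to
         * skip: every block in the range was seen as all-visible in
         * the VM when the range was computed, so no new VM lookup is
         * needed here.
         */
        all_visible_according_to_vm = true;
    }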
Right, I removed the comment because it looked a little out of place
and it duplicated the other comments sprinkled in the function. I agree
this could still use some more comments though.
Here's yet another attempt at making this more readable. I moved the
logic to find the next unskippable block to a separate function, and
added comments to make the states more explicit. What do you think?
--
Heikki Linnakangas
Neon (https://neon.tech)
From c21480e9da61e145573de3b502551dde1b8fa3f6 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakan...@iki.fi>
Date: Fri, 8 Mar 2024 17:32:19 +0200
Subject: [PATCH v9 1/2] Confine vacuum skip logic to lazy_scan_skip()
Rename lazy_scan_skip() to heap_vac_scan_next_block() and move more
code into the function, so that the caller doesn't need to know about
ranges or skipping anymore. heap_vac_scan_next_block() returns the
next block to process, and the logic for determining that block is all
within the function. This makes the skipping logic easier to
understand, as it's all in the same function, and makes the calling
code easier to understand as it's less cluttered. The state variables
needed to manage the skipping logic are moved to LVRelState.
heap_vac_scan_next_block() now manages its own VM buffer separately
from the caller's vmbuffer variable. The caller's vmbuffer holds the
VM page for the current block it's processing, while
heap_vac_scan_next_block() keeps a pin on the VM page for the next
unskippable block. Most of the time they are the same, so we hold two
pins on the same buffer, but it's more convenient to manage them
separately.
For readability inside heap_vac_scan_next_block(), move the logic of
finding the next unskippable block to a separate function, and add some
comments.
This refactoring will also help future patches to switch to using a
streaming read interface, and eventually AIO
(https://postgr.es/m/CA%2BhUKGJkOiOCa%2Bmag4BF%2BzHo7qo%3Do9CFheB8%3Dg6uT5TUm2gkvA%40mail.gmail.com)
Author: Melanie Plageman, with some changes by me
Discussion: https://postgr.es/m/CAAKRu_Yf3gvXGcCnqqfoq0Q8LX8UM-e-qbm_B1LeZh60f8WhWA%40mail.gmail.com
---
src/backend/access/heap/vacuumlazy.c | 233 +++++++++++++++++----------
1 file changed, 146 insertions(+), 87 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ac55ebd2ae..1757eb49b7 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -204,6 +204,12 @@ typedef struct LVRelState
int64 live_tuples; /* # live tuples remaining */
int64 recently_dead_tuples; /* # dead, but not yet removable */
int64 missed_dead_tuples; /* # removable, but not removed */
+
+ /* State maintained by heap_vac_scan_next_block() */
+ BlockNumber current_block; /* last block returned */
+ BlockNumber next_unskippable_block; /* next unskippable block */
+ bool next_unskippable_allvis; /* its visibility status */
+ Buffer next_unskippable_vmbuffer; /* buffer containing its VM bit */
} LVRelState;
/* Struct for saving and restoring vacuum error information. */
@@ -217,10 +223,9 @@ typedef struct LVSavedErrInfo
/* non-export function prototypes */
static void lazy_scan_heap(LVRelState *vacrel);
-static BlockNumber lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer,
- BlockNumber next_block,
- bool *next_unskippable_allvis,
- bool *skipping_current_range);
+static bool heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
+ bool *all_visible_according_to_vm);
+static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
@@ -803,12 +808,11 @@ lazy_scan_heap(LVRelState *vacrel)
{
BlockNumber rel_pages = vacrel->rel_pages,
blkno,
- next_unskippable_block,
next_fsm_block_to_vacuum = 0;
+ bool all_visible_according_to_vm;
+
VacDeadItems *dead_items = vacrel->dead_items;
Buffer vmbuffer = InvalidBuffer;
- bool next_unskippable_allvis,
- skipping_current_range;
const int initprog_index[] = {
PROGRESS_VACUUM_PHASE,
PROGRESS_VACUUM_TOTAL_HEAP_BLKS,
@@ -822,44 +826,19 @@ lazy_scan_heap(LVRelState *vacrel)
initprog_val[2] = dead_items->max_items;
pgstat_progress_update_multi_param(3, initprog_index, initprog_val);
- /* Set up an initial range of skippable blocks using the visibility map */
- next_unskippable_block = lazy_scan_skip(vacrel, &vmbuffer, 0,
- &next_unskippable_allvis,
- &skipping_current_range);
- for (blkno = 0; blkno < rel_pages; blkno++)
+ /* Initialize for the first heap_vac_scan_next_block() call */
+ vacrel->current_block = InvalidBlockNumber;
+ vacrel->next_unskippable_block = InvalidBlockNumber;
+ vacrel->next_unskippable_allvis = false;
+ vacrel->next_unskippable_vmbuffer = InvalidBuffer;
+
+ while (heap_vac_scan_next_block(vacrel, &blkno, &all_visible_according_to_vm))
{
Buffer buf;
Page page;
- bool all_visible_according_to_vm;
bool has_lpdead_items;
bool got_cleanup_lock = false;
- if (blkno == next_unskippable_block)
- {
- /*
- * Can't skip this page safely. Must scan the page. But
- * determine the next skippable range after the page first.
- */
- all_visible_according_to_vm = next_unskippable_allvis;
- next_unskippable_block = lazy_scan_skip(vacrel, &vmbuffer,
- blkno + 1,
- &next_unskippable_allvis,
- &skipping_current_range);
-
- Assert(next_unskippable_block >= blkno + 1);
- }
- else
- {
- /* Last page always scanned (may need to set nonempty_pages) */
- Assert(blkno < rel_pages - 1);
-
- if (skipping_current_range)
- continue;
-
- /* Current range is too small to skip -- just scan the page */
- all_visible_according_to_vm = true;
- }
-
vacrel->scanned_pages++;
/* Report as block scanned, update error traceback information */
@@ -1077,18 +1056,22 @@ lazy_scan_heap(LVRelState *vacrel)
}
/*
- * lazy_scan_skip() -- set up range of skippable blocks using visibility map.
+ * heap_vac_scan_next_block() -- get next block for vacuum to process
*
- * lazy_scan_heap() calls here every time it needs to set up a new range of
- * blocks to skip via the visibility map. Caller passes the next block in
- * line. We return a next_unskippable_block for this range. When there are
- * no skippable blocks we just return caller's next_block. The all-visible
- * status of the returned block is set in *next_unskippable_allvis for caller,
- * too. Block usually won't be all-visible (since it's unskippable), but it
- * can be during aggressive VACUUMs (as well as in certain edge cases).
+ * lazy_scan_heap() calls here every time it needs to get the next block to
+ * prune and vacuum. The function uses the visibility map, vacuum options,
+ * and various thresholds to skip blocks which do not need to be processed and
+ * sets blkno to the next block that actually needs to be processed.
*
- * Sets *skipping_current_range to indicate if caller should skip this range.
- * Costs and benefits drive our decision. Very small ranges won't be skipped.
+ * The block number and visibility status of the next block to process are set
+ * in *blkno and *all_visible_according_to_vm. The return value is false if
+ * there are no further blocks to process.
+ *
+ * vacrel is an in/out parameter here; vacuum options and information about
+ * the relation are read, and vacrel->skippedallvis is set to ensure we don't
+ * advance relfrozenxid when we have skipped vacuuming all-visible blocks. It
+ * also holds information about the next unskippable block, as bookkeeping for
+ * this function.
*
* Note: our opinion of which blocks can be skipped can go stale immediately.
* It's okay if caller "misses" a page whose all-visible or all-frozen marking
@@ -1098,26 +1081,119 @@ lazy_scan_heap(LVRelState *vacrel)
* older XIDs/MXIDs. The vacrel->skippedallvis flag will be set here when the
* choice to skip such a range is actually made, making everything safe.)
*/
-static BlockNumber
-lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
- bool *next_unskippable_allvis, bool *skipping_current_range)
+static bool
+heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
+ bool *all_visible_according_to_vm)
{
- BlockNumber rel_pages = vacrel->rel_pages,
- next_unskippable_block = next_block,
- nskippable_blocks = 0;
- bool skipsallvis = false;
+ BlockNumber next_block;
- *next_unskippable_allvis = true;
- while (next_unskippable_block < rel_pages)
+ /* relies on InvalidBlockNumber + 1 overflowing to 0 on first call */
+ next_block = vacrel->current_block + 1;
+
+ /* Have we reached the end of the relation? */
+ if (next_block >= vacrel->rel_pages)
+ {
+ if (BufferIsValid(vacrel->next_unskippable_vmbuffer))
+ {
+ ReleaseBuffer(vacrel->next_unskippable_vmbuffer);
+ vacrel->next_unskippable_vmbuffer = InvalidBuffer;
+ }
+ *blkno = vacrel->rel_pages;
+ return false;
+ }
+
+ /*
+ * We must be in one of the three following states:
+ */
+ if (vacrel->next_unskippable_block == InvalidBlockNumber ||
+ next_block > vacrel->next_unskippable_block)
+ {
+ /*
+ * 1. We have just processed an unskippable block (or we're at the
+ * beginning of the scan). Find the next unskippable block using the
+ * visibility map.
+ */
+ bool skipsallvis;
+
+ find_next_unskippable_block(vacrel, &skipsallvis);
+
+ /*
+ * We now know the next block that we must process. It can be the
+ * next block after the one we just processed, or something further
+ * ahead. If it's further ahead, we can jump to it, but we choose to
+ * do so only if we can skip at least SKIP_PAGES_THRESHOLD consecutive
+ * pages. Since we're reading sequentially, the OS should be doing
+ * readahead for us, so there's no gain in skipping a page now and
+ * then. Skipping such a range might even discourage sequential
+ * detection.
+ *
+ * This test also enables more frequent relfrozenxid advancement
+ * during non-aggressive VACUUMs. If the range has any all-visible
+ * pages then skipping makes updating relfrozenxid unsafe, which is a
+ * real downside.
+ */
+ if (vacrel->next_unskippable_block - next_block >= SKIP_PAGES_THRESHOLD)
+ {
+ next_block = vacrel->next_unskippable_block;
+ if (skipsallvis)
+ vacrel->skippedallvis = true;
+ }
+ }
+
+ /* Now we must be in one of the two remaining states: */
+ if (next_block < vacrel->next_unskippable_block)
+ {
+ /*
+ * 2. We are processing a range of blocks that we could have skipped
+ * but chose not to. We know that they are all-visible in the VM,
+ * otherwise they would've been unskippable.
+ */
+ *blkno = vacrel->current_block = next_block;
+ *all_visible_according_to_vm = true;
+ return true;
+ }
+ else
+ {
+ /*
+ * 3. We reached the next unskippable block. Process it. On next
+ * iteration, we will be back in state 1.
+ */
+ Assert(next_block == vacrel->next_unskippable_block);
+
+ *blkno = vacrel->current_block = next_block;
+ *all_visible_according_to_vm = vacrel->next_unskippable_allvis;
+ return true;
+ }
+}
+
+/*
+ * Find the next unskippable block in a vacuum scan using the visibility map.
+ */
+static void
+find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
+{
+ BlockNumber rel_pages = vacrel->rel_pages;
+ BlockNumber next_unskippable_block = vacrel->next_unskippable_block + 1;
+ Buffer next_unskippable_vmbuffer = vacrel->next_unskippable_vmbuffer;
+ bool next_unskippable_allvis;
+
+ *skipsallvis = false;
+
+ for (;;)
{
uint8 mapbits = visibilitymap_get_status(vacrel->rel,
next_unskippable_block,
- vmbuffer);
+ &next_unskippable_vmbuffer);
- if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+ next_unskippable_allvis = (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0;
+
+ /*
+ * A block is unskippable if it is not all visible according to the
+ * visibility map.
+ */
+ if (!next_unskippable_allvis)
{
Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
- *next_unskippable_allvis = false;
break;
}
@@ -1152,34 +1228,17 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
* All-visible block is safe to skip in non-aggressive case. But
* remember that the final range contains such a block for later.
*/
- skipsallvis = true;
+ *skipsallvis = true;
}
vacuum_delay_point();
next_unskippable_block++;
- nskippable_blocks++;
- }
-
- /*
- * We only skip a range with at least SKIP_PAGES_THRESHOLD consecutive
- * pages. Since we're reading sequentially, the OS should be doing
- * readahead for us, so there's no gain in skipping a page now and then.
- * Skipping such a range might even discourage sequential detection.
- *
- * This test also enables more frequent relfrozenxid advancement during
- * non-aggressive VACUUMs. If the range has any all-visible pages then
- * skipping makes updating relfrozenxid unsafe, which is a real downside.
- */
- if (nskippable_blocks < SKIP_PAGES_THRESHOLD)
- *skipping_current_range = false;
- else
- {
- *skipping_current_range = true;
- if (skipsallvis)
- vacrel->skippedallvis = true;
}
- return next_unskippable_block;
+ /* write the local variables back to vacrel */
+ vacrel->next_unskippable_block = next_unskippable_block;
+ vacrel->next_unskippable_allvis = next_unskippable_allvis;
+ vacrel->next_unskippable_vmbuffer = next_unskippable_vmbuffer;
}
/*
@@ -1752,8 +1811,8 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Handle setting visibility map bit based on information from the VM (as
- * of last lazy_scan_skip() call), and from all_visible and all_frozen
- * variables
+ * of last heap_vac_scan_next_block() call), and from all_visible and
+ * all_frozen variables
*/
if (!all_visible_according_to_vm && all_visible)
{
@@ -1788,8 +1847,8 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* As of PostgreSQL 9.2, the visibility map bit should never be set if the
* page-level bit is clear. However, it's possible that the bit got
- * cleared after lazy_scan_skip() was called, so we must recheck with
- * buffer lock before concluding that the VM is corrupt.
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
*/
else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
--
2.39.2
From 07437212d629ab00b571dfcadb41497a6c5b43d5 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakan...@iki.fi>
Date: Mon, 11 Mar 2024 10:01:37 +0200
Subject: [PATCH v9 2/2] Remove unneeded vacuum_delay_point from
heap_vac_scan_next_block
heap_vac_scan_next_block() does relatively little work, so there
is no need to call vacuum_delay_point(). A future commit will call
heap_vac_scan_next_block() from a callback, and we would like to
avoid calling vacuum_delay_point() in that callback.
Author: Melanie Plageman
Discussion: https://postgr.es/m/CAAKRu_Yf3gvXGcCnqqfoq0Q8LX8UM-e-qbm_B1LeZh60f8WhWA%40mail.gmail.com
---
src/backend/access/heap/vacuumlazy.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 1757eb49b7..58ee12fdfb 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1231,7 +1231,6 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
*skipsallvis = true;
}
- vacuum_delay_point();
next_unskippable_block++;
}
--
2.39.2