Updated v4 attached.

On Wed, Dec 18, 2024 at 4:30 PM Robert Haas <robertmh...@gmail.com> wrote:
>
> On Tue, Dec 17, 2024 at 5:54 PM Melanie Plageman
> <melanieplage...@gmail.com> wrote:
> > Makes sense. I've attempted to clarify as you suggest in v3.
>
> I would just commit 0001. There's nothing to be gained by waiting around.

Done.

> I don't care about 0002 much. It doesn't seem particularly better or
> worse. I would suggest that if you want to do this, maybe don't split
> up this:
>
>     vacrel->blkno = InvalidBlockNumber;
>     if (BufferIsValid(vmbuffer))
>         ReleaseBuffer(vmbuffer);
>
> You could put the moved code before or after that instead of in the
> middle of it. Unless there's some reason it makes more sense in the
> middle.

Good point. Anyway, I've dropped all the patches except 0006 and 0007
since no one seems to really like them.

> I dislike 0004 as presented. Reducing the scope of blkno is a false
> economy; the function doesn't do much of anything interesting after
> the main loop. And why do we want to report rel_pages to the progress
> reporting machinery instead of blkno? I'd rather that the code report
> where it actually ended up (blkno) rather than reporting where it
> thinks it must have ended up (rel_pages).

Well, part of the reason I didn't like it is that, because we start
from 0, we have to artificially set blkno to rel_pages anyway, since
we never actually scan a block with BlockNumber == rel_pages. In the
loop, pgstat_progress_update_param() passes blkno before it actually
scans blkno, so I think of it as reporting that it has scanned a total
number of blocks equal to blkno. That's why it seems weird to me to
use blkno to indicate the number of blocks scanned outside the loop.
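
To illustrate, here is a simplified sketch of the loop shape (not the
actual code; scan_block() is a hypothetical stand-in for the per-block
work):

    BlockNumber blkno;

    for (blkno = 0; blkno < rel_pages; blkno++)
    {
        /* exactly blkno blocks have been scanned at this point */
        pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED,
                                     blkno);
        scan_block(blkno);      /* hypothetical per-block work */
    }

    /* on exit, blkno == rel_pages == total number of blocks scanned */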

Anyway, I'm fine with setting this aside for now if you feel it is
more confusing with rel_pages instead of blkno.

> I agree that the assert that 0005 replaces is confusing, but replacing
> 8 lines of code with 37 is not an improvement in my book.

Right, yes. It is long. I dropped it along with the others.

> I like 0006. The phase I-II-III terminology doesn't appear anywhere in
> the code (at least, not to my knowledge) but we speak of it that way
> so frequently in email and in conversation that mentioning it
> someplace that a future developer might find seems advisable. I think
> it would be worth mentioning in the second paragraph of the comment
> that we may resume phase I after completing phase III if we entered
> phase II due to lack of memory. I'm not sure what the right way to
> phrase this is -- in a sense, this possibility means that these aren't
> really phases at all, however much we may speak of them that way. But
> as I say, we talk about them that way all the time, so it's good that
> this is finally adding comments to match.

They are more like states in a state machine, I suppose. We've all
been calling them phases for so long, though, that it's probably best
to go with that. I've added more about how we can return to earlier
phases. I also added a small note that this makes them more like
states, but that we've always called them phases.

> Regarding 0007:
>
> - Why do we measure failure as an integer total but success as a
> percentage? Maybe the thought here is that failures are counted per
> region but successes are counted globally across the relation, but
> then I wonder if that's what we want, and if so, whether we need some
> comments to explain why we want that rather than doing both per
> region.

I've added a comment about this to the top of the file. The idea is
that if you don't cap successes at all, you won't end up amortizing
anything. However, you also don't want a per-region cap to stop you
from freezing the data at the beginning of the table while you are
succeeding, especially given that append-mostly workloads will see the
most benefit from this feature.
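
As a concrete example (numbers mine): with EAGER_SCAN_SUCCESS_RATE at
0.2 and 50,000 all-visible but not all-frozen pages at the start of the
vacuum, eager scanning is permanently disabled once 10,000 eagerly
scanned pages have been frozen, leaving the rest to be spread across
future vacuums instead of all landing on the next aggressive one.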

I did wonder if we should also have some sort of global failure limit
to cap the total pages scanned, but having both a global and a local
failure limit seemed like too much extra complexity (and it's unclear
what the global failure limit should be set to). It's probably better
to make the failures per region configurable.

> - I do not like the nested struct definition for eager_pages. Either
> define the struct separately, give it a name, and then refer to that
> name here, or just define the elements individually, and maybe give
> them a common prefix or something. I don't think what you have here is
> a style we normally use. As you can see, pgindent is not particularly
> fond of it. It seems particularly weird given that you didn't even put
> all the stuff related to eagerness inside of it.

I don't mind changing it. This version has them prefixed with "eager" instead.

> - It's a little weird that you mostly treat eager vacuums as an
> intermediate position between aggressive and normal, but then we
> decide whether we're eager in a different place than we decide whether
> we're aggressive.

Yes, originally I had the code in heap_vacuum_set_up_eagerness() in
vacuum_get_cutoffs(), but I moved it after noticing that
vacuum_get_cutoffs() is in vacuum.c, which is technically AM-agnostic.
I didn't want to pollute it too much with eagerness logic that is
actually only used in heap-specific code.

With the other changes to eliminate the idea of a separate eager
vacuum in this version, I think this is no longer an issue.

> - On a related note, won't most vacuums be VAC_EAGER rather than
> VAC_NORMAL, thus making VAC_NORMAL a misnomer? I wonder if it's better
> to merge eager and normal together, and just treat the cases where we
> judge eager scanning not worthwhile as a mild special case of an
> otherwise-normal vacuum. It's important that a user can tell what
> happened from the log message, but it doesn't seem absolutely
> necessary for the start-of-vacuum message to make that clear. It could
> just be that %u all-visible scanned => 0 all-visible scanned means we
> didn't end up being at all eager.

I can see this. In this version, I've eliminated the concept of eager
vacuums and reverted the LVRelState flag back to a boolean indicating
aggressive or non-aggressive. Normal vacuums may eager scan some pages
if they qualify. If so, the eager scan management state is set up in
the LVRelState. Aggressive vacuums and normal vacuums with eager
scanning disabled have all of this state set to values indicating
eager scanning is disabled.
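
Concretely, the disabled values (as set in the attached patch) are:

    vacrel->next_eager_scan_region_start = InvalidBlockNumber;
    vacrel->eager_scan_remaining_fails = 0;
    vacrel->eager_scan_remaining_successes = 0;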

> - Perhaps it's unfair of me, but I think I would have hoped for an
> acknowledgement in this commit message, considering that I believe I
> was the one who suggested breaking the relation into logical regions,
> trying to freeze a percentage of the all-visible-but-not-all-frozen
> pages, and capping both successes and failures. Starting the region at
> a random offset wasn't my idea, and the specific thresholds you've
> chosen were not the ones I suggested, and limiting successes globally
> rather than per-region was not what I think I had in mind, and I don't
> mean to take away from everything you've done to move this forward,
> but unless I am misunderstanding the situation, this particular patch
> (0007) is basically an implementation of an algorithm that, as far as
> I know, I was the first to propose.

Ah, you're right. This was an oversight that I believe I've corrected
in the attached version's commit message.

> - Which of course also means that I tend to like the idea, but also
> that I'm biased. Still, what is the reasonable alternative to this
> patch? I have a hard time believing that it's "do nothing". As far as
> I know, pretty much everyone agrees that the large burst of work that
> tends to occur when an aggressive vacuum kicks off is extremely
> problematic, particularly but not only the first time it kicks off on
> a table or group of tables that may have accumulated many
> all-visible-but-not-all-frozen pages. This patch might conceivably err
> in moving that work too aggressively to earlier vacuums, thus making
> those vacuums too expensive or wasting work if the pages end up being
> modified again; or it might conceivably err in moving work
> insufficiently aggressively to earlier vacuums, leaving too much
> remaining work when the aggressive vacuum finally happens. In fact, I
> would be surprised if it doesn't have those problems in some
> situations. But it could have moderately severe cases of those
> problems and still be quite a bit better than what we have now
> overall.
>
> - So, I think we should avoid fine-tuning this and try to understand if
> there's anything wrong with the big picture. Can we imagine a user who
> is systematically unhappy with this change? Like, not a user who got
> hosed once because of some bad luck, but someone who is constantly and
> predictably getting hosed? They'd need to be accumulating lots of
> all-visible-not-all-frozen pages in relatively large tables on a
> regular basis, but then I guess they'd need to either drop the tables
> before the aggressive vacuum happened, or they'd need to render the
> pages not-all-visible again before the aggressive vacuum would have
> happened. I'm not entirely sure how possible that is. My best guess is
> that it's possible if the timing of your autovacuum runs is
> particularly poor -- you just need to load some data, vacuum it early
> enough that the XIDs are still young so it doesn't get frozen, then
> have the eager vacuum hit it, and then update it. That doesn't seem
> impossible, but I'm not sure if it's possible to make it happen often
> enough and severely enough to really cause a problem. And I'm not sure
> we're going to find that out before this is committed.

I suppose in the worst case, if the timings all align poorly and
you've set your autovacuum_freeze_max_age/vacuum_freeze_table_age very
high and vacuum_freeze_min_age very low, you could end up uselessly
freezing the same page multiple times before aggressive vacuuming.

If you cycle through modifying a page, vacuuming it, setting it
all-visible, and eagerly scanning and freezing it multiple times
before an aggressive vacuum, this would be a lot of extra useless
freezing. It seems difficult to do because the page will likely be
frozen the first time you vacuum it if vacuum_freeze_min_age is set
sufficiently low.

The other "worst case" is just that you always scan and fail to freeze
an extra 3% of the relation while vacuuming the table. This one is
much easier to achieve. As such, it seems worthwhile to add a GUC and
table option to tune EAGER_SCAN_MAX_FAILS_PER_REGION so that you can
disable eager scanning altogether (or increase or decrease how
aggressive it is).
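
(For reference, that 3% is EAGER_SCAN_MAX_FAILS_PER_REGION /
EAGER_SCAN_REGION_SIZE = 128 / 4096, or about 3.1% -- with the default
8kB block size, at most 1MB of extra reads per 32MB region.)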

- Melanie
From acd01bb8e7d77c076dbfbbcee8ea57da421228ae Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Wed, 11 Dec 2024 14:13:34 -0500
Subject: [PATCH v4 1/2] Add more general summary to vacuumlazy.c

Add more comments at the top of vacuumlazy.c on heap relation vacuuming
implementation.

Previously vacuumlazy.c only had details related to the dead TID storage
added in Postgres 17. This commit adds a more general summary to help
future developers understand the heap relation vacuum design and
implementation at a high level.

Reviewed-by: Robert Haas, Bilal Yavuz
Discussion: https://postgr.es/m/flat/CAAKRu_ZF_KCzZuOrPrOqjGVe8iRVWEAJSpzMgRQs%3D5-v84cXUg%40mail.gmail.com
---
 src/backend/access/heap/vacuumlazy.c | 42 ++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f2ca9430581..f8edbab21d2 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -3,6 +3,48 @@
  * vacuumlazy.c
  *	  Concurrent ("lazy") vacuuming.
  *
+ * Heap relations are vacuumed in three main phases. In phase I, vacuum scans
+ * relation pages, pruning and freezing tuples and saving dead tuples' TIDs in
+ * a TID store. If that TID store fills up or vacuum finishes scanning the
+ * relation, it progresses to phase II: index vacuuming. Index vacuuming
+ * deletes the dead index entries referenced in the TID store. In phase III,
+ * vacuum scans the blocks of the relation indicated by the TIDs in the TID
+ * store and reaps the dead tuples, freeing that space for future tuples.
+ *
+ * If there are no indexes or index scanning is disabled, phase II may be
+ * skipped. If phase I identified very few dead index entries, vacuum may skip
+ * phases II and III. If the TID store fills up in phase I, vacuum suspends
+ * phase I, proceeds to phases II and III, and cleans up the dead tuples
+ * referenced in the current TID store. This empties the TID store and allows
+ * vacuum to resume phase I. In this sense, the phases are more like states in
+ * a state machine, but they have been referred to colloquially as phases for
+ * long enough that it makes sense to refer to them in that way here.
+ *
+ * Finally, vacuum may truncate the relation if it has emptied pages at the
+ * end. After finishing all phases of work, vacuum updates relation statistics
+ * in pg_class and the cumulative statistics subsystem.
+ *
+ * Relation Scanning:
+ *
+ * Vacuum scans the heap relation, starting at the beginning and progressing
+ * to the end, skipping pages as permitted by their visibility status, vacuum
+ * options, and the eagerness level of the vacuum.
+ *
+ * When page skipping is enabled, non-aggressive vacuums may skip scanning
+ * pages that are marked all-visible in the visibility map. We may choose not
+ * to skip pages if the range of skippable pages is below
+ * SKIP_PAGES_THRESHOLD.
+ *
+ * Once vacuum has decided to scan a given block, it must read in the block
+ * and obtain a cleanup lock to prune tuples on the page. A non-aggressive
+ * vacuum may choose to skip pruning and freezing if it cannot acquire a
+ * cleanup lock on the buffer right away.
+ *
+ * After pruning and freezing, pages that are newly all-visible and all-frozen
+ * are marked as such in the visibility map.
+ *
+ * Dead TID Storage:
+ *
  * The major space usage for vacuuming is storage for the dead tuple IDs that
  * are to be removed from indexes.  We want to ensure we can vacuum even the
  * very largest relations with finite memory space usage.  To do that, we set
-- 
2.34.1

From 925dd0e487ee0a7910370810e0167cc4efa198e0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Thu, 19 Dec 2024 19:42:54 -0500
Subject: [PATCH v4 2/2] Eagerly scan all-visible pages to amortize aggressive
 vacuum

Introduce eager scanning for normal vacuums, in which vacuum scans some
of the all-visible but not all-frozen pages in the relation to amortize
the cost of an aggressive vacuum.

Because the goal is to freeze these all-visible pages, all-visible pages
that are eagerly scanned and set all-frozen in the visibility map are
considered successful eager scans and those not frozen are considered
failed eager scans.

If too many eager scans in a region fail, eager scanning is temporarily
suspended until a later portion of the relation. To effectively amortize
aggressive vacuums, we cap the number of successes as well. Once we
reach the maximum number of blocks successfully eager scanned and
frozen, eager scanning is permanently disabled for the current vacuum.

Original design idea from Robert Haas, with enhancements from
Andres Freund, Tomas Vondra, and me.

Author: Melanie Plageman
Reviewed-by: Andres Freund, Robert Haas, Robert Treat, Bilal Yavuz
Discussion: https://postgr.es/m/flat/CAAKRu_ZF_KCzZuOrPrOqjGVe8iRVWEAJSpzMgRQs%3D5-v84cXUg%40mail.gmail.com
---
 src/backend/access/heap/vacuumlazy.c | 379 +++++++++++++++++++++++++--
 1 file changed, 352 insertions(+), 27 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f8edbab21d2..0318650a2bf 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -30,10 +30,47 @@
  * to the end, skipping pages as permitted by their visibility status, vacuum
  * options, and the eagerness level of the vacuum.
  *
- * When page skipping is enabled, non-aggressive vacuums may skip scanning
- * pages that are marked all-visible in the visibility map. We may choose not
- * to skip pages if the range of skippable pages is below
- * SKIP_PAGES_THRESHOLD.
+ * Vacuums are either aggressive or normal. Aggressive vacuums must scan every
+ * unfrozen tuple in order to advance relfrozenxid and avoid transaction ID
+ * wraparound. Normal vacuums may scan otherwise skippable pages for
+ * one of two reasons:
+ *
+ * When page skipping is not disabled, a normal vacuum may scan pages that
+ * are marked all-visible (and even all-frozen) in the visibility map if
+ * the range of skippable pages is below SKIP_PAGES_THRESHOLD. This is
+ * primarily for the benefit of kernel readahead (see comment in
+ * heap_vac_scan_next_block()).
+ *
+ * A normal vacuum may also scan skippable pages in an effort to freeze them
+ * and decrease the backlog of all-visible but not all-frozen pages that have
+ * to be processed by the next aggressive vacuum. These are referred to as
+ * eagerly scanned pages. Pages scanned due to SKIP_PAGES_THRESHOLD do not
+ * count as eagerly scanned pages.
+ *
+ * Normal vacuums count all-visible pages eagerly scanned as a success when
+ * they are able to set them all-frozen in the VM and as a failure when they
+ * are not able to set them all-frozen.
+ *
+ * Because we want to amortize the overhead of freezing pages over multiple
+ * vacuums, normal vacuums cap the number of successful eager scans to
+ * EAGER_SCAN_SUCCESS_RATE of the number of all-visible but not all-frozen
+ * pages at the beginning of the vacuum. Once the success cap has been hit,
+ * eager scanning is permanently disabled.
+ *
+ * The success cap is global because we don't want to limit successes if old
+ * data happens to be concentrated in a particular part of the table. This is
+ * especially likely to happen for append-mostly workloads where the oldest
+ * data is at the beginning of the unfrozen portion of the relation.
+ *
+ * On the assumption that different regions of the table are likely to contain
+ * similarly aged data, normal vacuums use a localized eager scan failure cap
+ * instead of a global cap for the whole relation. The failure count is reset
+ * for each region of the table -- comprised of EAGER_SCAN_REGION_SIZE blocks.
+ * In each region, we tolerate EAGER_SCAN_MAX_FAILS_PER_REGION before
+ * suspending eager scanning until the end of the region.
+ *
+ * Aggressive vacuums must examine every unfrozen tuple and are thus not
+ * subject to failure or success caps when eagerly scanning all-visible pages.
  *
  * Once vacuum has decided to scan a given block, it must read in the block
  * and obtain a cleanup lock to prune tuples on the page. A non-aggressive
@@ -88,6 +125,7 @@
 #include "commands/progress.h"
 #include "commands/vacuum.h"
 #include "common/int.h"
+#include "common/pg_prng.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -173,6 +211,29 @@ typedef enum
 	VACUUM_ERRCB_PHASE_TRUNCATE,
 } VacErrPhase;
 
+/*
+ * Normal vacuums may eagerly scan some all-visible but not all-frozen pages.
+ * Since our goal is to freeze these pages, an eager scan that fails to set
+ * the page all-frozen in the VM is considered to have "failed".
+ *
+ * On the assumption that different regions of the table tend to have
+ * similarly aged data, once we fail to freeze EAGER_SCAN_MAX_FAILS_PER_REGION
+ * blocks in a region of size EAGER_SCAN_REGION_SIZE, we suspend eager
+ * scanning until vacuum has progressed to another region of the table with
+ * potentially older data.
+ */
+#define EAGER_SCAN_REGION_SIZE 4096
+#define EAGER_SCAN_MAX_FAILS_PER_REGION 128
+
+/*
+ * An eager scan of a page that is set all-frozen in the VM is considered
+ * "successful". To spread out eager scanning across multiple normal vacuums,
+ * we limit the number of successful eager page scans. The maximum number of
+ * successful eager page scans is calculated as a ratio of the all-visible but
+ * not all-frozen pages at the beginning of the vacuum.
+ */
+#define EAGER_SCAN_SUCCESS_RATE 0.2
+
 typedef struct LVRelState
 {
 	/* Target heap relation and its indexes */
@@ -229,6 +290,13 @@ typedef struct LVRelState
 
 	BlockNumber rel_pages;		/* total number of pages */
 	BlockNumber scanned_pages;	/* # pages examined (not skipped via VM) */
+
+	/*
+	 * Count of all-visible blocks eagerly scanned (for logging only). This
+	 * does not include skippable blocks scanned due to SKIP_PAGES_THRESHOLD.
+	 */
+	BlockNumber eager_scanned_pages;
+
 	BlockNumber removed_pages;	/* # pages removed by relation truncation */
 	BlockNumber new_frozen_tuple_pages; /* # pages with newly frozen tuples */
 
@@ -270,9 +338,46 @@ typedef struct LVRelState
 	BlockNumber current_block;	/* last block returned */
 	BlockNumber next_unskippable_block; /* next unskippable block */
 	bool		next_unskippable_allvis;	/* its visibility status */
+	bool		next_unskippable_eager_scanned; /* if it was eager scanned */
 	Buffer		next_unskippable_vmbuffer;	/* buffer containing its VM bit */
+
+	/* State related to managing eager scanning of all-visible pages */
+
+	/*
+	 * A normal vacuum that has failed to freeze too many eagerly scanned
+	 * blocks in the current region suspends eager scanning until the next
+	 * region. next_eager_scan_region_start is the block number of the first
+	 * block eligible for resumed eager scanning.
+	 *
+	 * When eager scanning is permanently disabled, either initially
+	 * (including for aggressive vacuum) or due to hitting the success limit,
+	 * this is set to InvalidBlockNumber.
+	 */
+	BlockNumber next_eager_scan_region_start;
+
+	/*
+	 * The remaining number of blocks a normal vacuum will consider eager
+	 * scanning. When eager scanning is enabled, this is initialized to
+	 * EAGER_SCAN_SUCCESS_RATE of the total number of all-visible but not
+	 * all-frozen pages. For each eager scan success, this is decremented.
+	 * Once it hits 0, eager scanning is permanently disabled. It is
+	 * initialized to 0 if eager scanning starts out disabled (including for
+	 * aggressive vacuum).
+	 */
+	BlockNumber eager_scan_remaining_successes;
+
+	/*
+	 * The number of eagerly scanned blocks vacuum may still fail to freeze
+	 * (due to age) in the current eager scan region. Vacuum resets it to
+	 * EAGER_SCAN_MAX_FAILS_PER_REGION each time it enters a new region of the
+	 * relation. If eager_scan_remaining_fails hits 0, eager scanning is
+	 * suspended until the next region. It is also 0 if eager scanning has
+	 * been permanently disabled.
+	 */
+	BlockNumber eager_scan_remaining_fails;
 } LVRelState;
 
+
 /* Struct for saving and restoring vacuum error information. */
 typedef struct LVSavedErrInfo
 {
@@ -284,8 +389,10 @@ typedef struct LVSavedErrInfo
 
 /* non-export function prototypes */
 static void lazy_scan_heap(LVRelState *vacrel);
+static void heap_vacuum_eager_scan_setup(LVRelState *vacrel);
 static bool heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
-									 bool *all_visible_according_to_vm);
+									 bool *all_visible_according_to_vm,
+									 bool *was_eager_scanned);
 static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
@@ -293,7 +400,7 @@ static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 static void lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer, bool all_visible_according_to_vm,
-							bool *has_lpdead_items);
+							bool *has_lpdead_items, bool *vm_page_frozen);
 static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
 							  BlockNumber blkno, Page page,
 							  bool *has_lpdead_items);
@@ -335,6 +442,113 @@ static void restore_vacuum_error_info(LVRelState *vacrel,
 									  const LVSavedErrInfo *saved_vacrel);
 
 
+
+/*
+ * Helper to set up the eager scanning state for vacuuming a single relation.
+ * Initializes the eager scan management related members of the LVRelState.
+ *
+ * Caller provides whether or not an aggressive vacuum is required due to
+ * vacuum options or for relfrozenxid/relminmxid advancement.
+ */
+static void
+heap_vacuum_eager_scan_setup(LVRelState *vacrel)
+{
+	uint32		randseed;
+	BlockNumber allvisible;
+	BlockNumber allfrozen;
+	float		first_region_ratio;
+	bool		oldest_unfrozen_requires_freeze = false;
+
+	/*
+	 * Initialize eager scan management fields to their disabled values.
+	 * Aggressive vacuums, normal vacuums of small tables, and normal vacuums
+	 * of tables without old enough tuples will have eager scanning disabled.
+	 */
+	vacrel->next_eager_scan_region_start = InvalidBlockNumber;
+	vacrel->eager_scan_remaining_fails = 0;
+	vacrel->eager_scan_remaining_successes = 0;
+
+	/*
+	 * The caller will have determined whether or not an aggressive vacuum is
+	 * required by either the vacuum parameters or the relative age of the
+	 * oldest unfrozen transaction IDs. An aggressive vacuum must scan every
+	 * all-visible page to safely advance the relfrozenxid and/or relminmxid,
+	 * so scanning all-visible pages is not considered eager.
+	 */
+	if (vacrel->aggressive)
+		return;
+
+	/*
+	 * If the relation is smaller than a single region, we won't bother eager
+	 * scanning it, as a future aggressive vacuum shouldn't take very long
+	 * anyway so there is no point in amortization.
+	 */
+	if (vacrel->rel_pages < EAGER_SCAN_REGION_SIZE)
+		return;
+
+	/*
+	 * We only want to enable eager scanning if we are likely to be able to
+	 * freeze some of the pages in the relation. We are only guaranteed to
+	 * freeze a freezable page if some of the tuples require freezing. Tuples
+	 * require freezing if any of their xids precede the freeze limit or
+	 * multixact cutoff. So, if the oldest unfrozen xid
+	 * (relfrozenxid/relminmxid) does not precede the freeze cutoff, we won't
+	 * find tuples requiring freezing.
+	 */
+	if (TransactionIdIsNormal(vacrel->cutoffs.relfrozenxid) &&
+		TransactionIdPrecedesOrEquals(vacrel->cutoffs.relfrozenxid,
+									  vacrel->cutoffs.FreezeLimit))
+		oldest_unfrozen_requires_freeze = true;
+
+	if (!oldest_unfrozen_requires_freeze &&
+		MultiXactIdIsValid(vacrel->cutoffs.relminmxid) &&
+		MultiXactIdPrecedesOrEquals(vacrel->cutoffs.relminmxid,
+									vacrel->cutoffs.MultiXactCutoff))
+		oldest_unfrozen_requires_freeze = true;
+
+	if (!oldest_unfrozen_requires_freeze)
+		return;
+
+	/*
+	 * We are not required to do an aggressive vacuum and we have met the
+	 * criteria to eagerly scan some pages.
+	 */
+
+	/*
+	 * Our success cap is EAGER_SCAN_SUCCESS_RATE of the number of all-visible
+	 * but not all-frozen blocks in the relation.
+	 */
+	visibilitymap_count(vacrel->rel, &allvisible, &allfrozen);
+
+	vacrel->eager_scan_remaining_successes =
+		(BlockNumber) (EAGER_SCAN_SUCCESS_RATE *
+					   (allvisible - allfrozen));
+
+	/* If the table is entirely frozen, eager scanning is disabled. */
+	if (vacrel->eager_scan_remaining_successes == 0)
+		return;
+
+	/*
+	 * Now calculate the eager scan start block. Start at a random spot
+	 * somewhere within the first eager scan region. This avoids eager
+	 * scanning and failing to freeze the exact same blocks each vacuum of the
+	 * relation.
+	 */
+	randseed = pg_prng_uint32(&pg_global_prng_state);
+
+	vacrel->next_eager_scan_region_start = randseed % EAGER_SCAN_REGION_SIZE;
+
+	/*
+	 * The first region will be smaller than subsequent regions. As such,
+	 * adjust the eager scan failures tolerated for this region.
+	 */
+	first_region_ratio = 1 - (float) vacrel->next_eager_scan_region_start /
+		EAGER_SCAN_REGION_SIZE;
+
+	vacrel->eager_scan_remaining_fails = EAGER_SCAN_MAX_FAILS_PER_REGION *
+		first_region_ratio;
+}
+
 /*
  *	heap_vacuum_rel() -- perform VACUUM for one heap relation
  *
@@ -463,6 +677,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 
 	/* Initialize page counters explicitly (be tidy) */
 	vacrel->scanned_pages = 0;
+	vacrel->eager_scanned_pages = 0;
 	vacrel->removed_pages = 0;
 	vacrel->new_frozen_tuple_pages = 0;
 	vacrel->lpdead_item_pages = 0;
@@ -488,6 +703,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 	vacrel->vm_new_visible_pages = 0;
 	vacrel->vm_new_visible_frozen_pages = 0;
 	vacrel->vm_new_frozen_pages = 0;
+	vacrel->rel_pages = orig_rel_pages = RelationGetNumberOfBlocks(rel);
 
 	/*
 	 * Get cutoffs that determine which deleted tuples are considered DEAD,
@@ -506,11 +722,16 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 	 * to increase the number of dead tuples it can prune away.)
 	 */
 	vacrel->aggressive = vacuum_get_cutoffs(rel, params, &vacrel->cutoffs);
-	vacrel->rel_pages = orig_rel_pages = RelationGetNumberOfBlocks(rel);
 	vacrel->vistest = GlobalVisTestFor(rel);
 	/* Initialize state used to track oldest extant XID/MXID */
 	vacrel->NewRelfrozenXid = vacrel->cutoffs.OldestXmin;
 	vacrel->NewRelminMxid = vacrel->cutoffs.OldestMxact;
+
+	/*
+	 * Initialize state related to tracking all-visible page skipping. This is
+	 * very important to determine whether or not it is safe to advance the
+	 * relfrozenxid.
+	 */
 	vacrel->skippedallvis = false;
 	skipwithvm = true;
 	if (params->options & VACOPT_DISABLE_PAGE_SKIPPING)
@@ -525,6 +746,13 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 
 	vacrel->skipwithvm = skipwithvm;
 
+	/*
+	 * Next set up eager scan tracking state. This must happen after
+	 * determining whether or not the vacuum must be aggressive, because only
+	 * normal vacuums are considered to eagerly scan pages.
+	 */
+	heap_vacuum_eager_scan_setup(vacrel);
+
 	if (verbose)
 	{
 		if (vacrel->aggressive)
@@ -719,12 +947,14 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 							 vacrel->relnamespace,
 							 vacrel->relname,
 							 vacrel->num_index_scans);
-			appendStringInfo(&buf, _("pages: %u removed, %u remain, %u scanned (%.2f%% of total)\n"),
+			appendStringInfo(&buf, _("pages: %u removed, %u remain, %u scanned (%.2f%% of total), %u eager scanned\n"),
 							 vacrel->removed_pages,
 							 new_rel_pages,
 							 vacrel->scanned_pages,
 							 orig_rel_pages == 0 ? 100.0 :
-							 100.0 * vacrel->scanned_pages / orig_rel_pages);
+							 100.0 * vacrel->scanned_pages /
+							 orig_rel_pages,
+							 vacrel->eager_scanned_pages);
 			appendStringInfo(&buf,
 							 _("tuples: %lld removed, %lld remain, %lld are dead but not yet removable\n"),
 							 (long long) vacrel->tuples_deleted,
@@ -895,8 +1125,10 @@ lazy_scan_heap(LVRelState *vacrel)
 	BlockNumber rel_pages = vacrel->rel_pages,
 				blkno,
 				next_fsm_block_to_vacuum = 0;
-	bool		all_visible_according_to_vm;
-
+	bool		all_visible_according_to_vm,
+				was_eager_scanned = false;
+	BlockNumber orig_eager_scan_success_limit =
+		vacrel->eager_scan_remaining_successes; /* for logging */
 	Buffer		vmbuffer = InvalidBuffer;
 	const int	initprog_index[] = {
 		PROGRESS_VACUUM_PHASE,
@@ -915,13 +1147,16 @@ lazy_scan_heap(LVRelState *vacrel)
 	vacrel->current_block = InvalidBlockNumber;
 	vacrel->next_unskippable_block = InvalidBlockNumber;
 	vacrel->next_unskippable_allvis = false;
+	vacrel->next_unskippable_eager_scanned = false;
 	vacrel->next_unskippable_vmbuffer = InvalidBuffer;
 
-	while (heap_vac_scan_next_block(vacrel, &blkno, &all_visible_according_to_vm))
+	while (heap_vac_scan_next_block(vacrel, &blkno, &all_visible_according_to_vm,
+									&was_eager_scanned))
 	{
 		Buffer		buf;
 		Page		page;
 		bool		has_lpdead_items;
+		bool		vm_page_frozen = false;
 		bool		got_cleanup_lock = false;
 
 		vacrel->scanned_pages++;
@@ -1049,7 +1284,48 @@ lazy_scan_heap(LVRelState *vacrel)
 		if (got_cleanup_lock)
 			lazy_scan_prune(vacrel, buf, blkno, page,
 							vmbuffer, all_visible_according_to_vm,
-							&has_lpdead_items);
+							&has_lpdead_items, &vm_page_frozen);
+
+		/*
+		 * Count an eagerly scanned page as a failure or a success.
+		 */
+		if (was_eager_scanned)
+		{
+			if (vm_page_frozen)
+			{
+				Assert(vacrel->eager_scan_remaining_successes > 0);
+				vacrel->eager_scan_remaining_successes--;
+
+				if (vacrel->eager_scan_remaining_successes == 0)
+				{
+					/*
+					 * An aggressive vacuum is not considered to eagerly scan
+					 * pages, so it should never get here.
+					 */
+					Assert(!vacrel->aggressive);
+
+					/*
+					 * If we hit our success limit, there is no need to
+					 * eagerly scan any additional pages. Permanently disable
+					 * eager scanning by setting the other eager scan
+					 * management fields to their disabled values as well.
+					 */
+					vacrel->eager_scan_remaining_fails = 0;
+					vacrel->next_eager_scan_region_start = InvalidBlockNumber;
+
+					ereport(INFO,
+							(errmsg("vacuum successfully froze %u eager scanned blocks of \"%s.%s.%s\"; disabling eager scanning",
+									orig_eager_scan_success_limit,
+									vacrel->dbname, vacrel->relnamespace,
+									vacrel->relname)));
+				}
+			}
+			else
+			{
+				Assert(vacrel->eager_scan_remaining_fails > 0);
+				vacrel->eager_scan_remaining_fails--;
+			}
+		}
 
 		/*
 		 * Now drop the buffer lock and, potentially, update the FSM.
@@ -1149,7 +1425,9 @@ lazy_scan_heap(LVRelState *vacrel)
  *
  * The block number and visibility status of the next block to process are set
  * in *blkno and *all_visible_according_to_vm.  The return value is false if
- * there are no further blocks to process.
+ * there are no further blocks to process. If the block is being eagerly
+ * scanned, was_eager_scanned is set so that the caller can count whether or
+ * not an eager scanned page is successfully frozen.
  *
  * vacrel is an in/out parameter here.  Vacuum options and information about
  * the relation are read.  vacrel->skippedallvis is set if we skip a block
@@ -1159,13 +1437,16 @@ lazy_scan_heap(LVRelState *vacrel)
  */
 static bool
 heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
-						 bool *all_visible_according_to_vm)
+						 bool *all_visible_according_to_vm,
+						 bool *was_eager_scanned)
 {
 	BlockNumber next_block;
 
 	/* relies on InvalidBlockNumber + 1 overflowing to 0 on first call */
 	next_block = vacrel->current_block + 1;
 
+	*was_eager_scanned = false;
+
 	/* Have we reached the end of the relation? */
 	if (next_block >= vacrel->rel_pages)
 	{
@@ -1238,6 +1519,9 @@ heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
 
 		*blkno = vacrel->current_block = next_block;
 		*all_visible_according_to_vm = vacrel->next_unskippable_allvis;
+		*was_eager_scanned = vacrel->next_unskippable_eager_scanned;
+		if (*was_eager_scanned)
+			vacrel->eager_scanned_pages++;
 		return true;
 	}
 }
@@ -1261,11 +1545,12 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
 	BlockNumber rel_pages = vacrel->rel_pages;
 	BlockNumber next_unskippable_block = vacrel->next_unskippable_block + 1;
 	Buffer		next_unskippable_vmbuffer = vacrel->next_unskippable_vmbuffer;
+	bool		next_unskippable_eager_scanned = false;
 	bool		next_unskippable_allvis;
 
 	*skipsallvis = false;
 
-	for (;;)
+	for (;; next_unskippable_block++)
 	{
 		uint8		mapbits = visibilitymap_get_status(vacrel->rel,
 													   next_unskippable_block,
@@ -1273,6 +1558,17 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
 
 		next_unskippable_allvis = (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0;
 
+		/*
+		 * At the start of each eager scan region, normal vacuums with eager
+		 * scanning enabled reset the failure counter, allowing them to resume
+		 * eager scanning if it had been suspended in the previous region.
+		 */
+		if (next_unskippable_block >= vacrel->next_eager_scan_region_start)
+		{
+			vacrel->eager_scan_remaining_fails = EAGER_SCAN_MAX_FAILS_PER_REGION;
+			vacrel->next_eager_scan_region_start += EAGER_SCAN_REGION_SIZE;
+		}
+
 		/*
 		 * A block is unskippable if it is not all visible according to the
 		 * visibility map.
@@ -1305,24 +1601,34 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
 		 * all-visible.  They may still skip all-frozen pages, which can't
 		 * contain XIDs < OldestXmin (XIDs that aren't already frozen by now).
 		 */
-		if ((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0)
-		{
-			if (vacrel->aggressive)
-				break;
+		if (mapbits & VISIBILITYMAP_ALL_FROZEN)
+			continue;
 
-			/*
-			 * All-visible block is safe to skip in non-aggressive case.  But
-			 * remember that the final range contains such a block for later.
-			 */
-			*skipsallvis = true;
+		/*
+		 * Aggressive vacuums cannot skip all-visible pages that are not also
+		 * all-frozen. Normal vacuums with eager scanning enabled only skip
+		 * such pages if they have hit the failure limit for the current eager
+		 * scan region.
+		 */
+		if (vacrel->aggressive ||
+			vacrel->eager_scan_remaining_fails > 0)
+		{
+			if (!vacrel->aggressive)
+				next_unskippable_eager_scanned = true;
+			break;
 		}
 
-		next_unskippable_block++;
+		/*
+		 * All-visible blocks are safe to skip in a normal vacuum. But
+		 * remember that the final range contains such a block for later.
+		 */
+		*skipsallvis = true;
 	}
 
 	/* write the local variables back to vacrel */
 	vacrel->next_unskippable_block = next_unskippable_block;
 	vacrel->next_unskippable_allvis = next_unskippable_allvis;
+	vacrel->next_unskippable_eager_scanned = next_unskippable_eager_scanned;
 	vacrel->next_unskippable_vmbuffer = next_unskippable_vmbuffer;
 }
 
@@ -1353,6 +1659,10 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
  * lazy_scan_prune (or lazy_scan_noprune).  Otherwise returns true, indicating
  * that lazy_scan_heap is done processing the page, releasing lock on caller's
  * behalf.
+ *
+ * No vm_page_frozen output parameter (like what is passed to
+ * lazy_scan_prune()) is passed here because empty pages are always frozen and
+ * thus could never be eager scanned.
  */
 static bool
 lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
@@ -1492,6 +1802,10 @@ cmpOffsetNumbers(const void *a, const void *b)
  *
  * *has_lpdead_items is set to true or false depending on whether, upon return
  * from this function, any LP_DEAD items are still present on the page.
+ *
+ * *vm_page_frozen is set to true if the page is set all-frozen in the VM. The
+ * caller currently only uses this for determining whether an eagerly scanned
+ * page was successfully set all-frozen.
  */
 static void
 lazy_scan_prune(LVRelState *vacrel,
@@ -1500,7 +1814,8 @@ lazy_scan_prune(LVRelState *vacrel,
 				Page page,
 				Buffer vmbuffer,
 				bool all_visible_according_to_vm,
-				bool *has_lpdead_items)
+				bool *has_lpdead_items,
+				bool *vm_page_frozen)
 {
 	Relation	rel = vacrel->rel;
 	PruneFreezeResult presult;
@@ -1652,11 +1967,17 @@ lazy_scan_prune(LVRelState *vacrel,
 		{
 			vacrel->vm_new_visible_pages++;
 			if (presult.all_frozen)
+			{
 				vacrel->vm_new_visible_frozen_pages++;
+				*vm_page_frozen = true;
+			}
 		}
 		else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
 				 presult.all_frozen)
+		{
 			vacrel->vm_new_frozen_pages++;
+			*vm_page_frozen = true;
+		}
 	}
 
 	/*
@@ -1744,6 +2065,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		{
 			vacrel->vm_new_visible_pages++;
 			vacrel->vm_new_visible_frozen_pages++;
+			*vm_page_frozen = true;
 		}
 
 		/*
@@ -1751,7 +2073,10 @@ lazy_scan_prune(LVRelState *vacrel,
 		 * above, so we don't need to test the value of old_vmbits.
 		 */
 		else
+		{
 			vacrel->vm_new_frozen_pages++;
+			*vm_page_frozen = true;
+		}
 	}
 }
 
-- 
2.34.1
