On Sat, Jan 6, 2018 at 7:33 PM, Stephen Frost <sfr...@snowman.net> wrote:

> Greetings Claudio,
>
> * Michael Paquier (michael.paqu...@gmail.com) wrote:
> > On Mon, Nov 27, 2017 at 2:39 PM, Jing Wang <jingwang...@gmail.com>
> wrote:
> > > A few general comments.
> > >
> > > +   FreeSpaceMapVacuum(onerel, 64);
> > >
> > > Just want to know why '64' is used here? It's better to give a
> > > description.
> > >
> > > +    else
> > > +   {
> > > +       newslot = fsm_get_avail(page, 0);
> > > +   }
> > >
> > > Since there is only one line in the else, the braces are not
> > > needed. And there is one extra space before the else, which should
> > > be removed.
> > >
> > >
> > > +       if (nindexes == 0 &&
> > > +           (vacuumed_pages_at_fsm_vac - vacuumed_pages) >
> > > vacuum_fsm_every_pages)
> > > +       {
> > > +           /* Vacuum the Free Space Map */
> > > +           FreeSpaceMapVacuum(onerel, 0);
> > > +           vacuumed_pages_at_fsm_vac = vacuumed_pages;
> > > +       }
> > >
> > > vacuumed_pages_at_fsm_vac is initialised to zero, so it seems the
> > > branch can never be entered.
> >
> > The patch presented still applies, and there was a review two days
> > ago. That is a short window for the author to respond, so I am
> > moving this patch to the next CF with the same status of waiting on
> > author.
>
> Looks like this patch probably still applies and the changes suggested
> above seem pretty small. Any chance you can provide an updated patch
> soon, Claudio, so that it can be reviewed properly during this
> commitfest?
>

I failed to notice the comments among the list's noise, sorry.

I'll get on to it now.

On Mon, Nov 27, 2017 at 2:39 AM, Jing Wang <jingwang...@gmail.com> wrote:

> A few general comments.
>
> +   FreeSpaceMapVacuum(onerel, 64);
>
> Just want to know why '64' is used here? It's better to give a
> description.
>

It's just an arbitrary cutoff point.

The point of that vacuum run is to mark free space early, to avoid needless
relation extension before long vacuum runs finish. Such extension only
happens if the FSM is mostly zeroes; if free space is marked there, it will
be used instead.

To make this run fast, and since it's all redundant work that the final FSM
vacuum pass will redo anyway, branches that already advertise more than that
amount of free space are skipped; there's little point in vacuuming those
early. This avoids the quadratic cost of vacuuming the whole FSM every N
dead tuples, since already-vacuumed branches will show free space and thus
be skipped on later passes. The worst case can still be quadratic, but this
avoids it often enough in practice.
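
To make that concrete, here's a minimal standalone toy model of the pruning
(my own simplification for illustration, not PostgreSQL code; the real
logic is fsm_vacuum_page in the patch below). Inner nodes cache the maximum
free-space category (0-255) seen below them; a partial pass skips any
branch whose cached value already meets the threshold:

/*
 * fsm_prune_demo.c -- toy model of threshold-pruned FSM vacuuming.
 * Vacuuming recomputes stale inner nodes from their children, but a
 * partial pass skips branches that already advertise enough free space.
 */
#include <stdio.h>

#define NODES 15                /* complete binary tree with 8 leaves */

static unsigned char tree[NODES];
static int      visited;

static unsigned char
vacuum_node(int node, unsigned char threshold)
{
    int         left = 2 * node + 1;

    visited++;

    if (left >= NODES)          /* leaf: its value is authoritative */
        return tree[node];

    /* Prune: this branch already advertises enough free space. */
    if (threshold != 0 && tree[node] >= threshold)
        return tree[node];

    {
        unsigned char lv = vacuum_node(left, threshold);
        unsigned char rv = vacuum_node(left + 1, threshold);

        tree[node] = (lv > rv) ? lv : rv;
    }
    return tree[node];
}

int
main(void)
{
    unsigned char leaves[8] = {10, 200, 0, 96, 5, 0, 130, 64};
    int         i;

    for (i = 0; i < 8; i++)
        tree[7 + i] = leaves[i];

    /*
     * Simulate a canceled prior run: the left branch's summary was
     * already fixed up, the right branch and root are still stale.
     */
    tree[1] = 200;

    visited = 0;
    vacuum_node(0, 64);         /* partial pass, threshold 64 */
    printf("partial: root=%u, visited %d of %d nodes\n",
           tree[0], visited, NODES);

    visited = 0;
    vacuum_node(0, 0);          /* final pass, no pruning */
    printf("full:    root=%u, visited %d of %d nodes\n",
           tree[0], visited, NODES);
    return 0;
}

In this toy run the partial pass touches 9 of 15 nodes, because the branch
the (simulated) earlier vacuum already fixed up is skipped; the final pass
with threshold 0 visits everything and repairs the remaining summaries.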

As for the value itself, it's an arbitrary choice. Between index passes we
have a rather precise cutoff we can use to avoid the redundant work (the
highest amount of free space recorded since the last FSM vacuum), but for
the initial run we don't, so I made a choice there that felt right. Since
each FSM category covers BLCKSZ/256 bytes (32 bytes with the default 8kB
blocks), 64 corresponds to roughly 2kB of free space. I have no empirical
performance data about alternative values.


>
> +    else
> +   {
> +       newslot = fsm_get_avail(page, 0);
> +   }
>
> Since there is only one line in the else, the braces are not needed.
> And there is one extra space before the else, which should be removed.
>

Ok


>
> +       if (nindexes == 0 &&
> +           (vacuumed_pages_at_fsm_vac - vacuumed_pages) >
> vacuum_fsm_every_pages)
> +       {
> +           /* Vacuum the Free Space Map */
> +           FreeSpaceMapVacuum(onerel, 0);
> +           vacuumed_pages_at_fsm_vac = vacuumed_pages;
> +       }
>
> vacuumed_pages_at_fsm_vac is initialised to zero, so it seems the
> branch can never be entered.
>

You're right, the subtraction there is backwards. It should be
(vacuumed_pages - vacuumed_pages_at_fsm_vac) > vacuum_fsm_every_pages,
which is what the attached patch does.

Attached patch with the proposed changes.
From c1f9a70172d2284508eb877f0f2e647237b43095 Mon Sep 17 00:00:00 2001
From: Claudio Freire <klaussfre...@gmail.com>
Date: Fri, 28 Jul 2017 21:31:59 -0300
Subject: [PATCH] Vacuum: Update FSM more frequently

Make Vacuum update the FSM more frequently, to avoid the case where
autovacuum fails to reach the point where it updates the FSM in
highly contended tables.

Introduce a tree pruning threshold to FreeSpaceMapVacuum that avoids
recursing into branches that already contain enough free space, to
avoid having to traverse the whole FSM and thus induce quadratic
costs. Intermediate FSM vacuums are only supposed to make enough
free space visible to avoid extension until the final (non-partial)
FSM vacuum.

Run partial FSM vacuums after each index pass, a point at which
whole ranges of the heap have been thoroughly cleaned, and we can
expect no further updates to those ranges of the FSM save for
concurrent activity. When there are no indexes, and thus no index
passes, run partial FSM vacuums every 8GB of dirtied pages or
1/8th of the relation, whichever is larger. This allows some
partial work to be made visible without incurring quadratic cost.

In any case, the FSM is small relative to the table, so even when
quadratic cost is involved, it should not be problematic. Index
passes already incur quadratic cost, and the addition of the FSM
is unlikely to be measurable.

Run a partial FSM vacuum with a low pruning threshold right at
the beginning of the VACUUM to finish up any work left over from
prior canceled vacuum runs, something that is common in highly
contended tables when running only autovacuum.
---
 src/backend/access/brin/brin.c            |  2 +-
 src/backend/access/brin/brin_pageops.c    | 10 ++---
 src/backend/commands/vacuumlazy.c         | 74 +++++++++++++++++++++++++++++--
 src/backend/storage/freespace/freespace.c | 29 +++++++++---
 src/backend/storage/freespace/indexfsm.c  |  2 +-
 src/include/storage/freespace.h           |  2 +-
 6 files changed, 102 insertions(+), 17 deletions(-)

diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index f54968b..694be5d 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -1461,5 +1461,5 @@ brin_vacuum_scan(Relation idxrel, BufferAccessStrategy strategy)
 	 * the way to the top.
 	 */
 	if (vacuum_fsm)
-		FreeSpaceMapVacuum(idxrel);
+		FreeSpaceMapVacuum(idxrel, 0);
 }
diff --git a/src/backend/access/brin/brin_pageops.c b/src/backend/access/brin/brin_pageops.c
index 60a7025..d4c7a87 100644
--- a/src/backend/access/brin/brin_pageops.c
+++ b/src/backend/access/brin/brin_pageops.c
@@ -136,7 +136,7 @@ brin_doupdate(Relation idxrel, BlockNumber pagesPerRange,
 				brin_initialize_empty_new_buffer(idxrel, newbuf);
 			UnlockReleaseBuffer(newbuf);
 			if (extended)
-				FreeSpaceMapVacuum(idxrel);
+				FreeSpaceMapVacuum(idxrel, 0);
 		}
 		return false;
 	}
@@ -156,7 +156,7 @@ brin_doupdate(Relation idxrel, BlockNumber pagesPerRange,
 				brin_initialize_empty_new_buffer(idxrel, newbuf);
 			UnlockReleaseBuffer(newbuf);
 			if (extended)
-				FreeSpaceMapVacuum(idxrel);
+				FreeSpaceMapVacuum(idxrel, 0);
 		}
 		return false;
 	}
@@ -211,7 +211,7 @@ brin_doupdate(Relation idxrel, BlockNumber pagesPerRange,
 		LockBuffer(oldbuf, BUFFER_LOCK_UNLOCK);
 
 		if (extended)
-			FreeSpaceMapVacuum(idxrel);
+			FreeSpaceMapVacuum(idxrel, 0);
 
 		return true;
 	}
@@ -313,7 +313,7 @@ brin_doupdate(Relation idxrel, BlockNumber pagesPerRange,
 		{
 			Assert(BlockNumberIsValid(newblk));
 			RecordPageWithFreeSpace(idxrel, newblk, freespace);
-			FreeSpaceMapVacuum(idxrel);
+			FreeSpaceMapVacuum(idxrel, 0);
 		}
 
 		return true;
@@ -457,7 +457,7 @@ brin_doinsert(Relation idxrel, BlockNumber pagesPerRange,
 	LockBuffer(revmapbuf, BUFFER_LOCK_UNLOCK);
 
 	if (extended)
-		FreeSpaceMapVacuum(idxrel);
+		FreeSpaceMapVacuum(idxrel, 0);
 
 	return off;
 }
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index cf7f5e1..2965204 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -103,6 +103,19 @@
  */
 #define PREFETCH_SIZE			((BlockNumber) 32)
 
+/*
+ * If autovacuums get regularly canceled, the FSM may never be
+ * vacuumed. To work around that, we perform an initial partial
+ * FSM vacuum at the beginning of the vacuum process, to quickly
+ * make existing unmarked free space visible. To avoid useless
+ * redundant work, however, we avoid recursing into branches
+ * that already advertise a set amount of free space; we only try
+ * to discover unmarked free space. This value controls how much
+ * free space counts as enough in that context, expressed in
+ * FSM category units.
+ */
+#define INITIAL_PARTIAL_FSM_VACUUM_THRESHOLD	64
+
 typedef struct LVRelStats
 {
 	/* hasindex = true means two-pass strategy; false means one-pass */
@@ -250,6 +263,17 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 	vacrelstats->pages_removed = 0;
 	vacrelstats->lock_waiter_detected = false;
 
+	/*
+	 * Vacuum the Free Space Map partially before we start.
+	 * If an earlier vacuum was canceled, which is likely in
+	 * highly contended tables, we may need to finish up. If we do
+	 * it now, we make the space visible to other backends regardless
+	 * of whether we succeed in finishing this time around.
+	 * Don't bother checking branches that already have usable space,
+	 * though.
+	 */
+	FreeSpaceMapVacuum(onerel, INITIAL_PARTIAL_FSM_VACUUM_THRESHOLD);
+
 	/* Open all indexes of the relation */
 	vac_open_indexes(onerel, RowExclusiveLock, &nindexes, &Irel);
 	vacrelstats->hasindex = (nindexes > 0);
@@ -287,7 +311,7 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 								 PROGRESS_VACUUM_PHASE_FINAL_CLEANUP);
 
 	/* Vacuum the Free Space Map */
-	FreeSpaceMapVacuum(onerel);
+	FreeSpaceMapVacuum(onerel, 0);
 
 	/*
 	 * Update statistics in pg_class.
@@ -470,7 +494,9 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	TransactionId relfrozenxid = onerel->rd_rel->relfrozenxid;
 	TransactionId relminmxid = onerel->rd_rel->relminmxid;
 	BlockNumber empty_pages,
-				vacuumed_pages;
+				vacuumed_pages,
+				vacuumed_pages_at_fsm_vac,
+				vacuum_fsm_every_pages;
 	double		num_tuples,
 				tups_vacuumed,
 				nkeep,
@@ -480,6 +506,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	PGRUsage	ru0;
 	Buffer		vmbuffer = InvalidBuffer;
 	BlockNumber next_unskippable_block;
+	Size		max_freespace = 0;
 	bool		skipping_blocks;
 	xl_heap_freeze_tuple *frozen;
 	StringInfoData buf;
@@ -504,7 +531,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 						get_namespace_name(RelationGetNamespace(onerel)),
 						relname)));
 
-	empty_pages = vacuumed_pages = 0;
+	empty_pages = vacuumed_pages = vacuumed_pages_at_fsm_vac = 0;
 	num_tuples = tups_vacuumed = nkeep = nunused = 0;
 
 	indstats = (IndexBulkDeleteResult **)
@@ -517,6 +544,16 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	vacrelstats->nonempty_pages = 0;
 	vacrelstats->latestRemovedXid = InvalidTransactionId;
 
+	/*
+	 * Vacuum the FSM a few times in the middle if the relation is big
+	 * and has no indexes. Once every 8GB of dirtied pages, or one 8th
+	 * of the relation, whichever is bigger, to avoid quadratic cost.
+	 * If it has indexes, this is ignored, and the FSM is vacuumed after
+	 * each index pass.
+	 */
+	vacuum_fsm_every_pages = nblocks / 8;
+	vacuum_fsm_every_pages = Max(vacuum_fsm_every_pages, 1048576);
+
 	lazy_space_alloc(vacrelstats, nblocks);
 	frozen = palloc(sizeof(xl_heap_freeze_tuple) * MaxHeapTuplesPerPage);
 
@@ -756,6 +793,14 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			vacrelstats->num_dead_tuples = 0;
 			vacrelstats->num_index_scans++;
 
+			/*
+			 * Vacuum the Free Space Map to make the changes we made visible.
+			 * Don't recurse into branches with more than max_freespace,
+			 * since we can't have set anything higher than that.
+			 */
+			FreeSpaceMapVacuum(onerel, max_freespace);
+			max_freespace = 0;
+
 			/* Report that we are once again scanning the heap */
 			pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
 										 PROGRESS_VACUUM_PHASE_SCAN_HEAP);
@@ -878,6 +923,8 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			UnlockReleaseBuffer(buf);
 
 			RecordPageWithFreeSpace(onerel, blkno, freespace);
+			if (freespace > max_freespace)
+				max_freespace = freespace;
 			continue;
 		}
 
@@ -917,6 +964,8 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 
 			UnlockReleaseBuffer(buf);
 			RecordPageWithFreeSpace(onerel, blkno, freespace);
+			if (freespace > max_freespace)
+				max_freespace = freespace;
 			continue;
 		}
 
@@ -1272,7 +1321,24 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 		 * taken if there are no indexes.)
 		 */
 		if (vacrelstats->num_dead_tuples == prev_dead_count)
+		{
 			RecordPageWithFreeSpace(onerel, blkno, freespace);
+			if (freespace > max_freespace)
+				max_freespace = freespace;
+		}
+
+		/*
+		 * If there are no indexes then we should periodically vacuum the FSM
+		 * on huge relations to make free space visible early.
+		 */
+		if (nindexes == 0 &&
+			(vacuumed_pages - vacuumed_pages_at_fsm_vac) > vacuum_fsm_every_pages)
+		{
+			/* Vacuum the Free Space Map */
+			FreeSpaceMapVacuum(onerel, max_freespace);
+			vacuumed_pages_at_fsm_vac = vacuumed_pages;
+			max_freespace = 0;
+		}
 	}
 
 	/* report that everything is scanned and vacuumed */
@@ -1922,6 +1988,8 @@ count_nondeletable_pages(Relation onerel, LVRelStats *vacrelstats)
 		 * We don't insert a vacuum delay point here, because we have an
 		 * exclusive lock on the table which we want to hold for as short a
 		 * time as possible.  We still need to check for interrupts however.
+		 * We might have to acquire the autovacuum lock, but that
+		 * shouldn't pose a deadlock risk.
 		 */
 		CHECK_FOR_INTERRUPTS();
 
diff --git a/src/backend/storage/freespace/freespace.c b/src/backend/storage/freespace/freespace.c
index dd8ae52..9c1ccf1 100644
--- a/src/backend/storage/freespace/freespace.c
+++ b/src/backend/storage/freespace/freespace.c
@@ -108,7 +108,7 @@ static Size fsm_space_cat_to_avail(uint8 cat);
 static int fsm_set_and_search(Relation rel, FSMAddress addr, uint16 slot,
 				   uint8 newValue, uint8 minValue);
 static BlockNumber fsm_search(Relation rel, uint8 min_cat);
-static uint8 fsm_vacuum_page(Relation rel, FSMAddress addr, bool *eof);
+static uint8 fsm_vacuum_page(Relation rel, FSMAddress addr, Size threshold, bool *eof);
 static BlockNumber fsm_get_lastblckno(Relation rel, FSMAddress addr);
 static void fsm_update_recursive(Relation rel, FSMAddress addr, uint8 new_cat);
 
@@ -376,7 +376,7 @@ FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
  * FreeSpaceMapVacuum - scan and fix any inconsistencies in the FSM
  */
 void
-FreeSpaceMapVacuum(Relation rel)
+FreeSpaceMapVacuum(Relation rel, Size threshold)
 {
 	bool		dummy;
 
@@ -384,7 +384,7 @@ FreeSpaceMapVacuum(Relation rel)
 	 * Traverse the tree in depth-first order. The tree is stored physically
 	 * in depth-first order, so this should be pretty I/O efficient.
 	 */
-	fsm_vacuum_page(rel, FSM_ROOT_ADDRESS, &dummy);
+	fsm_vacuum_page(rel, FSM_ROOT_ADDRESS, threshold, &dummy);
 }
 
 /******** Internal routines ********/
@@ -663,6 +663,8 @@ fsm_extend(Relation rel, BlockNumber fsm_nblocks)
  * If minValue > 0, the updated page is also searched for a page with at
  * least minValue of free space. If one is found, its slot number is
  * returned, -1 otherwise.
+ *
+ * If minValue == 0, the value at the root node is returned.
  */
 static int
 fsm_set_and_search(Relation rel, FSMAddress addr, uint16 slot,
@@ -687,6 +689,8 @@ fsm_set_and_search(Relation rel, FSMAddress addr, uint16 slot,
 								   addr.level == FSM_BOTTOM_LEVEL,
 								   true);
 	}
+	else
+		newslot = fsm_get_avail(page, 0);
 
 	UnlockReleaseBuffer(buf);
 
@@ -785,7 +789,7 @@ fsm_search(Relation rel, uint8 min_cat)
  * Recursive guts of FreeSpaceMapVacuum
  */
 static uint8
-fsm_vacuum_page(Relation rel, FSMAddress addr, bool *eof_p)
+fsm_vacuum_page(Relation rel, FSMAddress addr, Size threshold, bool *eof_p)
 {
 	Buffer		buf;
 	Page		page;
@@ -816,11 +820,19 @@ fsm_vacuum_page(Relation rel, FSMAddress addr, bool *eof_p)
 		{
 			int			child_avail;
 
+			/* Tree pruning for partial vacuums */
+			if (threshold)
+			{
+				child_avail = fsm_get_avail(page, slot);
+				if (child_avail >= threshold)
+					continue;
+			}
+
 			CHECK_FOR_INTERRUPTS();
 
 			/* After we hit end-of-file, just clear the rest of the slots */
 			if (!eof)
-				child_avail = fsm_vacuum_page(rel, fsm_get_child(addr, slot), &eof);
+				child_avail = fsm_vacuum_page(rel, fsm_get_child(addr, slot), threshold, &eof);
 			else
 				child_avail = 0;
 
@@ -884,6 +896,11 @@ fsm_update_recursive(Relation rel, FSMAddress addr, uint8 new_cat)
 	 * information in that.
 	 */
 	parent = fsm_get_parent(addr, &parentslot);
-	fsm_set_and_search(rel, parent, parentslot, new_cat, 0);
+	new_cat = fsm_set_and_search(rel, parent, parentslot, new_cat, 0);
+
+	/*
+	 * Bubble up not the value we just set, but the value now at the root
+	 * node of the just-updated page, which is that page's maximum.
+	 */
 	fsm_update_recursive(rel, parent, new_cat);
 }
diff --git a/src/backend/storage/freespace/indexfsm.c b/src/backend/storage/freespace/indexfsm.c
index e21047b..dd77a16 100644
--- a/src/backend/storage/freespace/indexfsm.c
+++ b/src/backend/storage/freespace/indexfsm.c
@@ -70,5 +70,5 @@ RecordUsedIndexPage(Relation rel, BlockNumber usedBlock)
 void
 IndexFreeSpaceMapVacuum(Relation rel)
 {
-	FreeSpaceMapVacuum(rel);
+	FreeSpaceMapVacuum(rel, 0);
 }
diff --git a/src/include/storage/freespace.h b/src/include/storage/freespace.h
index a517d7e..e8bdaa2 100644
--- a/src/include/storage/freespace.h
+++ b/src/include/storage/freespace.h
@@ -31,7 +31,7 @@ extern void XLogRecordPageWithFreeSpace(RelFileNode rnode, BlockNumber heapBlk,
 							Size spaceAvail);
 
 extern void FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks);
-extern void FreeSpaceMapVacuum(Relation rel);
+extern void FreeSpaceMapVacuum(Relation rel, Size threshold);
 extern void UpdateFreeSpaceMap(Relation rel,
 				   BlockNumber startBlkNum,
 				   BlockNumber endBlkNum,
-- 
1.8.4.5
