Greetings,

* Tomas Vondra (tomas.von...@2ndquadrant.com) wrote:
> >If you highlight "738754560" in the output it appears to duplicate the
> >syscalls issued until it preads() - in case of "738754560" offset it was
> >asked for 3 times. Also I wouldn't imagine in my wildest dreams that
> >posix_fadvise(POSIX_FADV_WILLNEED) is such a cheap syscall.
> 
> IMHO that's a bug in the patch, which always tries to prefetch all
> "future" blocks, including those that were already prefetched. It
> probably needs to do something like bitmap heap scan where we track
> what was already prefetched and only issue the new blocks.

Updated patch attached which:

- Starts out by pre-fetching the first effective_io_concurrency number
  of blocks we are going to want, hopefully making it so the kernel will
  trust our fadvise calls over its own read-ahead, right from the start.
- Makes sure the prefetch iterator is pushed forward whenever the
  regular iterator is moved forward.
- After each page read, issues a prefetch, similar to BitmapHeapScan, to
  hopefully avoid having the prefetching get in the way of the
  regular i/o.
- Adds some comments, runs pgindent, and adds a commit message.
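
To make the lockstep idea above easier to follow outside the backend, here
is a minimal standalone sketch. It is not the patch itself: ToySampler,
toy_next(), issue_hint() and simulate_prefetch() are hypothetical stand-ins
for BlockSamplerData, BlockSampler_Next(), PrefetchBuffer() and the sampling
loop, and the stride-based sampler is an assumption made only so that two
samplers with the same seed yield the same block sequence.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_BLOCKS 1024

/* Hypothetical stand-in for BlockSamplerData: a seeded, deterministic stream. */
typedef struct
{
	uint32_t	state;			/* PRNG state, derived from the seed */
	uint32_t	next;			/* next block number to hand out */
	uint32_t	total;			/* number of blocks in the toy relation */
} ToySampler;

typedef struct
{
	int			reads;			/* blocks read by the main loop */
	int			hints;			/* prefetch hints issued */
	int			duplicate_hints;	/* hints for an already-hinted block */
	int			unhinted_reads; /* reads of a block that was never hinted */
} SimResult;

static void
toy_init(ToySampler *s, uint32_t total, uint32_t seed)
{
	s->state = seed;
	s->next = 0;
	s->total = total;
}

static int
toy_has_more(const ToySampler *s)
{
	return s->next < s->total;
}

/* Return the current block, then skip ahead by a pseudo-random stride of 1..4. */
static uint32_t
toy_next(ToySampler *s)
{
	uint32_t	block = s->next;

	s->state = s->state * 1664525u + 1013904223u;
	s->next += 1 + (s->state >> 30);
	return block;
}

/* Stand-in for PrefetchBuffer(): only records that a hint was issued. */
static void
issue_hint(unsigned char *hinted, SimResult *r, uint32_t block)
{
	if (hinted[block])
		r->duplicate_hints++;
	hinted[block] = 1;
	r->hints++;
}

/* distance plays the role of effective_io_concurrency; 0 disables hints. */
static SimResult
simulate_prefetch(uint32_t total, uint32_t seed, int distance)
{
	unsigned char hinted[MAX_BLOCKS] = {0};
	SimResult	r = {0, 0, 0, 0};
	ToySampler	main_bs;
	ToySampler	prefetch_bs;

	assert(total <= MAX_BLOCKS);
	toy_init(&main_bs, total, seed);
	toy_init(&prefetch_bs, total, seed);	/* same seed, same sequence */

	/* Prime: hint the first 'distance' blocks before any read happens. */
	for (int i = 0; i < distance && toy_has_more(&prefetch_bs); i++)
		issue_hint(hinted, &r, toy_next(&prefetch_bs));

	while (toy_has_more(&main_bs))
	{
		uint32_t	target = toy_next(&main_bs);
		uint32_t	prefetch_target = 0;
		int			have_prefetch = 0;

		/* Advance the prefetch sampler whenever the main one advances. */
		if (toy_has_more(&prefetch_bs))
		{
			prefetch_target = toy_next(&prefetch_bs);
			have_prefetch = 1;
		}

		/* "Read" the target block; it should already have been hinted. */
		if (distance > 0 && !hinted[target])
			r.unhinted_reads++;
		r.reads++;

		/* After the read, hint the block 'distance' positions ahead. */
		if (distance > 0 && have_prefetch)
			issue_hint(hinted, &r, prefetch_target);
	}
	return r;
}
```

In this toy model, calling simulate_prefetch(1000, 42, 16) issues exactly one
hint per sampled block and never reads a block that was not hinted first,
which is the property the repeated offsets in the earlier strace output were
violating. It says nothing about real I/O timing, of course.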

I do think we should also include the patch that Jakub wrote previously,
which adds information about the read rate of ANALYZE.

I'll look at integrating that into this patch and then look at a new
patch to do something similar for VACUUM in a bit.

If you're doing further benchmarking of ANALYZE though, this would
probably be the better patch to use.  Certainly improved performance
here quite a bit with effective_io_concurrency set to 16.

Thanks,

Stephen
From b154e180718b4b57433093719a43ee6310d1da6f Mon Sep 17 00:00:00 2001
From: Stephen Frost <sfr...@snowman.net>
Date: Wed, 4 Nov 2020 10:46:23 -0500
Subject: [PATCH] Use pre-fetching for ANALYZE

When we have posix_fadvise() available, we can improve the performance
of an ANALYZE by quite a bit by using it to inform the kernel of the
blocks that we're going to be asking for.

Discussion: https://www.postgresql.org/message-id/VI1PR0701MB69603A433348EDCF783C6ECBF6EF0%40VI1PR0701MB6960.eurprd07.prod.outlook.com
---
 src/backend/commands/analyze.c | 60 +++++++++++++++++++++++++++++++++-
 1 file changed, 59 insertions(+), 1 deletion(-)

diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 8af12b5c6b..bee3453fe4 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -1031,6 +1031,7 @@ acquire_sample_rows(Relation onerel, int elevel,
 	double		liverows = 0;	/* # live rows seen */
 	double		deadrows = 0;	/* # dead rows seen */
 	double		rowstoskip = -1;	/* -1 means not set yet */
+	long		randseed;		/* Seed for block sampler(s) */
 	BlockNumber totalblocks;
 	TransactionId OldestXmin;
 	BlockSamplerData bs;
@@ -1039,6 +1040,9 @@ acquire_sample_rows(Relation onerel, int elevel,
 	TableScanDesc scan;
 	BlockNumber nblocks;
 	BlockNumber blksdone = 0;
+#ifdef USE_PREFETCH
+	BlockSamplerData prefetch_bs;
+#endif
 
 	Assert(targrows > 0);
 
@@ -1048,7 +1052,13 @@ acquire_sample_rows(Relation onerel, int elevel,
 	OldestXmin = GetOldestNonRemovableTransactionId(onerel);
 
 	/* Prepare for sampling block numbers */
-	nblocks = BlockSampler_Init(&bs, totalblocks, targrows, random());
+	randseed = random();
+	nblocks = BlockSampler_Init(&bs, totalblocks, targrows, randseed);
+
+#ifdef USE_PREFETCH
+	/* Create another BlockSampler, using the same seed, for prefetching */
+	(void) BlockSampler_Init(&prefetch_bs, totalblocks, targrows, randseed);
+#endif
 
 	/* Report sampling block numbers */
 	pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_TOTAL,
@@ -1060,16 +1070,64 @@ acquire_sample_rows(Relation onerel, int elevel,
 	scan = table_beginscan_analyze(onerel);
 	slot = table_slot_create(onerel, NULL);
 
+#ifdef USE_PREFETCH
+
+	/*
+	 * If we are doing prefetching, then go ahead and tell the kernel about
+	 * the first set of pages we are going to want.  This also moves our
+	 * iterator out ahead of the main one being used, where we will keep it so
+	 * that we're always pre-fetching out effective_io_concurrency number of
+	 * blocks ahead.
+	 */
+	if (effective_io_concurrency)
+	{
+		for (int i = 0; i < effective_io_concurrency; i++)
+		{
+			BlockNumber prefetch_block;
+
+			if (!BlockSampler_HasMore(&prefetch_bs))
+				break;
+
+			prefetch_block = BlockSampler_Next(&prefetch_bs);
+			PrefetchBuffer(scan->rs_rd, MAIN_FORKNUM, prefetch_block);
+		}
+	}
+#endif
+
 	/* Outer loop over blocks to sample */
 	while (BlockSampler_HasMore(&bs))
 	{
+#ifdef USE_PREFETCH
+		BlockNumber prefetch_targblock = InvalidBlockNumber;
+#endif
 		BlockNumber targblock = BlockSampler_Next(&bs);
 
+#ifdef USE_PREFETCH
+
+		/*
+		 * Make sure that every time the main BlockSampler is moved forward
+		 * that our prefetch BlockSampler also gets moved forward, so that we
+		 * always stay out ahead.
+		 */
+		if (BlockSampler_HasMore(&prefetch_bs))
+			prefetch_targblock = BlockSampler_Next(&prefetch_bs);
+#endif
+
 		vacuum_delay_point();
 
 		if (!table_scan_analyze_next_block(scan, targblock, vac_strategy))
 			continue;
 
+#ifdef USE_PREFETCH
+
+		/*
+		 * When pre-fetching, after we get a block, tell the kernel about the
+		 * next one we will want, if there's any left.
+		 */
+		if (effective_io_concurrency && prefetch_targblock != InvalidBlockNumber)
+			PrefetchBuffer(scan->rs_rd, MAIN_FORKNUM, prefetch_targblock);
+#endif
+
 		while (table_scan_analyze_next_tuple(scan, OldestXmin, &liverows, &deadrows, slot))
 		{
 			/*
-- 
2.25.1
