Hi,

On 2025-03-18 08:52, Jelte Fennema-Nio wrote:
> On Mon, 10 Feb 2025 at 14:23, torikoshia <torikos...@oss.nttdata.com> wrote:
>> Thanks for reviewing the patch and comments!
>> Fixed issues you pointed out and attached v2 patch.
>
> This patch needs a rebase, because it's failing to compile currently.
> So I marked this as "Waiting on Author" in the commitfest app.

Thanks! I've attached an updated patch.


BTW, based on the discussion, I assume this patch cannot be merged anytime soon. Does that align with your understanding?

- With bgworker-based AIO, this patch could mislead users into underestimating the actual storage I/O load, which is undesirable.
- With io_uring-based AIO, this patch could provide meaningful values, but it may take some time before io_uring sees widespread adoption.


--
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.
From 5c889804ebc145173e7c9cf2519c7a91ede05bcf Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikos...@oss.nttdata.com>
Date: Wed, 19 Mar 2025 22:07:19 +0900
Subject: [PATCH v3] Add storage I/O tracking to 'BUFFERS' option

The 'BUFFERS' option currently reports whether a block was found in shared
buffers, but it does not distinguish between a read served from the OS cache
and one that required an actual storage I/O operation.  While shared buffers
and the OS cache offer similar performance, storage I/O is significantly
slower.  By measuring the number of storage reads and writes, we can better
identify whether storage I/O is a performance bottleneck.

Add tracking of storage I/O usage by calling getrusage(2) at the start and
end of both the planning and execution phases.

A more granular approach, tracking at each plan node as the existing BUFFERS
counters do, was also considered, but it was found impractical due to the
high performance cost of frequent getrusage() calls.
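
For illustration, the text-format output with this patch would look roughly
like the following (the table and all numbers are made up; on Linux the
counters are in units of 512-byte blocks, so reading one 8 kB block from
storage should add 16 to the read count):

  =# EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM accounts;
                                QUERY PLAN
  ---------------------------------------------------------------------------
   Seq Scan on accounts  (cost=0.00..2640.00 rows=100000 width=97) (actual time=0.230..25.024 rows=100000 loops=1)
     Buffers: shared read=1640
   Planning:
     Buffers: shared hit=61 read=20
     Storage I/O: read=320 times write=0 times
   Planning Time: 0.512 ms
   Execution:
     Storage I/O: read=26240 times write=0 times
   Execution Time: 41.338 ms
  (9 rows)

  If "Buffers: shared read" is large while "Storage I/O: read" stays at zero,
  the blocks were served from the OS page cache rather than from storage.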

TODO:
I believe this information is mainly useful when combined with auto_explain.
I'll implement that support later.
---
 doc/src/sgml/ref/explain.sgml         |  25 ++++--
 src/backend/access/brin/brin.c        |   8 +-
 src/backend/access/gin/gininsert.c    |   8 +-
 src/backend/access/nbtree/nbtsort.c   |   8 +-
 src/backend/commands/explain.c        | 110 +++++++++++++++++++++++++-
 src/backend/commands/prepare.c        |  10 ++
 src/backend/commands/vacuumparallel.c |   8 +-
 src/backend/executor/execParallel.c   |  35 ++++++--
 src/backend/executor/instrument.c     |  62 ++++++++++++++-
 src/include/commands/explain.h        |   1 +
 src/include/executor/execParallel.h   |   2 +
 src/include/executor/instrument.h     |  19 ++++-
 src/include/port/win32/sys/resource.h |   2 +
 src/port/win32getrusage.c             |   4 +
 src/test/regress/expected/explain.out |  37 ++++++++-
 src/tools/pgindent/typedefs.list      |   1 +
 16 files changed, 298 insertions(+), 42 deletions(-)

diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml
index 9ed1061b7f..5a66b10afe 100644
--- a/doc/src/sgml/ref/explain.sgml
+++ b/doc/src/sgml/ref/explain.sgml
@@ -198,13 +198,24 @@ ROLLBACK;
       previously unmodified blocks that were changed by this query; while the
       number of blocks <emphasis>written</emphasis> indicates the number of
       previously-dirtied blocks evicted from cache by this backend during
-      query processing.
-      The number of blocks shown for an
-      upper-level node includes those used by all its child nodes.  In text
-      format, only non-zero values are printed.  Buffers information is
-      included by default when <literal>ANALYZE</literal> is used but
-      otherwise is not included by default, but can be enabled using this
-      option.
+      query processing. In text format, only non-zero values are printed.
+      Where the platform provides the information, this option also displays
+      the number of read and write operations performed on storage during the
+      planning and execution phases, shown at the end of the plan. These
+      values are obtained from the <function>getrusage()</function> system
+      call. Note that on platforms where <function>getrusage()</function> does
+      not report block I/O counts, such as Windows, the reported values are
+      always zero even if reads or writes actually occur, so no output appears
+      in text format. The same is true if the kernel was built without the
+      options needed to track storage reads and writes.
+      The timing and unit of measurement for read and write operations may
+      vary depending on the platform. For example, on Linux, a read is
+      counted only if this process caused data to be fetched from the storage
+      layer, a write is counted at page-dirtying time, and both are measured
+      in units of 512-byte blocks.
+      Buffers information is included by default when <literal>ANALYZE</literal>
+      is used but otherwise is not included by default, but can be enabled using
+      this option.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 737ad63880..fdabcf506b 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2558,7 +2558,7 @@ _brin_end_parallel(BrinLeader *brinleader, BrinBuildState *state)
 	 * or we might get incomplete data.)
 	 */
 	for (i = 0; i < brinleader->pcxt->nworkers_launched; i++)
-		InstrAccumParallelQuery(&brinleader->bufferusage[i], &brinleader->walusage[i]);
+		InstrAccumParallelQuery(&brinleader->bufferusage[i], NULL, &brinleader->walusage[i]);
 
 	/* Free last reference to MVCC snapshot, if one was used */
 	if (IsMVCCSnapshot(brinleader->snapshot))
@@ -2920,7 +2920,7 @@ _brin_parallel_build_main(dsm_segment *seg, shm_toc *toc)
 	tuplesort_attach_shared(sharedsort, seg);
 
 	/* Prepare to track buffer usage during parallel execution */
-	InstrStartParallelQuery();
+	InstrStartParallelQuery(NULL);
 
 	/*
 	 * Might as well use reliable figure when doling out maintenance_work_mem
@@ -2935,8 +2935,8 @@ _brin_parallel_build_main(dsm_segment *seg, shm_toc *toc)
 	/* Report WAL/buffer usage during parallel execution */
 	bufferusage = shm_toc_lookup(toc, PARALLEL_KEY_BUFFER_USAGE, false);
 	walusage = shm_toc_lookup(toc, PARALLEL_KEY_WAL_USAGE, false);
-	InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber],
-						  &walusage[ParallelWorkerNumber]);
+	InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber], NULL,
+						  &walusage[ParallelWorkerNumber], NULL);
 
 	index_close(indexRel, indexLockmode);
 	table_close(heapRel, heapLockmode);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index b2f89cad88..d768f2dc17 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -1084,7 +1084,7 @@ _gin_end_parallel(GinLeader *ginleader, GinBuildState *state)
 	 * or we might get incomplete data.)
 	 */
 	for (i = 0; i < ginleader->pcxt->nworkers_launched; i++)
-		InstrAccumParallelQuery(&ginleader->bufferusage[i], &ginleader->walusage[i]);
+		InstrAccumParallelQuery(&ginleader->bufferusage[i], NULL, &ginleader->walusage[i]);
 
 	/* Free last reference to MVCC snapshot, if one was used */
 	if (IsMVCCSnapshot(ginleader->snapshot))
@@ -2135,7 +2135,7 @@ _gin_parallel_build_main(dsm_segment *seg, shm_toc *toc)
 	tuplesort_attach_shared(sharedsort, seg);
 
 	/* Prepare to track buffer usage during parallel execution */
-	InstrStartParallelQuery();
+	InstrStartParallelQuery(NULL);
 
 	/*
 	 * Might as well use reliable figure when doling out maintenance_work_mem
@@ -2150,8 +2150,8 @@ _gin_parallel_build_main(dsm_segment *seg, shm_toc *toc)
 	/* Report WAL/buffer usage during parallel execution */
 	bufferusage = shm_toc_lookup(toc, PARALLEL_KEY_BUFFER_USAGE, false);
 	walusage = shm_toc_lookup(toc, PARALLEL_KEY_WAL_USAGE, false);
-	InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber],
-						  &walusage[ParallelWorkerNumber]);
+	InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber], NULL,
+						  &walusage[ParallelWorkerNumber], NULL);
 
 	index_close(indexRel, indexLockmode);
 	table_close(heapRel, heapLockmode);
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 3794cc924a..c66e742909 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1618,7 +1618,7 @@ _bt_end_parallel(BTLeader *btleader)
 	 * or we might get incomplete data.)
 	 */
 	for (i = 0; i < btleader->pcxt->nworkers_launched; i++)
-		InstrAccumParallelQuery(&btleader->bufferusage[i], &btleader->walusage[i]);
+		InstrAccumParallelQuery(&btleader->bufferusage[i], NULL, &btleader->walusage[i]);
 
 	/* Free last reference to MVCC snapshot, if one was used */
 	if (IsMVCCSnapshot(btleader->snapshot))
@@ -1826,7 +1826,7 @@ _bt_parallel_build_main(dsm_segment *seg, shm_toc *toc)
 	}
 
 	/* Prepare to track buffer usage during parallel execution */
-	InstrStartParallelQuery();
+	InstrStartParallelQuery(NULL);
 
 	/* Perform sorting of spool, and possibly a spool2 */
 	sortmem = maintenance_work_mem / btshared->scantuplesortstates;
@@ -1836,8 +1836,8 @@ _bt_parallel_build_main(dsm_segment *seg, shm_toc *toc)
 	/* Report WAL/buffer usage during parallel execution */
 	bufferusage = shm_toc_lookup(toc, PARALLEL_KEY_BUFFER_USAGE, false);
 	walusage = shm_toc_lookup(toc, PARALLEL_KEY_WAL_USAGE, false);
-	InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber],
-						  &walusage[ParallelWorkerNumber]);
+	InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber], NULL,
+						  &walusage[ParallelWorkerNumber], NULL);
 
 #ifdef BTREE_BUILD_STATS
 	if (log_btree_build_stats)
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 33a16d2d8e..d16833f31e 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -144,6 +144,8 @@ static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
 static const char *explain_get_index_name(Oid indexId);
 static bool peek_buffer_usage(ExplainState *es, const BufferUsage *usage);
 static void show_buffer_usage(ExplainState *es, const BufferUsage *usage);
+static bool peek_storageio_usage(ExplainState *es, const StorageIOUsage *usage);
+static void show_storageio_usage(ExplainState *es, const StorageIOUsage *usage);
 static void show_wal_usage(ExplainState *es, const WalUsage *usage);
 static void show_memory_counters(ExplainState *es,
 								 const MemoryContextCounters *mem_counters);
@@ -325,6 +327,8 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
 				planduration;
 	BufferUsage bufusage_start,
 				bufusage;
+	StorageIOUsage storageio,
+				storageio_start;
 	MemoryContextCounters mem_counters;
 	MemoryContext planner_ctx = NULL;
 	MemoryContext saved_ctx = NULL;
@@ -346,7 +350,10 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
 	}
 
 	if (es->buffers)
+	{
 		bufusage_start = pgBufferUsage;
+		GetStorageIOUsage(&storageio_start);
+	}
 	INSTR_TIME_SET_CURRENT(planstart);
 
 	/* plan the query */
@@ -366,12 +373,17 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
 	{
 		memset(&bufusage, 0, sizeof(BufferUsage));
 		BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start);
+
+		GetStorageIOUsage(&storageio);
+		storageio.inblock -= storageio_start.inblock;
+		storageio.outblock -= storageio_start.outblock;
 	}
 
 	/* run it (if needed) and produce output */
 	ExplainOnePlan(plan, NULL, NULL, -1, into, es, queryString, params,
 				   queryEnv,
 				   &planduration, (es->buffers ? &bufusage : NULL),
+				   (es->buffers ? &storageio : NULL),
 				   es->memory ? &mem_counters : NULL);
 }
 
@@ -497,7 +509,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
 			   IntoClause *into, ExplainState *es,
 			   const char *queryString, ParamListInfo params,
 			   QueryEnvironment *queryEnv, const instr_time *planduration,
-			   const BufferUsage *bufusage,
+			   const BufferUsage *bufusage, const StorageIOUsage *planstorageio,
 			   const MemoryContextCounters *mem_counters)
 {
 	DestReceiver *dest;
@@ -507,6 +519,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
 	int			eflags;
 	int			instrument_option = 0;
 	SerializeMetrics serializeMetrics = {0};
+	StorageIOUsage storageio_start;
 
 	Assert(plannedstmt->commandType != CMD_UTILITY);
 
@@ -516,7 +529,19 @@ ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
 		instrument_option |= INSTRUMENT_ROWS;
 
 	if (es->buffers)
+	{
+		GetStorageIOUsage(&storageio_start);
+
+		/*
+		 * Initialize the global counters that accumulate parallel workers'
+		 * storage I/O usage.  Even if the query is cancelled partway through,
+		 * EXPLAIN always passes through here, so resetting them here is safe.
+		 */
+		pgStorageIOUsageParallel.inblock = 0;
+		pgStorageIOUsageParallel.outblock = 0;
+
 		instrument_option |= INSTRUMENT_BUFFERS;
+	}
 	if (es->wal)
 		instrument_option |= INSTRUMENT_WAL;
 
@@ -609,8 +634,9 @@ ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
 	/* Create textual dump of plan tree */
 	ExplainPrintPlan(es, queryDesc);
 
-	/* Show buffer and/or memory usage in planning */
-	if (peek_buffer_usage(es, bufusage) || mem_counters)
+	/* Show buffer, storage I/O, and/or memory usage in planning */
+	if (peek_buffer_usage(es, bufusage) || peek_storageio_usage(es, planstorageio) ||
+		mem_counters)
 	{
 		ExplainOpenGroup("Planning", "Planning", true, es);
 
@@ -622,8 +648,10 @@ ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
 		}
 
 		if (bufusage)
+		{
 			show_buffer_usage(es, bufusage);
-
+			show_storageio_usage(es, planstorageio);
+		}
 		if (mem_counters)
 			show_memory_counters(es, mem_counters);
 
@@ -680,6 +708,35 @@ ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
 
 	totaltime += elapsed_time(&starttime);
 
+	/* Show storage I/O usage in execution */
+	if (es->buffers)
+	{
+		StorageIOUsage storageio = {0};
+		StorageIOUsage storageio_end;
+
+		GetStorageIOUsage(&storageio_end);
+		StorageIOUsageAccumDiff(&storageio, &storageio_end, &storageio_start);
+		StorageIOUsageAdd(&storageio, &pgStorageIOUsageParallel);
+
+		if (peek_storageio_usage(es, &storageio))
+		{
+			ExplainOpenGroup("Execution", "Execution", true, es);
+
+			if (es->format == EXPLAIN_FORMAT_TEXT)
+			{
+				ExplainIndentText(es);
+				appendStringInfoString(es->str, "Execution:\n");
+				es->indent++;
+			}
+			show_storageio_usage(es, &storageio);
+
+			if (es->format == EXPLAIN_FORMAT_TEXT)
+				es->indent--;
+
+			ExplainCloseGroup("Execution", "Execution", true, es);
+		}
+	}
+
 	/*
 	 * We only report execution time if we actually ran the query (that is,
 	 * the user specified ANALYZE), and if summary reporting is enabled (the
@@ -4270,6 +4327,51 @@ show_buffer_usage(ExplainState *es, const BufferUsage *usage)
 	}
 }
 
+/*
+ * Return whether show_storageio_usage would have anything to print, if given
+ * the same 'usage' data.  Note that when the format is anything other than
+ * text, we print even if the counters are all zeroes.
+ */
+static bool
+peek_storageio_usage(ExplainState *es, const StorageIOUsage *usage)
+{
+	if (usage == NULL)
+		return false;
+
+	if (es->format != EXPLAIN_FORMAT_TEXT)
+		return true;
+
+	return usage->inblock > 0 || usage->outblock > 0;
+}
+
+/*
+ * Show storage I/O usage.
+ */
+static void
+show_storageio_usage(ExplainState *es, const StorageIOUsage *usage)
+{
+	if (es->format == EXPLAIN_FORMAT_TEXT)
+	{
+		/* Show only positive counter values. */
+		if (usage->inblock <= 0 && usage->outblock <= 0)
+			return;
+
+		ExplainIndentText(es);
+		appendStringInfoString(es->str, "Storage I/O:");
+		appendStringInfo(es->str, " read=%ld times", (long) usage->inblock);
+		appendStringInfo(es->str, " write=%ld times", (long) usage->outblock);
+
+		appendStringInfoChar(es->str, '\n');
+	}
+	else
+	{
+		ExplainPropertyInteger("Storage I/O Read", NULL,
+							   usage->inblock, es);
+		ExplainPropertyInteger("Storage I/O Write", NULL,
+							   usage->outblock, es);
+	}
+}
+
 /*
  * Show WAL usage details.
  */
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index bf7d2b2309..d1e73a0793 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -583,6 +583,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
 	instr_time	planduration;
 	BufferUsage bufusage_start,
 				bufusage;
+	StorageIOUsage storageio,
+				storageio_start;
 	MemoryContextCounters mem_counters;
 	MemoryContext planner_ctx = NULL;
 	MemoryContext saved_ctx = NULL;
@@ -599,7 +601,11 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
 	}
 
 	if (es->buffers)
+	{
 		bufusage_start = pgBufferUsage;
+		GetStorageIOUsage(&storageio_start);
+	}
+
 	INSTR_TIME_SET_CURRENT(planstart);
 
 	/* Look it up in the hash table */
@@ -649,6 +655,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
 	{
 		memset(&bufusage, 0, sizeof(BufferUsage));
 		BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start);
+		GetStorageIOUsage(&storageio);
+		storageio.inblock -= storageio_start.inblock;
+		storageio.outblock -= storageio_start.outblock;
 	}
 
 	plan_list = cplan->stmt_list;
@@ -662,6 +671,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
 			ExplainOnePlan(pstmt, cplan, entry->plansource, query_index,
 						   into, es, query_string, paramLI, pstate->p_queryEnv,
 						   &planduration, (es->buffers ? &bufusage : NULL),
+						   (es->buffers ? &storageio : NULL),
 						   es->memory ? &mem_counters : NULL);
 		else
 			ExplainOneUtility(pstmt->utilityStmt, into, es, pstate, paramLI);
diff --git a/src/backend/commands/vacuumparallel.c b/src/backend/commands/vacuumparallel.c
index 2b9d548cde..f77124f8c5 100644
--- a/src/backend/commands/vacuumparallel.c
+++ b/src/backend/commands/vacuumparallel.c
@@ -737,7 +737,7 @@ parallel_vacuum_process_all_indexes(ParallelVacuumState *pvs, int num_index_scan
 		WaitForParallelWorkersToFinish(pvs->pcxt);
 
 		for (int i = 0; i < pvs->pcxt->nworkers_launched; i++)
-			InstrAccumParallelQuery(&pvs->buffer_usage[i], &pvs->wal_usage[i]);
+			InstrAccumParallelQuery(&pvs->buffer_usage[i], NULL, &pvs->wal_usage[i]);
 	}
 
 	/*
@@ -1083,7 +1083,7 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
 	error_context_stack = &errcallback;
 
 	/* Prepare to track buffer usage during parallel execution */
-	InstrStartParallelQuery();
+	InstrStartParallelQuery(NULL);
 
 	/* Process indexes to perform vacuum/cleanup */
 	parallel_vacuum_process_safe_indexes(&pvs);
@@ -1091,8 +1091,8 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
 	/* Report buffer/WAL usage during parallel execution */
 	buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
 	wal_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_WAL_USAGE, false);
-	InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
-						  &wal_usage[ParallelWorkerNumber]);
+	InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber], NULL,
+						  &wal_usage[ParallelWorkerNumber], NULL);
 
 	/* Report any remaining cost-based vacuum delay time */
 	if (track_cost_delay_timing)
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index e9337a97d1..1728e1cdd0 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -65,6 +65,7 @@
 #define PARALLEL_KEY_QUERY_TEXT		UINT64CONST(0xE000000000000008)
 #define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
 #define PARALLEL_KEY_WAL_USAGE			UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_STORAGEIO_USAGE	UINT64CONST(0xE00000000000000B)
 
 #define PARALLEL_TUPLE_QUEUE_SIZE		65536
 
@@ -608,6 +609,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
 	char	   *pstmt_space;
 	char	   *paramlistinfo_space;
 	BufferUsage *bufusage_space;
+	StorageIOUsage *storageiousage_space;
 	WalUsage   *walusage_space;
 	SharedExecutorInstrumentation *instrumentation = NULL;
 	SharedJitInstrumentation *jit_instrumentation = NULL;
@@ -689,6 +691,13 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
 						   mul_size(sizeof(WalUsage), pcxt->nworkers));
 	shm_toc_estimate_keys(&pcxt->estimator, 1);
 
+	/*
+	 * Same thing for StorageIOUsage.
+	 */
+	shm_toc_estimate_chunk(&pcxt->estimator,
+						   mul_size(sizeof(StorageIOUsage), pcxt->nworkers));
+	shm_toc_estimate_keys(&pcxt->estimator, 1);
+
 	/* Estimate space for tuple queues. */
 	shm_toc_estimate_chunk(&pcxt->estimator,
 						   mul_size(PARALLEL_TUPLE_QUEUE_SIZE, pcxt->nworkers));
@@ -784,6 +793,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
 	shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage_space);
 	pei->wal_usage = walusage_space;
 
+	/* Same for StorageIOUsage. */
+	storageiousage_space = shm_toc_allocate(pcxt->toc,
+											mul_size(sizeof(StorageIOUsage), pcxt->nworkers));
+	shm_toc_insert(pcxt->toc, PARALLEL_KEY_STORAGEIO_USAGE, storageiousage_space);
+	pei->storageio_usage = storageiousage_space;
+
 	/* Set up the tuple queues that the workers will write into. */
 	pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
 
@@ -1189,11 +1204,11 @@ ExecParallelFinish(ParallelExecutorInfo *pei)
 	WaitForParallelWorkersToFinish(pei->pcxt);
 
 	/*
-	 * Next, accumulate buffer/WAL usage.  (This must wait for the workers to
-	 * finish, or we might get incomplete data.)
+	 * Next, accumulate buffer, WAL, and storage I/O usage.  (This must wait
+	 * for the workers to finish, or we might get incomplete data.)
 	 */
 	for (i = 0; i < nworkers; i++)
-		InstrAccumParallelQuery(&pei->buffer_usage[i], &pei->wal_usage[i]);
+		InstrAccumParallelQuery(&pei->buffer_usage[i], &pei->storageio_usage[i], &pei->wal_usage[i]);
 
 	pei->finished = true;
 }
@@ -1436,6 +1451,8 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
 {
 	FixedParallelExecutorState *fpes;
 	BufferUsage *buffer_usage;
+	StorageIOUsage *storageio_usage;
+	StorageIOUsage storageio_usage_start = {0};
 	WalUsage   *wal_usage;
 	DestReceiver *receiver;
 	QueryDesc  *queryDesc;
@@ -1490,13 +1507,14 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
 	ExecSetTupleBound(fpes->tuples_needed, queryDesc->planstate);
 
 	/*
-	 * Prepare to track buffer/WAL usage during query execution.
+	 * Prepare to track buffer, WAL, and storage I/O usage during query
+	 * execution.
 	 *
 	 * We do this after starting up the executor to match what happens in the
 	 * leader, which also doesn't count buffer accesses and WAL activity that
 	 * occur during executor startup.
 	 */
-	InstrStartParallelQuery();
+	InstrStartParallelQuery(&storageio_usage_start);
 
 	/*
 	 * Run the plan.  If we specified a tuple bound, be careful not to demand
@@ -1509,11 +1527,14 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
 	/* Shut down the executor */
 	ExecutorFinish(queryDesc);
 
-	/* Report buffer/WAL usage during parallel execution. */
+	/* Report buffer, WAL, and storage I/O usage during parallel execution. */
 	buffer_usage = shm_toc_lookup(toc, PARALLEL_KEY_BUFFER_USAGE, false);
+	storageio_usage = shm_toc_lookup(toc, PARALLEL_KEY_STORAGEIO_USAGE, false);
 	wal_usage = shm_toc_lookup(toc, PARALLEL_KEY_WAL_USAGE, false);
 	InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
-						  &wal_usage[ParallelWorkerNumber]);
+						  &storageio_usage[ParallelWorkerNumber],
+						  &wal_usage[ParallelWorkerNumber],
+						  &storageio_usage_start);
 
 	/* Report instrumentation data if any instrumentation options are set. */
 	if (instrumentation != NULL)
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index 56e635f470..5fa33b97da 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -13,16 +13,21 @@
  */
 #include "postgres.h"
 
+#include <sys/resource.h>
 #include <unistd.h>
 
 #include "executor/instrument.h"
 
 BufferUsage pgBufferUsage;
 static BufferUsage save_pgBufferUsage;
+
+/* Accumulates storage I/O usage of parallel workers only */
+StorageIOUsage pgStorageIOUsageParallel;
 WalUsage	pgWalUsage;
 static WalUsage save_pgWalUsage;
 
 static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add);
+void		StorageIOUsageAdd(StorageIOUsage *dst, const StorageIOUsage *add);
 static void WalUsageAdd(WalUsage *dst, WalUsage *add);
 
 
@@ -197,27 +202,46 @@ InstrAggNode(Instrumentation *dst, Instrumentation *add)
 
 /* note current values during parallel executor startup */
 void
-InstrStartParallelQuery(void)
+InstrStartParallelQuery(StorageIOUsage *storageiousage)
 {
 	save_pgBufferUsage = pgBufferUsage;
 	save_pgWalUsage = pgWalUsage;
+
+	if (storageiousage != NULL)
+		GetStorageIOUsage(storageiousage);
 }
 
 /* report usage after parallel executor shutdown */
 void
-InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage)
+InstrEndParallelQuery(BufferUsage *bufusage, StorageIOUsage *storageiousage, WalUsage *walusage, StorageIOUsage *storageiousage_start)
 {
 	memset(bufusage, 0, sizeof(BufferUsage));
 	BufferUsageAccumDiff(bufusage, &pgBufferUsage, &save_pgBufferUsage);
+
+	if (storageiousage != NULL && storageiousage_start != NULL)
+	{
+		struct StorageIOUsage storageiousage_end;
+
+		GetStorageIOUsage(&storageiousage_end);
+
+		memset(storageiousage, 0, sizeof(StorageIOUsage));
+		StorageIOUsageAccumDiff(storageiousage, &storageiousage_end, storageiousage_start);
+
+		ereport(DEBUG1,
+				(errmsg("parallel worker's storage I/O times: inblock:%ld outblock:%ld",
+						storageiousage->inblock, storageiousage->outblock)));
+	}
 	memset(walusage, 0, sizeof(WalUsage));
 	WalUsageAccumDiff(walusage, &pgWalUsage, &save_pgWalUsage);
 }
 
 /* accumulate work done by workers in leader's stats */
 void
-InstrAccumParallelQuery(BufferUsage *bufusage, WalUsage *walusage)
+InstrAccumParallelQuery(BufferUsage *bufusage, StorageIOUsage *storageiousage, WalUsage *walusage)
 {
 	BufferUsageAdd(&pgBufferUsage, bufusage);
+	if (storageiousage != NULL)
+		StorageIOUsageAdd(&pgStorageIOUsageParallel, storageiousage);
 	WalUsageAdd(&pgWalUsage, walusage);
 }
 
@@ -273,6 +297,38 @@ BufferUsageAccumDiff(BufferUsage *dst,
 						  add->temp_blk_write_time, sub->temp_blk_write_time);
 }
 
+/* helper functions for storage I/O usage accumulation */
+void
+StorageIOUsageAdd(StorageIOUsage *dst, const StorageIOUsage *add)
+{
+	dst->inblock += add->inblock;
+	dst->outblock += add->outblock;
+}
+
+/* dst += add - sub */
+void
+StorageIOUsageAccumDiff(StorageIOUsage *dst, const StorageIOUsage *add, const StorageIOUsage *sub)
+{
+	dst->inblock += add->inblock - sub->inblock;
+	dst->outblock += add->outblock - sub->outblock;
+}
+
+/* Captures the current storage I/O usage statistics */
+void
+GetStorageIOUsage(StorageIOUsage *usage)
+{
+	struct rusage rusage;
+
+	if (getrusage(RUSAGE_SELF, &rusage))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_SYSTEM_ERROR),
+				 errmsg("getrusage() failed: %m")));
+	}
+	usage->inblock = rusage.ru_inblock;
+	usage->outblock = rusage.ru_oublock;
+}
+
 /* helper functions for WAL usage accumulation */
 static void
 WalUsageAdd(WalUsage *dst, WalUsage *add)
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 387839eb5d..0becde3319 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -70,6 +70,7 @@ extern void ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
 						   ParamListInfo params, QueryEnvironment *queryEnv,
 						   const instr_time *planduration,
 						   const BufferUsage *bufusage,
+						   const StorageIOUsage *planstorageio,
 						   const MemoryContextCounters *mem_counters);
 
 extern void ExplainPrintPlan(struct ExplainState *es, QueryDesc *queryDesc);
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 5e7106c397..5c8bc76c53 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -26,6 +26,8 @@ typedef struct ParallelExecutorInfo
 	PlanState  *planstate;		/* plan subtree we're running in parallel */
 	ParallelContext *pcxt;		/* parallel context we're using */
 	BufferUsage *buffer_usage;	/* points to bufusage area in DSM */
+	StorageIOUsage *storageio_usage;	/* points to storageio usage area in
+										 * DSM */
 	WalUsage   *wal_usage;		/* walusage area in DSM */
 	SharedExecutorInstrumentation *instrumentation; /* optional */
 	struct SharedJitInstrumentation *jit_instrumentation;	/* optional */
diff --git a/src/include/executor/instrument.h b/src/include/executor/instrument.h
index 03653ab6c6..e09c2f8943 100644
--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -41,6 +41,14 @@ typedef struct BufferUsage
 	instr_time	temp_blk_write_time;	/* time spent writing temp blocks */
 } BufferUsage;
 
+typedef struct StorageIOUsage
+{
+	long		inblock;		/* # of times the file system had to perform
+								 * input */
+	long		outblock;		/* # of times the file system had to perform
+								 * output */
+} StorageIOUsage;
+
 /*
  * WalUsage tracks only WAL activity like WAL records generation that
  * can be measured per query and is displayed by EXPLAIN command,
@@ -100,6 +108,7 @@ typedef struct WorkerInstrumentation
 } WorkerInstrumentation;
 
 extern PGDLLIMPORT BufferUsage pgBufferUsage;
+extern PGDLLIMPORT StorageIOUsage pgStorageIOUsageParallel;
 extern PGDLLIMPORT WalUsage pgWalUsage;
 
 extern Instrumentation *InstrAlloc(int n, int instrument_options,
@@ -110,11 +119,15 @@ extern void InstrStopNode(Instrumentation *instr, double nTuples);
 extern void InstrUpdateTupleCount(Instrumentation *instr, double nTuples);
 extern void InstrEndLoop(Instrumentation *instr);
 extern void InstrAggNode(Instrumentation *dst, Instrumentation *add);
-extern void InstrStartParallelQuery(void);
-extern void InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage);
-extern void InstrAccumParallelQuery(BufferUsage *bufusage, WalUsage *walusage);
+extern void InstrStartParallelQuery(StorageIOUsage *storageiousage);
+extern void InstrEndParallelQuery(BufferUsage *bufusage, StorageIOUsage *storageiousage, WalUsage *walusage, StorageIOUsage *storageiousage_start);
+extern void InstrAccumParallelQuery(BufferUsage *bufusage, StorageIOUsage *storageiousage, WalUsage *walusage);
 extern void BufferUsageAccumDiff(BufferUsage *dst,
 								 const BufferUsage *add, const BufferUsage *sub);
+extern void StorageIOUsageAccumDiff(StorageIOUsage *dst,
+									const StorageIOUsage *add, const StorageIOUsage *sub);
+extern void StorageIOUsageAdd(StorageIOUsage *dst, const StorageIOUsage *add);
+extern void GetStorageIOUsage(StorageIOUsage *usage);
 extern void WalUsageAccumDiff(WalUsage *dst, const WalUsage *add,
 							  const WalUsage *sub);
 
diff --git a/src/include/port/win32/sys/resource.h b/src/include/port/win32/sys/resource.h
index a14feeb584..270dc37c84 100644
--- a/src/include/port/win32/sys/resource.h
+++ b/src/include/port/win32/sys/resource.h
@@ -13,6 +13,8 @@ struct rusage
 {
 	struct timeval ru_utime;	/* user time used */
 	struct timeval ru_stime;	/* system time used */
+	long		ru_inblock;		/* Currently always 0 for Windows */
+	long		ru_oublock;		/* Currently always 0 for Windows */
 };
 
 extern int	getrusage(int who, struct rusage *rusage);
diff --git a/src/port/win32getrusage.c b/src/port/win32getrusage.c
index 6a197c9437..27f0ea052a 100644
--- a/src/port/win32getrusage.c
+++ b/src/port/win32getrusage.c
@@ -57,5 +57,9 @@ getrusage(int who, struct rusage *rusage)
 	rusage->ru_utime.tv_sec = li.QuadPart / 1000000L;
 	rusage->ru_utime.tv_usec = li.QuadPart % 1000000L;
 
+	/* Currently always 0 for Windows */
+	rusage->ru_inblock = 0;
+	rusage->ru_oublock = 0;
+
 	return 0;
 }
diff --git a/src/test/regress/expected/explain.out b/src/test/regress/expected/explain.out
index 340747a8f7..f635525ee5 100644
--- a/src/test/regress/expected/explain.out
+++ b/src/test/regress/expected/explain.out
@@ -127,10 +127,16 @@ select explain_filter('explain (analyze, buffers, format xml) select * from int8
        <Local-Written-Blocks>N</Local-Written-Blocks>  +
        <Temp-Read-Blocks>N</Temp-Read-Blocks>          +
        <Temp-Written-Blocks>N</Temp-Written-Blocks>    +
+       <Storage-I-O-Read>N</Storage-I-O-Read>          +
+       <Storage-I-O-Write>N</Storage-I-O-Write>        +
      </Planning>                                       +
      <Planning-Time>N.N</Planning-Time>                +
      <Triggers>                                        +
      </Triggers>                                       +
+     <Execution>                                       +
+       <Storage-I-O-Read>N</Storage-I-O-Read>          +
+       <Storage-I-O-Write>N</Storage-I-O-Write>        +
+     </Execution>                                      +
      <Execution-Time>N.N</Execution-Time>              +
    </Query>                                            +
  </explain>
@@ -175,6 +181,8 @@ select explain_filter('explain (analyze, serialize, buffers, format yaml) select
      Local Written Blocks: N  +
      Temp Read Blocks: N      +
      Temp Written Blocks: N   +
+     Storage I/O Read: N      +
+     Storage I/O Write: N     +
    Planning Time: N.N         +
    Triggers:                  +
    Serialization:             +
@@ -191,6 +199,9 @@ select explain_filter('explain (analyze, serialize, buffers, format yaml) select
      Local Written Blocks: N  +
      Temp Read Blocks: N      +
      Temp Written Blocks: N   +
+   Execution:                 +
+     Storage I/O Read: N      +
+     Storage I/O Write: N     +
    Execution Time: N.N
 (1 row)
 
@@ -237,7 +248,13 @@ select explain_filter('explain (buffers, format json) select * from int8_tbl i8'
        "Local Dirtied Blocks": N,  +
        "Local Written Blocks": N,  +
        "Temp Read Blocks": N,      +
-       "Temp Written Blocks": N    +
+       "Temp Written Blocks": N,   +
+       "Storage I/O Read": N,      +
+       "Storage I/O Write": N      +
+     },                            +
+     "Execution": {                +
+       "Storage I/O Read": N,      +
+       "Storage I/O Write": N      +
      }                             +
    }                               +
  ]
@@ -335,11 +352,17 @@ select explain_filter('explain (analyze, buffers, format json) select * from int
        "Local I/O Read Time": N.N,  +
        "Local I/O Write Time": N.N, +
        "Temp I/O Read Time": N.N,   +
-       "Temp I/O Write Time": N.N   +
+       "Temp I/O Write Time": N.N,  +
+       "Storage I/O Read": N,       +
+       "Storage I/O Write": N       +
      },                             +
      "Planning Time": N.N,          +
      "Triggers": [                  +
      ],                             +
+     "Execution": {                 +
+       "Storage I/O Read": N,       +
+       "Storage I/O Write": N       +
+     },                             +
      "Execution Time": N.N          +
    }                                +
  ]
@@ -459,12 +482,18 @@ select explain_filter('explain (memory, analyze, format json) select * from int8
        "Local Written Blocks": N,  +
        "Temp Read Blocks": N,      +
        "Temp Written Blocks": N,   +
+       "Storage I/O Read": N,      +
+       "Storage I/O Write": N,     +
        "Memory Used": N,           +
        "Memory Allocated": N       +
      },                            +
      "Planning Time": N.N,         +
      "Triggers": [                 +
      ],                            +
+     "Execution": {                +
+       "Storage I/O Read": N,      +
+       "Storage I/O Write": N      +
+     },                            +
      "Execution Time": N.N         +
    }                               +
  ]
@@ -677,6 +706,7 @@ select jsonb_pretty(
          },                                                 +
          "Planning": {                                      +
              "Local Hit Blocks": 0,                         +
+             "Storage I/O Read": 0,                         +
              "Temp Read Blocks": 0,                         +
              "Local Read Blocks": 0,                        +
              "Shared Hit Blocks": 0,                        +
@@ -689,6 +719,9 @@ select jsonb_pretty(
          },                                                 +
          "Triggers": [                                      +
          ],                                                 +
+         "Execution": {                                     +
+             "Storage I/O Read": 0                          +
+         },                                                 +
          "Planning Time": 0.0,                              +
          "Execution Time": 0.0                              +
      }                                                      +
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index bfa276d2d3..b4afc49048 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2608,6 +2608,7 @@ SSL
 SSLExtensionInfoContext
 SSL_CTX
 STARTUPINFO
+StorageIOUsage
 STRLEN
 SV
 SYNCHRONIZATION_BARRIER

base-commit: 190dc27998d5b7b4c36e12bebe62f7176f4b4507
-- 
2.43.0
