On 2025-03-25 10:27, torikoshia wrote:
> On 2025-03-22 20:23, Jelte Fennema-Nio wrote:
>> On Wed, 19 Mar 2025 at 14:15, torikoshia <torikos...@oss.nttdata.com>
>> wrote:
>>> BTW based on your discussion, I thought this patch could not be
>>> merged anytime soon. Does that align with your understanding?
>>
>> Yeah, that aligns with my understanding. I don't think it's realistic
>> to get this merged before the code freeze, but I think both of the
>> below issues could be resolved.
>>
>>> - With bgworker-based AIO, this patch could mislead users into
>>> underestimating the actual storage I/O load, which is undesirable.
>>
>> To resolve this, I think the patch would need to change to not report
>> anything if bgworker-based AIO is used.
>
> Agreed.
> I feel the new GUC io_method can be used to determine whether
> bgworker-based AIO is being used.
I took this approach: in the attached patch, no additional output is
shown when io_method=worker.
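
To illustrate (hand-edited output, not from an actual run), with
io_method=sync the patch appends a section like this to the plan:

  =# EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM pgbench_accounts;
                              QUERY PLAN
  ------------------------------------------------------------------
   Seq Scan on pgbench_accounts  (cost=.. rows=.. width=..) (actual time=.. rows=.. loops=1)
     Buffers: shared read=1640
   Planning Time: .. ms
   Execution:
     Storage I/O: read=26240 times write=0 times
   Execution Time: .. ms

Since the unit on Linux is 512 bytes, reading 1640 8kB blocks from
storage corresponds to 26240 units here. With io_method=worker, the
"Storage I/O" line is omitted entirely.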
--
Regards,
--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.
From e80e53eb36f7909ca8638b26d3cd61a58a5bc889 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikos...@oss.nttdata.com>
Date: Fri, 11 Apr 2025 22:01:22 +0900
Subject: [PATCH v4] Add storage I/O tracking to 'BUFFERS' option
The 'BUFFERS' option currently reports whether a block was found in
shared buffers, but it does not distinguish whether a block read was
served from the OS cache or required an actual storage I/O operation.
While shared buffers and the OS cache offer similar performance, storage
I/O is generally significantly slower. By measuring the number of
storage I/O reads and writes, we can better identify whether storage I/O
is a performance bottleneck.
Add tracking of storage I/O usage by calling getrusage(2) at the start
and end of both the planning and execution phases.
A more granular approach, tracking at each plan node as the existing
BUFFERS option does, was considered but found impractical due to the
high performance cost of frequent getrusage() calls.
This output is not shown when io_method=worker, since asynchronous I/O
workers handle I/O for multiple processes and isolating the EXPLAIN
target's I/O is difficult.
TODO:
I believe this information is mainly useful when used with
auto_explain. I'll implement that later.
---
doc/src/sgml/ref/explain.sgml | 25 +-
src/backend/access/brin/brin.c | 8 +-
src/backend/access/gin/gininsert.c | 8 +-
src/backend/access/nbtree/nbtsort.c | 8 +-
src/backend/commands/explain.c | 125 +++-
src/backend/commands/prepare.c | 9 +
src/backend/commands/vacuumparallel.c | 8 +-
src/backend/executor/execParallel.c | 35 +-
src/backend/executor/instrument.c | 84 ++-
src/include/commands/explain.h | 1 +
src/include/executor/execParallel.h | 2 +
src/include/executor/instrument.h | 20 +-
src/include/port/win32/sys/resource.h | 2 +
src/port/win32getrusage.c | 4 +
src/test/regress/expected/explain_1.out | 851 ++++++++++++++++++++++++
src/tools/pgindent/typedefs.list | 1 +
16 files changed, 1153 insertions(+), 38 deletions(-)
create mode 100644 src/test/regress/expected/explain_1.out
diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml
index 9ed1061b7f..493afe6a34 100644
--- a/doc/src/sgml/ref/explain.sgml
+++ b/doc/src/sgml/ref/explain.sgml
@@ -201,10 +201,27 @@ ROLLBACK;
query processing.
The number of blocks shown for an
upper-level node includes those used by all its child nodes. In text
- format, only non-zero values are printed. Buffers information is
- included by default when <literal>ANALYZE</literal> is used but
- otherwise is not included by default, but can be enabled using this
- option.
+ format, only non-zero values are printed.
+ If possible, this option also displays the number of read and write
+ operations performed on storage during the planning and execution phases,
+ shown at the end of the plan. These values are obtained from the
+ <function>getrusage()</function> system call. Note that on platforms that
+ do not support <function>getrusage()</function>, such as Windows, no output
+ will be shown, even if reads or writes actually occur. Additionally, even
+ on platforms where <function>getrusage()</function> is supported, if the
+ kernel is built without the necessary options to track storage read and
+ write operations, no output will be shown. Also, when
+ <varname>io_method</varname> is set to <literal>worker</literal>, no output
+ will be shown, as I/O handled by asynchronous workers cannot be measured
+ accurately.
+ The timing and unit of measurement for read and write operations may vary
+ depending on the platform. For example, on Linux, a read is counted only
+ if this process caused data to be fetched from the storage layer, a write
+ is counted when a page is dirtied, and the unit of measurement for both is
+ 512 bytes.
+ Buffers information is included by default when <literal>ANALYZE</literal>
+ is used; otherwise it is not included by default, but it can be enabled
+ using this option.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 01e1db7f85..c6a8f74213 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2557,7 +2557,7 @@ _brin_end_parallel(BrinLeader *brinleader, BrinBuildState *state)
* or we might get incomplete data.)
*/
for (i = 0; i < brinleader->pcxt->nworkers_launched; i++)
- InstrAccumParallelQuery(&brinleader->bufferusage[i], &brinleader->walusage[i]);
+ InstrAccumParallelQuery(&brinleader->bufferusage[i], NULL, &brinleader->walusage[i]);
/* Free last reference to MVCC snapshot, if one was used */
if (IsMVCCSnapshot(brinleader->snapshot))
@@ -2919,7 +2919,7 @@ _brin_parallel_build_main(dsm_segment *seg, shm_toc *toc)
tuplesort_attach_shared(sharedsort, seg);
/* Prepare to track buffer usage during parallel execution */
- InstrStartParallelQuery();
+ InstrStartParallelQuery(NULL);
/*
* Might as well use reliable figure when doling out maintenance_work_mem
@@ -2934,8 +2934,8 @@ _brin_parallel_build_main(dsm_segment *seg, shm_toc *toc)
/* Report WAL/buffer usage during parallel execution */
bufferusage = shm_toc_lookup(toc, PARALLEL_KEY_BUFFER_USAGE, false);
walusage = shm_toc_lookup(toc, PARALLEL_KEY_WAL_USAGE, false);
- InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber],
- &walusage[ParallelWorkerNumber]);
+ InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber], NULL,
+ &walusage[ParallelWorkerNumber], NULL);
index_close(indexRel, indexLockmode);
table_close(heapRel, heapLockmode);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index e25d817c19..5463d81c96 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -1084,7 +1084,7 @@ _gin_end_parallel(GinLeader *ginleader, GinBuildState *state)
* or we might get incomplete data.)
*/
for (i = 0; i < ginleader->pcxt->nworkers_launched; i++)
- InstrAccumParallelQuery(&ginleader->bufferusage[i], &ginleader->walusage[i]);
+ InstrAccumParallelQuery(&ginleader->bufferusage[i], NULL, &ginleader->walusage[i]);
/* Free last reference to MVCC snapshot, if one was used */
if (IsMVCCSnapshot(ginleader->snapshot))
@@ -2135,7 +2135,7 @@ _gin_parallel_build_main(dsm_segment *seg, shm_toc *toc)
tuplesort_attach_shared(sharedsort, seg);
/* Prepare to track buffer usage during parallel execution */
- InstrStartParallelQuery();
+ InstrStartParallelQuery(NULL);
/*
* Might as well use reliable figure when doling out maintenance_work_mem
@@ -2150,8 +2150,8 @@ _gin_parallel_build_main(dsm_segment *seg, shm_toc *toc)
/* Report WAL/buffer usage during parallel execution */
bufferusage = shm_toc_lookup(toc, PARALLEL_KEY_BUFFER_USAGE, false);
walusage = shm_toc_lookup(toc, PARALLEL_KEY_WAL_USAGE, false);
- InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber],
- &walusage[ParallelWorkerNumber]);
+ InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber], NULL,
+ &walusage[ParallelWorkerNumber], NULL);
index_close(indexRel, indexLockmode);
table_close(heapRel, heapLockmode);
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 3794cc924a..c66e742909 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1618,7 +1618,7 @@ _bt_end_parallel(BTLeader *btleader)
* or we might get incomplete data.)
*/
for (i = 0; i < btleader->pcxt->nworkers_launched; i++)
- InstrAccumParallelQuery(&btleader->bufferusage[i], &btleader->walusage[i]);
+ InstrAccumParallelQuery(&btleader->bufferusage[i], NULL, &btleader->walusage[i]);
/* Free last reference to MVCC snapshot, if one was used */
if (IsMVCCSnapshot(btleader->snapshot))
@@ -1826,7 +1826,7 @@ _bt_parallel_build_main(dsm_segment *seg, shm_toc *toc)
}
/* Prepare to track buffer usage during parallel execution */
- InstrStartParallelQuery();
+ InstrStartParallelQuery(NULL);
/* Perform sorting of spool, and possibly a spool2 */
sortmem = maintenance_work_mem / btshared->scantuplesortstates;
@@ -1836,8 +1836,8 @@ _bt_parallel_build_main(dsm_segment *seg, shm_toc *toc)
/* Report WAL/buffer usage during parallel execution */
bufferusage = shm_toc_lookup(toc, PARALLEL_KEY_BUFFER_USAGE, false);
walusage = shm_toc_lookup(toc, PARALLEL_KEY_WAL_USAGE, false);
- InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber],
- &walusage[ParallelWorkerNumber]);
+ InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber], NULL,
+ &walusage[ParallelWorkerNumber], NULL);
#ifdef BTREE_BUILD_STATS
if (log_btree_build_stats)
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 786ee865f1..2e391b347b 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -32,6 +32,7 @@
#include "parser/analyze.h"
#include "parser/parsetree.h"
#include "rewrite/rewriteHandler.h"
+#include "storage/aio_subsys.h"
#include "storage/bufmgr.h"
#include "tcop/tcopprot.h"
#include "utils/builtins.h"
@@ -144,6 +145,8 @@ static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
static const char *explain_get_index_name(Oid indexId);
static bool peek_buffer_usage(ExplainState *es, const BufferUsage *usage);
static void show_buffer_usage(ExplainState *es, const BufferUsage *usage);
+static bool peek_storageio_usage(ExplainState *es, const StorageIOUsage *usage);
+static void show_storageio_usage(ExplainState *es, const StorageIOUsage *usage);
static void show_wal_usage(ExplainState *es, const WalUsage *usage);
static void show_memory_counters(ExplainState *es,
const MemoryContextCounters *mem_counters);
@@ -325,6 +328,8 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
planduration;
BufferUsage bufusage_start,
bufusage;
+ StorageIOUsage storageio,
+ storageio_start;
MemoryContextCounters mem_counters;
MemoryContext planner_ctx = NULL;
MemoryContext saved_ctx = NULL;
@@ -346,7 +351,10 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
}
if (es->buffers)
+ {
bufusage_start = pgBufferUsage;
+ GetStorageIOUsage(&storageio_start);
+ }
INSTR_TIME_SET_CURRENT(planstart);
/* plan the query */
@@ -361,17 +369,21 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
MemoryContextMemConsumed(planner_ctx, &mem_counters);
}
- /* calc differences of buffer counters. */
+ /* calc differences of buffer and storage I/O counters. */
if (es->buffers)
{
memset(&bufusage, 0, sizeof(BufferUsage));
BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start);
+
+ GetStorageIOUsage(&storageio);
+ StorageIOUsageDiff(&storageio, &storageio_start);
}
/* run it (if needed) and produce output */
ExplainOnePlan(plan, NULL, NULL, -1, into, es, queryString, params,
queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
+ es->buffers ? &storageio : NULL,
es->memory ? &mem_counters : NULL);
}
@@ -497,7 +509,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
- const BufferUsage *bufusage,
+ const BufferUsage *bufusage, const StorageIOUsage *planstorageio,
const MemoryContextCounters *mem_counters)
{
DestReceiver *dest;
@@ -507,6 +519,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
int eflags;
int instrument_option = 0;
SerializeMetrics serializeMetrics = {0};
+ StorageIOUsage storageio_start;
Assert(plannedstmt->commandType != CMD_UTILITY);
@@ -516,7 +529,19 @@ ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
instrument_option |= INSTRUMENT_ROWS;
if (es->buffers)
+ {
+ GetStorageIOUsage(&storageio_start);
+
+ /*
+ * Initialize the global counters that accumulate parallel workers'
+ * usage. Even if the query is cancelled partway through, EXPLAIN
+ * always passes through this point, so it is safe to reset them here.
+ */
+ pgStorageIOUsageParallel.inblock = 0;
+ pgStorageIOUsageParallel.outblock = 0;
+
instrument_option |= INSTRUMENT_BUFFERS;
+ }
if (es->wal)
instrument_option |= INSTRUMENT_WAL;
@@ -609,8 +634,9 @@ ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
/* Create textual dump of plan tree */
ExplainPrintPlan(es, queryDesc);
- /* Show buffer and/or memory usage in planning */
- if (peek_buffer_usage(es, bufusage) || mem_counters)
+ /* Show buffer, storage I/O, and/or memory usage in planning */
+ if (peek_buffer_usage(es, bufusage) || peek_storageio_usage(es, planstorageio) ||
+ mem_counters)
{
ExplainOpenGroup("Planning", "Planning", true, es);
@@ -622,8 +648,10 @@ ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
}
if (bufusage)
+ {
show_buffer_usage(es, bufusage);
-
+ show_storageio_usage(es, planstorageio);
+ }
if (mem_counters)
show_memory_counters(es, mem_counters);
@@ -680,6 +708,34 @@ ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
totaltime += elapsed_time(&starttime);
+ /* Show storage I/O usage in execution */
+ if (es->buffers)
+ {
+ StorageIOUsage storageio;
+
+ GetStorageIOUsage(&storageio);
+ StorageIOUsageDiff(&storageio, &storageio_start);
+ StorageIOUsageAdd(&storageio, &pgStorageIOUsageParallel);
+
+ if (peek_storageio_usage(es, &storageio))
+ {
+ ExplainOpenGroup("Execution", "Execution", true, es);
+
+ if (es->format == EXPLAIN_FORMAT_TEXT)
+ {
+ ExplainIndentText(es);
+ appendStringInfoString(es->str, "Execution:\n");
+ es->indent++;
+ }
+ show_storageio_usage(es, &storageio);
+
+ if (es->format == EXPLAIN_FORMAT_TEXT)
+ es->indent--;
+
+ ExplainCloseGroup("Execution", "Execution", true, es);
+ }
+ }
+
/*
* We only report execution time if we actually ran the query (that is,
* the user specified ANALYZE), and if summary reporting is enabled (the
@@ -4260,6 +4316,65 @@ show_buffer_usage(ExplainState *es, const BufferUsage *usage)
}
}
+/*
+ * Return whether show_storageio_usage would have anything to print, if given
+ * the same 'usage' data. Note that when the format is anything other than
+ * text, we print even if the counters are all zeroes.
+ */
+static bool
+peek_storageio_usage(ExplainState *es, const StorageIOUsage *usage)
+{
+ if (usage == NULL)
+ return false;
+
+ /*
+ * Reporting only this backend's I/O, excluding AIO workers, would
+ * underestimate the total I/O, so treat this case as nothing to print.
+ */
+ if (pgaio_workers_enabled())
+ return false;
+
+ if (es->format != EXPLAIN_FORMAT_TEXT)
+ return true;
+
+ return usage->inblock > 0 || usage->outblock > 0;
+}
+
+/*
+ * Show storage I/O usage.
+ */
+static void
+show_storageio_usage(ExplainState *es, const StorageIOUsage *usage)
+{
+ /*
+ * Reporting only this backend's I/O, excluding AIO workers, would
+ * underestimate the total I/O, so do not show anything.
+ */
+ if (pgaio_workers_enabled())
+ return;
+
+ if (es->format == EXPLAIN_FORMAT_TEXT)
+ {
+ /* Show only positive counter values. */
+ if (usage->inblock <= 0 && usage->outblock <= 0)
+ return;
+
+ ExplainIndentText(es);
+ appendStringInfoString(es->str, "Storage I/O:");
+ appendStringInfo(es->str, " read=%ld times", (long) usage->inblock);
+ appendStringInfo(es->str, " write=%ld times", (long) usage->outblock);
+
+ appendStringInfoChar(es->str, '\n');
+ }
+ else
+ {
+ ExplainPropertyInteger("Storage I/O Read", NULL,
+ usage->inblock, es);
+ ExplainPropertyInteger("Storage I/O Read", NULL,
+ usage->outblock, es);
+ }
+}
+
/*
* Show WAL usage details.
*/
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index bf7d2b2309..68b87aab0c 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -583,6 +583,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
instr_time planduration;
BufferUsage bufusage_start,
bufusage;
+ StorageIOUsage storageio,
+ storageio_start;
MemoryContextCounters mem_counters;
MemoryContext planner_ctx = NULL;
MemoryContext saved_ctx = NULL;
@@ -599,7 +601,11 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
if (es->buffers)
+ {
bufusage_start = pgBufferUsage;
+ GetStorageIOUsage(&storageio_start);
+ }
+
INSTR_TIME_SET_CURRENT(planstart);
/* Look it up in the hash table */
@@ -649,6 +655,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
{
memset(&bufusage, 0, sizeof(BufferUsage));
BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start);
+ GetStorageIOUsage(&storageio);
+ StorageIOUsageDiff(&storageio, &storageio_start);
}
plan_list = cplan->stmt_list;
@@ -662,6 +669,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
ExplainOnePlan(pstmt, cplan, entry->plansource, query_index,
into, es, query_string, paramLI, pstate->p_queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
+ es->buffers ? &storageio : NULL,
es->memory ? &mem_counters : NULL);
else
ExplainOneUtility(pstmt->utilityStmt, into, es, pstate, paramLI);
diff --git a/src/backend/commands/vacuumparallel.c b/src/backend/commands/vacuumparallel.c
index 2b9d548cde..f77124f8c5 100644
--- a/src/backend/commands/vacuumparallel.c
+++ b/src/backend/commands/vacuumparallel.c
@@ -737,7 +737,7 @@ parallel_vacuum_process_all_indexes(ParallelVacuumState *pvs, int num_index_scan
WaitForParallelWorkersToFinish(pvs->pcxt);
for (int i = 0; i < pvs->pcxt->nworkers_launched; i++)
- InstrAccumParallelQuery(&pvs->buffer_usage[i], &pvs->wal_usage[i]);
+ InstrAccumParallelQuery(&pvs->buffer_usage[i], NULL, &pvs->wal_usage[i]);
}
/*
@@ -1083,7 +1083,7 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
error_context_stack = &errcallback;
/* Prepare to track buffer usage during parallel execution */
- InstrStartParallelQuery();
+ InstrStartParallelQuery(NULL);
/* Process indexes to perform vacuum/cleanup */
parallel_vacuum_process_safe_indexes(&pvs);
@@ -1091,8 +1091,8 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
/* Report buffer/WAL usage during parallel execution */
buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
wal_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_WAL_USAGE, false);
- InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
- &wal_usage[ParallelWorkerNumber]);
+ InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber], NULL,
+ &wal_usage[ParallelWorkerNumber], NULL);
/* Report any remaining cost-based vacuum delay time */
if (track_cost_delay_timing)
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 39c990ae63..cf46633100 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -65,6 +65,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_STORAGEIO_USAGE UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -609,6 +610,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
char *pstmt_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
+ StorageIOUsage *storageiousage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
@@ -690,6 +692,13 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
mul_size(sizeof(WalUsage), pcxt->nworkers));
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /*
+ * Same thing for StorageIOUsage.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(StorageIOUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for tuple queues. */
shm_toc_estimate_chunk(&pcxt->estimator,
mul_size(PARALLEL_TUPLE_QUEUE_SIZE, pcxt->nworkers));
@@ -785,6 +794,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage_space);
pei->wal_usage = walusage_space;
+ /* Same for StorageIOUsage. */
+ storageiousage_space = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(StorageIOUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_STORAGEIO_USAGE, storageiousage_space);
+ pei->storageio_usage = storageiousage_space;
+
/* Set up the tuple queues that the workers will write into. */
pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
@@ -1190,11 +1205,11 @@ ExecParallelFinish(ParallelExecutorInfo *pei)
WaitForParallelWorkersToFinish(pei->pcxt);
/*
- * Next, accumulate buffer/WAL usage. (This must wait for the workers to
- * finish, or we might get incomplete data.)
+ * Next, accumulate buffer, WAL, and Storage I/O usage. (This must wait
+ * for the workers to finish, or we might get incomplete data.)
*/
for (i = 0; i < nworkers; i++)
- InstrAccumParallelQuery(&pei->buffer_usage[i], &pei->wal_usage[i]);
+ InstrAccumParallelQuery(&pei->buffer_usage[i], &pei->storageio_usage[i], &pei->wal_usage[i]);
pei->finished = true;
}
@@ -1437,6 +1452,8 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
{
FixedParallelExecutorState *fpes;
BufferUsage *buffer_usage;
+ StorageIOUsage *storageio_usage;
+ StorageIOUsage storageio_usage_start;
WalUsage *wal_usage;
DestReceiver *receiver;
QueryDesc *queryDesc;
@@ -1491,13 +1508,14 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
ExecSetTupleBound(fpes->tuples_needed, queryDesc->planstate);
/*
- * Prepare to track buffer/WAL usage during query execution.
+ * Prepare to track buffer, WAL, and StorageI/O usage during query
+ * execution.
*
* We do this after starting up the executor to match what happens in the
* leader, which also doesn't count buffer accesses and WAL activity that
* occur during executor startup.
*/
- InstrStartParallelQuery();
+ InstrStartParallelQuery(&storageio_usage_start);
/*
* Run the plan. If we specified a tuple bound, be careful not to demand
@@ -1510,11 +1528,14 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Shut down the executor */
ExecutorFinish(queryDesc);
- /* Report buffer/WAL usage during parallel execution. */
+ /* Report buffer, WAL, and storage I/O usage during parallel execution. */
buffer_usage = shm_toc_lookup(toc, PARALLEL_KEY_BUFFER_USAGE, false);
+ storageio_usage = shm_toc_lookup(toc, PARALLEL_KEY_STORAGEIO_USAGE, false);
wal_usage = shm_toc_lookup(toc, PARALLEL_KEY_WAL_USAGE, false);
InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
- &wal_usage[ParallelWorkerNumber]);
+ &storageio_usage[ParallelWorkerNumber],
+ &wal_usage[ParallelWorkerNumber],
+ &storageio_usage_start);
/* Report instrumentation data if any instrumentation options are set. */
if (instrumentation != NULL)
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index 56e635f470..9cb0e9300b 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -13,16 +13,22 @@
*/
#include "postgres.h"
+#include <sys/resource.h>
#include <unistd.h>
#include "executor/instrument.h"
+#include "storage/aio_subsys.h"
BufferUsage pgBufferUsage;
static BufferUsage save_pgBufferUsage;
+
+StorageIOUsage pgStorageIOUsageParallel; /* counts only parallel
+ * workers' usage */
WalUsage pgWalUsage;
static WalUsage save_pgWalUsage;
static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add);
+void StorageIOUsageAdd(StorageIOUsage *dst, const StorageIOUsage *add);
static void WalUsageAdd(WalUsage *dst, WalUsage *add);
@@ -197,27 +203,47 @@ InstrAggNode(Instrumentation *dst, Instrumentation *add)
/* note current values during parallel executor startup */
void
-InstrStartParallelQuery(void)
+InstrStartParallelQuery(StorageIOUsage *storageiousage)
{
save_pgBufferUsage = pgBufferUsage;
save_pgWalUsage = pgWalUsage;
+
+ if (storageiousage != NULL)
+ GetStorageIOUsage(storageiousage);
}
/* report usage after parallel executor shutdown */
void
-InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage)
+InstrEndParallelQuery(BufferUsage *bufusage, StorageIOUsage *storageiousage, WalUsage *walusage, StorageIOUsage *storageiousage_start)
{
memset(bufusage, 0, sizeof(BufferUsage));
BufferUsageAccumDiff(bufusage, &pgBufferUsage, &save_pgBufferUsage);
+
+ if (storageiousage != NULL && storageiousage_start != NULL)
+ {
+ struct StorageIOUsage storageiousage_end;
+
+ GetStorageIOUsage(&storageiousage_end);
+
+ memset(storageiousage, 0, sizeof(StorageIOUsage));
+ StorageIOUsageAccumDiff(storageiousage, &storageiousage_end, storageiousage_start);
+
+ ereport(DEBUG1,
+ (errmsg("parallel worker's storage I/O: inblock: %ld, outblock: %ld",
+ storageiousage->inblock, storageiousage->outblock)));
+ }
memset(walusage, 0, sizeof(WalUsage));
WalUsageAccumDiff(walusage, &pgWalUsage, &save_pgWalUsage);
}
/* accumulate work done by workers in leader's stats */
void
-InstrAccumParallelQuery(BufferUsage *bufusage, WalUsage *walusage)
+InstrAccumParallelQuery(BufferUsage *bufusage, StorageIOUsage *storageiousage, WalUsage *walusage)
{
BufferUsageAdd(&pgBufferUsage, bufusage);
+
+ if (storageiousage != NULL)
+ StorageIOUsageAdd(&pgStorageIOUsageParallel, storageiousage);
WalUsageAdd(&pgWalUsage, walusage);
}
@@ -273,6 +299,58 @@ BufferUsageAccumDiff(BufferUsage *dst,
add->temp_blk_write_time, sub->temp_blk_write_time);
}
+/* helper functions for StorageIOUsage usage accumulation */
+void
+StorageIOUsageAdd(StorageIOUsage *dst, const StorageIOUsage *add)
+{
+ dst->inblock += add->inblock;
+ dst->outblock += add->outblock;
+}
+
+/* dst += add - sub */
+void
+StorageIOUsageAccumDiff(StorageIOUsage *dst, const StorageIOUsage *add, const StorageIOUsage *sub)
+{
+ dst->inblock += add->inblock - sub->inblock;
+ dst->outblock += add->outblock - sub->outblock;
+}
+
+/* dst -= sub */
+void
+StorageIOUsageDiff(StorageIOUsage *dst, const StorageIOUsage *sub)
+{
+ dst->inblock -= sub->inblock;
+ dst->outblock -= sub->outblock;
+}
+
+/* Captures the current storage I/O usage statistics */
+void
+GetStorageIOUsage(StorageIOUsage *usage)
+{
+ struct rusage rusage;
+
+ /*
+ * Counting only this backend's I/O, excluding AIO workers, would
+ * underestimate the total I/O, so don't collect usage statistics when
+ * worker-based AIO is enabled. Zero the counters so that callers never
+ * see uninitialized values.
+ */
+ if (pgaio_workers_enabled())
+ {
+ memset(usage, 0, sizeof(StorageIOUsage));
+ return;
+ }
+
+ if (getrusage(RUSAGE_SELF, &rusage))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_SYSTEM_ERROR),
+ errmsg("getrusage() failed: %m")));
+ }
+ usage->inblock = rusage.ru_inblock;
+ usage->outblock = rusage.ru_oublock;
+}
+
/* helper functions for WAL usage accumulation */
static void
WalUsageAdd(WalUsage *dst, WalUsage *add)
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 387839eb5d..0becde3319 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -70,6 +70,7 @@ extern void ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
const BufferUsage *bufusage,
+ const StorageIOUsage *planstorageio,
const MemoryContextCounters *mem_counters);
extern void ExplainPrintPlan(struct ExplainState *es, QueryDesc *queryDesc);
diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h
index 5e7106c397..5c8bc76c53 100644
--- a/src/include/executor/execParallel.h
+++ b/src/include/executor/execParallel.h
@@ -26,6 +26,8 @@ typedef struct ParallelExecutorInfo
PlanState *planstate; /* plan subtree we're running in parallel */
ParallelContext *pcxt; /* parallel context we're using */
BufferUsage *buffer_usage; /* points to bufusage area in DSM */
+ StorageIOUsage *storageio_usage; /* points to storageio usage area in
+ * DSM */
WalUsage *wal_usage; /* walusage area in DSM */
SharedExecutorInstrumentation *instrumentation; /* optional */
struct SharedJitInstrumentation *jit_instrumentation; /* optional */
diff --git a/src/include/executor/instrument.h b/src/include/executor/instrument.h
index 03653ab6c6..5392f05022 100644
--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -41,6 +41,14 @@ typedef struct BufferUsage
instr_time temp_blk_write_time; /* time spent writing temp blocks */
} BufferUsage;
+typedef struct StorageIOUsage
+{
+ long inblock; /* # of times the file system had to perform
+ * input */
+ long outblock; /* # of times the file system had to perform
+ * output */
+} StorageIOUsage;
+
/*
* WalUsage tracks only WAL activity like WAL records generation that
* can be measured per query and is displayed by EXPLAIN command,
@@ -100,6 +108,7 @@ typedef struct WorkerInstrumentation
} WorkerInstrumentation;
extern PGDLLIMPORT BufferUsage pgBufferUsage;
+extern PGDLLIMPORT StorageIOUsage pgStorageIOUsageParallel;
extern PGDLLIMPORT WalUsage pgWalUsage;
extern Instrumentation *InstrAlloc(int n, int instrument_options,
@@ -110,11 +119,16 @@ extern void InstrStopNode(Instrumentation *instr, double nTuples);
extern void InstrUpdateTupleCount(Instrumentation *instr, double nTuples);
extern void InstrEndLoop(Instrumentation *instr);
extern void InstrAggNode(Instrumentation *dst, Instrumentation *add);
-extern void InstrStartParallelQuery(void);
-extern void InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage);
-extern void InstrAccumParallelQuery(BufferUsage *bufusage, WalUsage *walusage);
+extern void InstrStartParallelQuery(StorageIOUsage *storageiousage);
+extern void InstrEndParallelQuery(BufferUsage *bufusage, StorageIOUsage *storageiousage, WalUsage *walusage, StorageIOUsage *storageiousage_start);
+extern void InstrAccumParallelQuery(BufferUsage *bufusage, StorageIOUsage *storageiousage, WalUsage *walusage);
extern void BufferUsageAccumDiff(BufferUsage *dst,
const BufferUsage *add, const BufferUsage *sub);
+extern void StorageIOUsageAccumDiff(StorageIOUsage *dst,
+ const StorageIOUsage *add, const StorageIOUsage *sub);
+extern void StorageIOUsageDiff(StorageIOUsage *dst, const StorageIOUsage *sub);
+extern void StorageIOUsageAdd(StorageIOUsage *dst, const StorageIOUsage *add);
+extern void GetStorageIOUsage(StorageIOUsage *usage);
extern void WalUsageAccumDiff(WalUsage *dst, const WalUsage *add,
const WalUsage *sub);
diff --git a/src/include/port/win32/sys/resource.h b/src/include/port/win32/sys/resource.h
index a14feeb584..270dc37c84 100644
--- a/src/include/port/win32/sys/resource.h
+++ b/src/include/port/win32/sys/resource.h
@@ -13,6 +13,8 @@ struct rusage
{
struct timeval ru_utime; /* user time used */
struct timeval ru_stime; /* system time used */
+ long ru_inblock; /* currently always 0 on Windows */
+ long ru_oublock; /* currently always 0 on Windows */
};
extern int getrusage(int who, struct rusage *rusage);
diff --git a/src/port/win32getrusage.c b/src/port/win32getrusage.c
index 6a197c9437..27f0ea052a 100644
--- a/src/port/win32getrusage.c
+++ b/src/port/win32getrusage.c
@@ -57,5 +57,9 @@ getrusage(int who, struct rusage *rusage)
rusage->ru_utime.tv_sec = li.QuadPart / 1000000L;
rusage->ru_utime.tv_usec = li.QuadPart % 1000000L;
+ /* Currently always 0 on Windows */
+ rusage->ru_inblock = 0;
+ rusage->ru_oublock = 0;
+
return 0;
}
diff --git a/src/test/regress/expected/explain_1.out b/src/test/regress/expected/explain_1.out
new file mode 100644
index 0000000000..215ce1818b
--- /dev/null
+++ b/src/test/regress/expected/explain_1.out
@@ -0,0 +1,851 @@
+--
+-- EXPLAIN
+--
+-- There are many test cases elsewhere that use EXPLAIN as a vehicle for
+-- checking something else (usually planner behavior). This file is
+-- concerned with testing EXPLAIN in its own right.
+--
+-- To produce stable regression test output, it's usually necessary to
+-- ignore details such as exact costs or row counts. These filter
+-- functions replace changeable output details with fixed strings.
+create function explain_filter(text) returns setof text
+language plpgsql as
+$$
+declare
+ ln text;
+begin
+ for ln in execute $1
+ loop
+ -- Replace any numeric word with just 'N'
+ ln := regexp_replace(ln, '-?\m\d+\M', 'N', 'g');
+ -- In sort output, the above won't match units-suffixed numbers
+ ln := regexp_replace(ln, '\m\d+kB', 'NkB', 'g');
+ -- Ignore text-mode buffers output because it varies depending
+ -- on the system state
+ CONTINUE WHEN (ln ~ ' +Buffers: .*');
+ -- Ignore text-mode "Planning:" line because whether it's output
+ -- varies depending on the system state
+ CONTINUE WHEN (ln = 'Planning:');
+ return next ln;
+ end loop;
+end;
+$$;
+-- To produce valid JSON output, replace numbers with "0" or "0.0" not "N"
+create function explain_filter_to_json(text) returns jsonb
+language plpgsql as
+$$
+declare
+ data text := '';
+ ln text;
+begin
+ for ln in execute $1
+ loop
+ -- Replace any numeric word with just '0'
+ ln := regexp_replace(ln, '\m\d+\M', '0', 'g');
+ data := data || ln;
+ end loop;
+ return data::jsonb;
+end;
+$$;
+-- Disable JIT, or we'll get different output on machines where that's been
+-- forced on
+set jit = off;
+-- Similarly, disable track_io_timing, to avoid output differences when
+-- enabled.
+set track_io_timing = off;
+-- Simple cases
+select explain_filter('explain select * from int8_tbl i8');
+ explain_filter
+---------------------------------------------------------
+ Seq Scan on int8_tbl i8 (cost=N.N..N.N rows=N width=N)
+(1 row)
+
+select explain_filter('explain (analyze, buffers off) select * from int8_tbl i8');
+ explain_filter
+-------------------------------------------------------------------------------------------------
+ Seq Scan on int8_tbl i8 (cost=N.N..N.N rows=N width=N) (actual time=N.N..N.N rows=N.N loops=N)
+ Planning Time: N.N ms
+ Execution Time: N.N ms
+(3 rows)
+
+select explain_filter('explain (analyze, buffers off, verbose) select * from int8_tbl i8');
+ explain_filter
+--------------------------------------------------------------------------------------------------------
+ Seq Scan on public.int8_tbl i8 (cost=N.N..N.N rows=N width=N) (actual time=N.N..N.N rows=N.N loops=N)
+ Output: q1, q2
+ Planning Time: N.N ms
+ Execution Time: N.N ms
+(4 rows)
+
+select explain_filter('explain (analyze, buffers, format text) select * from int8_tbl i8');
+ explain_filter
+-------------------------------------------------------------------------------------------------
+ Seq Scan on int8_tbl i8 (cost=N.N..N.N rows=N width=N) (actual time=N.N..N.N rows=N.N loops=N)
+ Planning Time: N.N ms
+ Execution Time: N.N ms
+(3 rows)
+
+select explain_filter('explain (analyze, buffers, format xml) select * from int8_tbl i8');
+ explain_filter
+--------------------------------------------------------
+ <explain xmlns="http://www.postgresql.org/N/explain"> +
+ <Query> +
+ <Plan> +
+ <Node-Type>Seq Scan</Node-Type> +
+ <Parallel-Aware>false</Parallel-Aware> +
+ <Async-Capable>false</Async-Capable> +
+ <Relation-Name>int8_tbl</Relation-Name> +
+ <Alias>i8</Alias> +
+ <Startup-Cost>N.N</Startup-Cost> +
+ <Total-Cost>N.N</Total-Cost> +
+ <Plan-Rows>N</Plan-Rows> +
+ <Plan-Width>N</Plan-Width> +
+ <Actual-Startup-Time>N.N</Actual-Startup-Time> +
+ <Actual-Total-Time>N.N</Actual-Total-Time> +
+ <Actual-Rows>N.N</Actual-Rows> +
+ <Actual-Loops>N</Actual-Loops> +
+ <Disabled>false</Disabled> +
+ <Shared-Hit-Blocks>N</Shared-Hit-Blocks> +
+ <Shared-Read-Blocks>N</Shared-Read-Blocks> +
+ <Shared-Dirtied-Blocks>N</Shared-Dirtied-Blocks>+
+ <Shared-Written-Blocks>N</Shared-Written-Blocks>+
+ <Local-Hit-Blocks>N</Local-Hit-Blocks> +
+ <Local-Read-Blocks>N</Local-Read-Blocks> +
+ <Local-Dirtied-Blocks>N</Local-Dirtied-Blocks> +
+ <Local-Written-Blocks>N</Local-Written-Blocks> +
+ <Temp-Read-Blocks>N</Temp-Read-Blocks> +
+ <Temp-Written-Blocks>N</Temp-Written-Blocks> +
+ </Plan> +
+ <Planning> +
+ <Shared-Hit-Blocks>N</Shared-Hit-Blocks> +
+ <Shared-Read-Blocks>N</Shared-Read-Blocks> +
+ <Shared-Dirtied-Blocks>N</Shared-Dirtied-Blocks>+
+ <Shared-Written-Blocks>N</Shared-Written-Blocks>+
+ <Local-Hit-Blocks>N</Local-Hit-Blocks> +
+ <Local-Read-Blocks>N</Local-Read-Blocks> +
+ <Local-Dirtied-Blocks>N</Local-Dirtied-Blocks> +
+ <Local-Written-Blocks>N</Local-Written-Blocks> +
+ <Temp-Read-Blocks>N</Temp-Read-Blocks> +
+ <Temp-Written-Blocks>N</Temp-Written-Blocks> +
+ <Storage-I-O-Read>N</Storage-I-O-Read> +
+ <Storage-I-O-Write>N</Storage-I-O-Write> +
+ </Planning> +
+ <Planning-Time>N.N</Planning-Time> +
+ <Triggers> +
+ </Triggers> +
+ <Execution> +
+ <Storage-I-O-Read>N</Storage-I-O-Read> +
+ <Storage-I-O-Write>N</Storage-I-O-Write> +
+ </Execution> +
+ <Execution-Time>N.N</Execution-Time> +
+ </Query> +
+ </explain>
+(1 row)
+
+select explain_filter('explain (analyze, serialize, buffers, format yaml) select * from int8_tbl i8');
+ explain_filter
+-------------------------------
+ - Plan: +
+ Node Type: "Seq Scan" +
+ Parallel Aware: false +
+ Async Capable: false +
+ Relation Name: "int8_tbl"+
+ Alias: "i8" +
+ Startup Cost: N.N +
+ Total Cost: N.N +
+ Plan Rows: N +
+ Plan Width: N +
+ Actual Startup Time: N.N +
+ Actual Total Time: N.N +
+ Actual Rows: N.N +
+ Actual Loops: N +
+ Disabled: false +
+ Shared Hit Blocks: N +
+ Shared Read Blocks: N +
+ Shared Dirtied Blocks: N +
+ Shared Written Blocks: N +
+ Local Hit Blocks: N +
+ Local Read Blocks: N +
+ Local Dirtied Blocks: N +
+ Local Written Blocks: N +
+ Temp Read Blocks: N +
+ Temp Written Blocks: N +
+ Planning: +
+ Shared Hit Blocks: N +
+ Shared Read Blocks: N +
+ Shared Dirtied Blocks: N +
+ Shared Written Blocks: N +
+ Local Hit Blocks: N +
+ Local Read Blocks: N +
+ Local Dirtied Blocks: N +
+ Local Written Blocks: N +
+ Temp Read Blocks: N +
+ Temp Written Blocks: N +
+ Storage I/O Read: N +
+ Storage I/O Write: N +
+ Planning Time: N.N +
+ Triggers: +
+ Serialization: +
+ Time: N.N +
+ Output Volume: N +
+ Format: "text" +
+ Shared Hit Blocks: N +
+ Shared Read Blocks: N +
+ Shared Dirtied Blocks: N +
+ Shared Written Blocks: N +
+ Local Hit Blocks: N +
+ Local Read Blocks: N +
+ Local Dirtied Blocks: N +
+ Local Written Blocks: N +
+ Temp Read Blocks: N +
+ Temp Written Blocks: N +
+ Execution: +
+ Storage I/O Read: N +
+ Storage I/O Write: N +
+ Execution Time: N.N
+(1 row)
+
+select explain_filter('explain (buffers, format text) select * from int8_tbl i8');
+ explain_filter
+---------------------------------------------------------
+ Seq Scan on int8_tbl i8 (cost=N.N..N.N rows=N width=N)
+(1 row)
+
+select explain_filter('explain (buffers, format json) select * from int8_tbl i8');
+ explain_filter
+------------------------------------
+ [ +
+ { +
+ "Plan": { +
+ "Node Type": "Seq Scan", +
+ "Parallel Aware": false, +
+ "Async Capable": false, +
+ "Relation Name": "int8_tbl",+
+ "Alias": "i8", +
+ "Startup Cost": N.N, +
+ "Total Cost": N.N, +
+ "Plan Rows": N, +
+ "Plan Width": N, +
+ "Disabled": false, +
+ "Shared Hit Blocks": N, +
+ "Shared Read Blocks": N, +
+ "Shared Dirtied Blocks": N, +
+ "Shared Written Blocks": N, +
+ "Local Hit Blocks": N, +
+ "Local Read Blocks": N, +
+ "Local Dirtied Blocks": N, +
+ "Local Written Blocks": N, +
+ "Temp Read Blocks": N, +
+ "Temp Written Blocks": N +
+ }, +
+ "Planning": { +
+ "Shared Hit Blocks": N, +
+ "Shared Read Blocks": N, +
+ "Shared Dirtied Blocks": N, +
+ "Shared Written Blocks": N, +
+ "Local Hit Blocks": N, +
+ "Local Read Blocks": N, +
+ "Local Dirtied Blocks": N, +
+ "Local Written Blocks": N, +
+ "Temp Read Blocks": N, +
+ "Temp Written Blocks": N, +
+ "Storage I/O Read": N, +
+ "Storage I/O Read": N +
+ }, +
+ "Execution": { +
+ "Storage I/O Read": N, +
+ "Storage I/O Read": N +
+ } +
+ } +
+ ]
+(1 row)
+
+-- Check expansion of window definitions
+select explain_filter('explain verbose select sum(unique1) over w, sum(unique2) over (w order by hundred), sum(tenthous) over (w order by hundred) from tenk1 window w as (partition by ten)');
+ explain_filter
+-------------------------------------------------------------------------------------------------------
+ WindowAgg (cost=N.N..N.N rows=N width=N)
+ Output: sum(unique1) OVER w, (sum(unique2) OVER w1), (sum(tenthous) OVER w1), ten, hundred
+ Window: w AS (PARTITION BY tenk1.ten)
+ -> WindowAgg (cost=N.N..N.N rows=N width=N)
+ Output: ten, hundred, unique1, unique2, tenthous, sum(unique2) OVER w1, sum(tenthous) OVER w1
+ Window: w1 AS (PARTITION BY tenk1.ten ORDER BY tenk1.hundred)
+ -> Sort (cost=N.N..N.N rows=N width=N)
+ Output: ten, hundred, unique1, unique2, tenthous
+ Sort Key: tenk1.ten, tenk1.hundred
+ -> Seq Scan on public.tenk1 (cost=N.N..N.N rows=N width=N)
+ Output: ten, hundred, unique1, unique2, tenthous
+(11 rows)
+
+select explain_filter('explain verbose select sum(unique1) over w1, sum(unique2) over (w1 order by hundred), sum(tenthous) over (w1 order by hundred rows 10 preceding) from tenk1 window w1 as (partition by ten)');
+ explain_filter
+---------------------------------------------------------------------------------------------------------
+ WindowAgg (cost=N.N..N.N rows=N width=N)
+ Output: sum(unique1) OVER w1, (sum(unique2) OVER w2), (sum(tenthous) OVER w3), ten, hundred
+ Window: w1 AS (PARTITION BY tenk1.ten)
+ -> WindowAgg (cost=N.N..N.N rows=N width=N)
+ Output: ten, hundred, unique1, unique2, tenthous, (sum(unique2) OVER w2), sum(tenthous) OVER w3
+ Window: w3 AS (PARTITION BY tenk1.ten ORDER BY tenk1.hundred ROWS 'N'::bigint PRECEDING)
+ -> WindowAgg (cost=N.N..N.N rows=N width=N)
+ Output: ten, hundred, unique1, unique2, tenthous, sum(unique2) OVER w2
+ Window: w2 AS (PARTITION BY tenk1.ten ORDER BY tenk1.hundred)
+ -> Sort (cost=N.N..N.N rows=N width=N)
+ Output: ten, hundred, unique1, unique2, tenthous
+ Sort Key: tenk1.ten, tenk1.hundred
+ -> Seq Scan on public.tenk1 (cost=N.N..N.N rows=N width=N)
+ Output: ten, hundred, unique1, unique2, tenthous
+(14 rows)
+
+-- Check output including I/O timings. These fields are conditional
+-- but always set in JSON format, so check them only in this case.
+set track_io_timing = on;
+select explain_filter('explain (analyze, buffers, format json) select * from int8_tbl i8');
+ explain_filter
+-------------------------------------
+ [ +
+ { +
+ "Plan": { +
+ "Node Type": "Seq Scan", +
+ "Parallel Aware": false, +
+ "Async Capable": false, +
+ "Relation Name": "int8_tbl", +
+ "Alias": "i8", +
+ "Startup Cost": N.N, +
+ "Total Cost": N.N, +
+ "Plan Rows": N, +
+ "Plan Width": N, +
+ "Actual Startup Time": N.N, +
+ "Actual Total Time": N.N, +
+ "Actual Rows": N.N, +
+ "Actual Loops": N, +
+ "Disabled": false, +
+ "Shared Hit Blocks": N, +
+ "Shared Read Blocks": N, +
+ "Shared Dirtied Blocks": N, +
+ "Shared Written Blocks": N, +
+ "Local Hit Blocks": N, +
+ "Local Read Blocks": N, +
+ "Local Dirtied Blocks": N, +
+ "Local Written Blocks": N, +
+ "Temp Read Blocks": N, +
+ "Temp Written Blocks": N, +
+ "Shared I/O Read Time": N.N, +
+ "Shared I/O Write Time": N.N,+
+ "Local I/O Read Time": N.N, +
+ "Local I/O Write Time": N.N, +
+ "Temp I/O Read Time": N.N, +
+ "Temp I/O Write Time": N.N +
+ }, +
+ "Planning": { +
+ "Shared Hit Blocks": N, +
+ "Shared Read Blocks": N, +
+ "Shared Dirtied Blocks": N, +
+ "Shared Written Blocks": N, +
+ "Local Hit Blocks": N, +
+ "Local Read Blocks": N, +
+ "Local Dirtied Blocks": N, +
+ "Local Written Blocks": N, +
+ "Temp Read Blocks": N, +
+ "Temp Written Blocks": N, +
+ "Shared I/O Read Time": N.N, +
+ "Shared I/O Write Time": N.N,+
+ "Local I/O Read Time": N.N, +
+ "Local I/O Write Time": N.N, +
+ "Temp I/O Read Time": N.N, +
+ "Temp I/O Write Time": N.N, +
+ "Storage I/O Read": N, +
+ "Storage I/O Read": N +
+ }, +
+ "Planning Time": N.N, +
+ "Triggers": [ +
+ ], +
+ "Execution": { +
+ "Storage I/O Read": N, +
+ "Storage I/O Read": N +
+ }, +
+ "Execution Time": N.N +
+ } +
+ ]
+(1 row)
+
+set track_io_timing = off;
+-- SETTINGS option
+-- We have to ignore other settings that might be imposed by the environment,
+-- so printing the whole Settings field unfortunately won't do.
+begin;
+set local plan_cache_mode = force_generic_plan;
+select true as "OK"
+ from explain_filter('explain (settings) select * from int8_tbl i8') ln
+ where ln ~ '^ *Settings: .*plan_cache_mode = ''force_generic_plan''';
+ OK
+----
+ t
+(1 row)
+
+select explain_filter_to_json('explain (settings, format json) select * from int8_tbl i8') #> '{0,Settings,plan_cache_mode}';
+ ?column?
+----------------------
+ "force_generic_plan"
+(1 row)
+
+rollback;
+-- GENERIC_PLAN option
+select explain_filter('explain (generic_plan) select unique1 from tenk1 where thousand = $1');
+ explain_filter
+---------------------------------------------------------------------------------
+ Bitmap Heap Scan on tenk1 (cost=N.N..N.N rows=N width=N)
+ Recheck Cond: (thousand = $N)
+ -> Bitmap Index Scan on tenk1_thous_tenthous (cost=N.N..N.N rows=N width=N)
+ Index Cond: (thousand = $N)
+(4 rows)
+
+-- should fail
+select explain_filter('explain (analyze, generic_plan) select unique1 from tenk1 where thousand = $1');
+ERROR: EXPLAIN options ANALYZE and GENERIC_PLAN cannot be used together
+CONTEXT: PL/pgSQL function explain_filter(text) line 5 at FOR over EXECUTE statement
+-- MEMORY option
+select explain_filter('explain (memory) select * from int8_tbl i8');
+ explain_filter
+---------------------------------------------------------
+ Seq Scan on int8_tbl i8 (cost=N.N..N.N rows=N width=N)
+ Memory: used=NkB allocated=NkB
+(2 rows)
+
+select explain_filter('explain (memory, analyze, buffers off) select * from int8_tbl i8');
+ explain_filter
+-------------------------------------------------------------------------------------------------
+ Seq Scan on int8_tbl i8 (cost=N.N..N.N rows=N width=N) (actual time=N.N..N.N rows=N.N loops=N)
+ Memory: used=NkB allocated=NkB
+ Planning Time: N.N ms
+ Execution Time: N.N ms
+(4 rows)
+
+select explain_filter('explain (memory, summary, format yaml) select * from int8_tbl i8');
+ explain_filter
+-------------------------------
+ - Plan: +
+ Node Type: "Seq Scan" +
+ Parallel Aware: false +
+ Async Capable: false +
+ Relation Name: "int8_tbl"+
+ Alias: "i8" +
+ Startup Cost: N.N +
+ Total Cost: N.N +
+ Plan Rows: N +
+ Plan Width: N +
+ Disabled: false +
+ Planning: +
+ Memory Used: N +
+ Memory Allocated: N +
+ Planning Time: N.N
+(1 row)
+
+select explain_filter('explain (memory, analyze, format json) select * from int8_tbl i8');
+ explain_filter
+------------------------------------
+ [ +
+ { +
+ "Plan": { +
+ "Node Type": "Seq Scan", +
+ "Parallel Aware": false, +
+ "Async Capable": false, +
+ "Relation Name": "int8_tbl",+
+ "Alias": "i8", +
+ "Startup Cost": N.N, +
+ "Total Cost": N.N, +
+ "Plan Rows": N, +
+ "Plan Width": N, +
+ "Actual Startup Time": N.N, +
+ "Actual Total Time": N.N, +
+ "Actual Rows": N.N, +
+ "Actual Loops": N, +
+ "Disabled": false, +
+ "Shared Hit Blocks": N, +
+ "Shared Read Blocks": N, +
+ "Shared Dirtied Blocks": N, +
+ "Shared Written Blocks": N, +
+ "Local Hit Blocks": N, +
+ "Local Read Blocks": N, +
+ "Local Dirtied Blocks": N, +
+ "Local Written Blocks": N, +
+ "Temp Read Blocks": N, +
+ "Temp Written Blocks": N +
+ }, +
+ "Planning": { +
+ "Shared Hit Blocks": N, +
+ "Shared Read Blocks": N, +
+ "Shared Dirtied Blocks": N, +
+ "Shared Written Blocks": N, +
+ "Local Hit Blocks": N, +
+ "Local Read Blocks": N, +
+ "Local Dirtied Blocks": N, +
+ "Local Written Blocks": N, +
+ "Temp Read Blocks": N, +
+ "Temp Written Blocks": N, +
+ "Storage I/O Read": N, +
+ "Storage I/O Read": N, +
+ "Memory Used": N, +
+ "Memory Allocated": N +
+ }, +
+ "Planning Time": N.N, +
+ "Triggers": [ +
+ ], +
+ "Execution": { +
+ "Storage I/O Read": N, +
+ "Storage I/O Read": N +
+ }, +
+ "Execution Time": N.N +
+ } +
+ ]
+(1 row)
+
+prepare int8_query as select * from int8_tbl i8;
+select explain_filter('explain (memory) execute int8_query');
+ explain_filter
+---------------------------------------------------------
+ Seq Scan on int8_tbl i8 (cost=N.N..N.N rows=N width=N)
+ Memory: used=NkB allocated=NkB
+(2 rows)
+
+-- Test EXPLAIN (GENERIC_PLAN) with partition pruning
+-- partitions should be pruned at plan time, based on constants,
+-- but there should be no pruning based on parameter placeholders
+create table gen_part (
+ key1 integer not null,
+ key2 integer not null
+) partition by list (key1);
+create table gen_part_1
+ partition of gen_part for values in (1)
+ partition by range (key2);
+create table gen_part_1_1
+ partition of gen_part_1 for values from (1) to (2);
+create table gen_part_1_2
+ partition of gen_part_1 for values from (2) to (3);
+create table gen_part_2
+ partition of gen_part for values in (2);
+-- should scan gen_part_1_1 and gen_part_1_2, but not gen_part_2
+select explain_filter('explain (generic_plan) select key1, key2 from gen_part where key1 = 1 and key2 = $1');
+ explain_filter
+---------------------------------------------------------------------------
+ Append (cost=N.N..N.N rows=N width=N)
+ -> Seq Scan on gen_part_1_1 gen_part_1 (cost=N.N..N.N rows=N width=N)
+ Filter: ((key1 = N) AND (key2 = $N))
+ -> Seq Scan on gen_part_1_2 gen_part_2 (cost=N.N..N.N rows=N width=N)
+ Filter: ((key1 = N) AND (key2 = $N))
+(5 rows)
+
+drop table gen_part;
+--
+-- Test production of per-worker data
+--
+-- Unfortunately, because we don't know how many worker processes we'll
+-- actually get (maybe none at all), we can't examine the "Workers" output
+-- in any detail. We can check that it parses correctly as JSON, and then
+-- remove it from the displayed results.
+begin;
+-- encourage use of parallel plans
+set parallel_setup_cost=0;
+set parallel_tuple_cost=0;
+set min_parallel_table_scan_size=0;
+set max_parallel_workers_per_gather=4;
+select jsonb_pretty(
+ explain_filter_to_json('explain (analyze, verbose, buffers, format json)
+ select * from tenk1 order by tenthous')
+ -- remove "Workers" node of the Seq Scan plan node
+ #- '{0,Plan,Plans,0,Plans,0,Workers}'
+ -- remove "Workers" node of the Sort plan node
+ #- '{0,Plan,Plans,0,Workers}'
+ -- Also remove its sort-type fields, as those aren't 100% stable
+ #- '{0,Plan,Plans,0,Sort Method}'
+ #- '{0,Plan,Plans,0,Sort Space Type}'
+);
+ jsonb_pretty
+-------------------------------------------------------------
+ [ +
+ { +
+ "Plan": { +
+ "Plans": [ +
+ { +
+ "Plans": [ +
+ { +
+ "Alias": "tenk1", +
+ "Output": [ +
+ "unique1", +
+ "unique2", +
+ "two", +
+ "four", +
+ "ten", +
+ "twenty", +
+ "hundred", +
+ "thousand", +
+ "twothousand", +
+ "fivethous", +
+ "tenthous", +
+ "odd", +
+ "even", +
+ "stringu1", +
+ "stringu2", +
+ "string4" +
+ ], +
+ "Schema": "public", +
+ "Disabled": false, +
+ "Node Type": "Seq Scan", +
+ "Plan Rows": 0, +
+ "Plan Width": 0, +
+ "Total Cost": 0.0, +
+ "Actual Rows": 0.0, +
+ "Actual Loops": 0, +
+ "Startup Cost": 0.0, +
+ "Async Capable": false, +
+ "Relation Name": "tenk1", +
+ "Parallel Aware": true, +
+ "Local Hit Blocks": 0, +
+ "Temp Read Blocks": 0, +
+ "Actual Total Time": 0.0, +
+ "Local Read Blocks": 0, +
+ "Shared Hit Blocks": 0, +
+ "Shared Read Blocks": 0, +
+ "Actual Startup Time": 0.0, +
+ "Parent Relationship": "Outer",+
+ "Temp Written Blocks": 0, +
+ "Local Dirtied Blocks": 0, +
+ "Local Written Blocks": 0, +
+ "Shared Dirtied Blocks": 0, +
+ "Shared Written Blocks": 0 +
+ } +
+ ], +
+ "Output": [ +
+ "unique1", +
+ "unique2", +
+ "two", +
+ "four", +
+ "ten", +
+ "twenty", +
+ "hundred", +
+ "thousand", +
+ "twothousand", +
+ "fivethous", +
+ "tenthous", +
+ "odd", +
+ "even", +
+ "stringu1", +
+ "stringu2", +
+ "string4" +
+ ], +
+ "Disabled": false, +
+ "Sort Key": [ +
+ "tenk1.tenthous" +
+ ], +
+ "Node Type": "Sort", +
+ "Plan Rows": 0, +
+ "Plan Width": 0, +
+ "Total Cost": 0.0, +
+ "Actual Rows": 0.0, +
+ "Actual Loops": 0, +
+ "Startup Cost": 0.0, +
+ "Async Capable": false, +
+ "Parallel Aware": false, +
+ "Sort Space Used": 0, +
+ "Local Hit Blocks": 0, +
+ "Temp Read Blocks": 0, +
+ "Actual Total Time": 0.0, +
+ "Local Read Blocks": 0, +
+ "Shared Hit Blocks": 0, +
+ "Shared Read Blocks": 0, +
+ "Actual Startup Time": 0.0, +
+ "Parent Relationship": "Outer", +
+ "Temp Written Blocks": 0, +
+ "Local Dirtied Blocks": 0, +
+ "Local Written Blocks": 0, +
+ "Shared Dirtied Blocks": 0, +
+ "Shared Written Blocks": 0 +
+ } +
+ ], +
+ "Output": [ +
+ "unique1", +
+ "unique2", +
+ "two", +
+ "four", +
+ "ten", +
+ "twenty", +
+ "hundred", +
+ "thousand", +
+ "twothousand", +
+ "fivethous", +
+ "tenthous", +
+ "odd", +
+ "even", +
+ "stringu1", +
+ "stringu2", +
+ "string4" +
+ ], +
+ "Disabled": false, +
+ "Node Type": "Gather Merge", +
+ "Plan Rows": 0, +
+ "Plan Width": 0, +
+ "Total Cost": 0.0, +
+ "Actual Rows": 0.0, +
+ "Actual Loops": 0, +
+ "Startup Cost": 0.0, +
+ "Async Capable": false, +
+ "Parallel Aware": false, +
+ "Workers Planned": 0, +
+ "Local Hit Blocks": 0, +
+ "Temp Read Blocks": 0, +
+ "Workers Launched": 0, +
+ "Actual Total Time": 0.0, +
+ "Local Read Blocks": 0, +
+ "Shared Hit Blocks": 0, +
+ "Shared Read Blocks": 0, +
+ "Actual Startup Time": 0.0, +
+ "Temp Written Blocks": 0, +
+ "Local Dirtied Blocks": 0, +
+ "Local Written Blocks": 0, +
+ "Shared Dirtied Blocks": 0, +
+ "Shared Written Blocks": 0 +
+ }, +
+ "Planning": { +
+ "Local Hit Blocks": 0, +
+ "Storage I/O Read": 0, +
+ "Temp Read Blocks": 0, +
+ "Local Read Blocks": 0, +
+ "Shared Hit Blocks": 0, +
+ "Shared Read Blocks": 0, +
+ "Temp Written Blocks": 0, +
+ "Local Dirtied Blocks": 0, +
+ "Local Written Blocks": 0, +
+ "Shared Dirtied Blocks": 0, +
+ "Shared Written Blocks": 0 +
+ }, +
+ "Triggers": [ +
+ ], +
+ "Execution": { +
+ "Storage I/O Read": 0 +
+ }, +
+ "Planning Time": 0.0, +
+ "Execution Time": 0.0 +
+ } +
+ ]
+(1 row)
+
+rollback;
+-- Test display of temporary objects
+create temp table t1(f1 float8);
+create function pg_temp.mysin(float8) returns float8 language plpgsql
+as 'begin return sin($1); end';
+select explain_filter('explain (verbose) select * from t1 where pg_temp.mysin(f1) < 0.5');
+ explain_filter
+------------------------------------------------------------
+ Seq Scan on pg_temp.t1 (cost=N.N..N.N rows=N width=N)
+ Output: f1
+ Filter: (pg_temp.mysin(t1.f1) < 'N.N'::double precision)
+(3 rows)
+
+-- Test compute_query_id
+set compute_query_id = on;
+select explain_filter('explain (verbose) select * from int8_tbl i8');
+ explain_filter
+----------------------------------------------------------------
+ Seq Scan on public.int8_tbl i8 (cost=N.N..N.N rows=N width=N)
+ Output: q1, q2
+ Query Identifier: N
+(3 rows)
+
+-- Test compute_query_id with utility statements containing plannable query
+select explain_filter('explain (verbose) declare test_cur cursor for select * from int8_tbl');
+ explain_filter
+-------------------------------------------------------------
+ Seq Scan on public.int8_tbl (cost=N.N..N.N rows=N width=N)
+ Output: q1, q2
+ Query Identifier: N
+(3 rows)
+
+select explain_filter('explain (verbose) create table test_ctas as select 1');
+ explain_filter
+----------------------------------------
+ Result (cost=N.N..N.N rows=N width=N)
+ Output: N
+ Query Identifier: N
+(3 rows)
+
+-- Test SERIALIZE option
+select explain_filter('explain (analyze,buffers off,serialize) select * from int8_tbl i8');
+ explain_filter
+-------------------------------------------------------------------------------------------------
+ Seq Scan on int8_tbl i8 (cost=N.N..N.N rows=N width=N) (actual time=N.N..N.N rows=N.N loops=N)
+ Planning Time: N.N ms
+ Serialization: time=N.N ms output=NkB format=text
+ Execution Time: N.N ms
+(4 rows)
+
+select explain_filter('explain (analyze,serialize text,buffers,timing off) select * from int8_tbl i8');
+ explain_filter
+-----------------------------------------------------------------------------------
+ Seq Scan on int8_tbl i8 (cost=N.N..N.N rows=N width=N) (actual rows=N.N loops=N)
+ Planning Time: N.N ms
+ Serialization: output=NkB format=text
+ Execution Time: N.N ms
+(4 rows)
+
+select explain_filter('explain (analyze,serialize binary,buffers,timing) select * from int8_tbl i8');
+ explain_filter
+-------------------------------------------------------------------------------------------------
+ Seq Scan on int8_tbl i8 (cost=N.N..N.N rows=N width=N) (actual time=N.N..N.N rows=N.N loops=N)
+ Planning Time: N.N ms
+ Serialization: time=N.N ms output=NkB format=binary
+ Execution Time: N.N ms
+(4 rows)
+
+-- this tests an edge case where we have no data to return
+select explain_filter('explain (analyze,buffers off,serialize) create temp table explain_temp as select * from int8_tbl i8');
+ explain_filter
+-------------------------------------------------------------------------------------------------
+ Seq Scan on int8_tbl i8 (cost=N.N..N.N rows=N width=N) (actual time=N.N..N.N rows=N.N loops=N)
+ Planning Time: N.N ms
+ Serialization: time=N.N ms output=NkB format=text
+ Execution Time: N.N ms
+(4 rows)
+
+-- Test tuplestore storage usage in Window aggregate (memory case)
+select explain_filter('explain (analyze,buffers off,costs off) select sum(n) over() from generate_series(1,10) a(n)');
+ explain_filter
+----------------------------------------------------------------------------------
+ WindowAgg (actual time=N.N..N.N rows=N.N loops=N)
+ Window: w1 AS ()
+ Storage: Memory Maximum Storage: NkB
+ -> Function Scan on generate_series a (actual time=N.N..N.N rows=N.N loops=N)
+ Planning Time: N.N ms
+ Execution Time: N.N ms
+(6 rows)
+
+-- Test tuplestore storage usage in Window aggregate (disk case)
+set work_mem to 64;
+select explain_filter('explain (analyze,buffers off,costs off) select sum(n) over() from generate_series(1,2500) a(n)');
+ explain_filter
+----------------------------------------------------------------------------------
+ WindowAgg (actual time=N.N..N.N rows=N.N loops=N)
+ Window: w1 AS ()
+ Storage: Disk Maximum Storage: NkB
+ -> Function Scan on generate_series a (actual time=N.N..N.N rows=N.N loops=N)
+ Planning Time: N.N ms
+ Execution Time: N.N ms
+(6 rows)
+
+-- Test tuplestore storage usage in Window aggregate (memory and disk case, final result is disk)
+select explain_filter('explain (analyze,buffers off,costs off) select sum(n) over(partition by m) from (SELECT n < 3 as m, n from generate_series(1,2500) a(n))');
+ explain_filter
+----------------------------------------------------------------------------------------
+ WindowAgg (actual time=N.N..N.N rows=N.N loops=N)
+ Window: w1 AS (PARTITION BY ((a.n < N)))
+ Storage: Disk Maximum Storage: NkB
+ -> Sort (actual time=N.N..N.N rows=N.N loops=N)
+ Sort Key: ((a.n < N))
+ Sort Method: external merge Disk: NkB
+ -> Function Scan on generate_series a (actual time=N.N..N.N rows=N.N loops=N)
+ Planning Time: N.N ms
+ Execution Time: N.N ms
+(9 rows)
+
+reset work_mem;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index d16bc20865..0d7f4579b9 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2636,6 +2636,7 @@ SSL
SSLExtensionInfoContext
SSL_CTX
STARTUPINFO
+StorageIOUsage
STRLEN
SV
SYNCHRONIZATION_BARRIER
base-commit: 914ea1c93c0e446a0cd174497fd6a22fd6071c5e
--
2.43.0