Hi, On 2025-08-31 16:57:01 -0700, Lukas Fittl wrote: > Please find attached a patch series that introduces a new paradigm for how > per-node WAL/buffer usage is tracked, with two primary goals: (1) reduce > overhead of EXPLAIN ANALYZE, (2) enable future work like tracking estimated > distinct buffer hits [0].
I like this for a third reason: To separate out buffer access statistics for the index and the table in index scans. Right now it's very hard to figure out if a query is slow because of the index lookups or finding the tuples in the table. > 0001: Separate node instrumentation from other use of Instrumentation struct > > Previously different places (e.g. query "total time") were repurposing > the per-node Instrumentation struct. Instead, simplify the Instrumentation > struct to only track time, WAL/buffer usage, and tuple counts. Similarly, > drop the use of InstrEndLoop outside of per-node instrumentation. Introduce > the NodeInstrumentation struct to carry forward the per-node > instrumentation information. It's mildly odd that the two types of instrumentation have a different definition of 'total' (one double, one instr_time). > 0003: Introduce stack for tracking per-node WAL/buffer usage > From 4375fcb4141f18d6cd927659970518553aa3fe94 Mon Sep 17 00:00:00 2001 > From: Lukas Fittl <lu...@fittl.com> > Date: Sun, 31 Aug 2025 16:37:05 -0700 > Subject: [PATCH v1 3/3] Introduce stack for tracking per-node WAL/buffer usage > diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c > index b83ced9a57a..1c2268bc608 100644 > --- a/src/backend/executor/execMain.c > +++ b/src/backend/executor/execMain.c > @@ -312,6 +312,7 @@ standard_ExecutorRun(QueryDesc *queryDesc, > DestReceiver *dest; > bool sendTuples; > MemoryContext oldcontext; > + InstrStackResource *res; > > /* sanity checks */ > Assert(queryDesc != NULL); > @@ -333,6 +334,9 @@ standard_ExecutorRun(QueryDesc *queryDesc, > if (queryDesc->totaltime) > InstrStart(queryDesc->totaltime); > > + /* Start up per-query node level instrumentation */ > + res = InstrStartQuery(); > + > /* > * extract information from the query descriptor and the query feature. > */ > @@ -382,6 +386,9 @@ standard_ExecutorRun(QueryDesc *queryDesc, > if (sendTuples) > dest->rShutdown(dest); > > + /* Shut down per-query node level instrumentation */ > + InstrShutdownQuery(res); > + > if (queryDesc->totaltime) > InstrStop(queryDesc->totaltime, estate->es_processed); Why are we doing Instr{Start,Stop}Query when not using instrumentation? Resowner stuff ain't free, so I'd skip them when not necessary. > diff --git a/src/backend/executor/execProcnode.c > b/src/backend/executor/execProcnode.c > index d286471254b..7436f307994 100644 > --- a/src/backend/executor/execProcnode.c > +++ b/src/backend/executor/execProcnode.c > @@ -823,8 +823,17 @@ ExecShutdownNode_walker(PlanState *node, void *context) > > /* Stop the node if we started it above, reporting 0 tuples. */ > if (node->instrument && node->instrument->running) > + { > InstrStopNode(node->instrument, 0); > > + /* > + * Propagate WAL/buffer stats to the parent node on the > + * instrumentation stack (which is where InstrStopNode returned > us > + * to). > + */ > + InstrNodeAddToCurrent(&node->instrument->stack); > + } > + > return false; > } Can we rely on this being reached? Note that ExecutePlan() calls ExecShutdownNode() conditionally: /* * If we know we won't need to back up, we can release resources at this * point. */ if (!(estate->es_top_eflags & EXEC_FLAG_BACKWARD)) ExecShutdownNode(planstate); > +static void > +ResOwnerReleaseInstrStack(Datum res) > +{ > + /* > + * XXX: Registered resources are *not* called in reverse order, i.e. > we'll > + * get what was first registered first at shutdown. To avoid handling > + * that, we are resetting the stack here on abort (instead of recovering > + * to previous). > + */ > + pgInstrStack = NULL; > +} Hm, doesn't that mean we loose track of instrumentation if you e.g. do an EXPLAIN ANALYZE of a query that executes a function, which internally triggers an error and catches it? I wonder if the solution could be to walk the stack and search for the to-be-released element. Greetings, Andres Freund