Thanks Greg, Sami and Jian for the feedback so far. > Maybe track_explain instead? In the spirit of track_activity.
That was the original name, and all other GUCs were following the track_activity_* logic. Changed to the name of the feature after discussion with colleagues at EDB. This is definitely open for discussion. > 4096 seems low, if this means the explain plan is truncated at > that size. Also, the 100 minimum seems arbitrary. Min (100) and max (1048576) are the same as the values for GUC track_activity_query_size, which has a very similar purpose: controls the size of pg_stat_activity.query column. > So we can enable verbose and settings - but not wal? I could see > that one being useful. Not so much the rest (timing, summary). And > buffers has recently changed, so no need to worry about that. :) The logic I used for adding GUCs that control explain options is that none of these settings should change QueryDesc->instrument_options, which would change instrumentation options added to the actual execution. GUCs available modify only the ExplainState object, which affects only the output printed to pg_stat_progress_explain. > Hmmm...don't have a solution/suggestion offhand, but using max_connections > would seem to be allocating a chunk of memory that is never > used 99% of the time, as most people don't run active queries > near max_connections. > (Actually, on re-reading my draft, I would prefer a rotating pool > like pg_stat_statements does.) Agreed. Thought about using a similar logic as pg_stat_statements but the pool size there is very large by default, 5000. The difference is that pg_stat_statements keeps the data in disk and I wanted to avoid that as instrumented plans can print new plans very often, affecting performance. Maybe one idea would be to include a new GUC (progressive_explain_max_size) that controls how many rows explainArray can have. If limit is reached a backend won't print anything in that iteration. > It's not clear if total_explain_time is now() - query_start or something > else. If not, I would love to see an elapsed time interval column. total_explain_time is accumulated time computed only printing the plan. It does not include execution time. > Perhaps add a leader_pid column. That's something I would always > be joining with pg_stat_activity to find out. For prints done by parallel workers? That information is available in pg_stat_activity. The idea is to use the pid column and join with pg_stat_activity to get all other relevant details. > I do not think however this instrumentation should only be > made available if a user runs EXPLAIN ANALYZE. > In my opinion, this will severely limit the usefulness of this > instrumentation in production. Of course, one can use auto_explain, > but users will be hesitant to enable auto_explain with analyze in > production for all their workloads. Also, there should not be an > auto_explain dependency for this feature. > One approach will be for the view to expose the > explain plan and the current node being executed. I think the > plan_node_id can be exposed for this purpose but have not looked > into this in much detail yet. The plan_node_id can then be used > to locate the part of the plan that is a potential bottleneck ( if that > plan node is the one constantly being called ). You mean that we could include the current node being executed even for non instrumented runs? In that case it would print the plain plan + current node? That is a valid point and shouldn't be difficult to implement. The problem is that this would require adding overhead to ExecProcNode() (non instrumented) and that can be a performance killer. > This may also be work that is better suited for an extension, but > core will need to add a hook in ExecProcNode so an extension can > have access to PlanState. Are you talking about implementing your proposal (also printing plan with current node for non instrumented runs) as an extension or this whole patch as an extension? If the whole patch, I thought about that. The thing is that the proposal also changes ExplainNode() function, the core function to print a plan. To implement it as an extension we would have to duplicate 95% of that code. I do think there is merit in having this feature as part of the core and use existing extensions (auto_explain for example) to increment it, like adding your suggestion to use a hook in ExecProcNode(). > compile failed. the above is the error message. Thanks. It was indeed missing an include. It complained only for a complete build (including contrib), so I failed to catch it. Sending a second version with the fix. Rafael. On Tue, Dec 31, 2024 at 3:00 AM jian he <jian.universal...@gmail.com> wrote: > hi. > > [48/208] Compiling C object > contrib/postgres_fdw/postgres_fdw.so.p/postgres_fdw.c.o > FAILED: contrib/postgres_fdw/postgres_fdw.so.p/postgres_fdw.c.o > /usr/local/gcc-14.1.0/bin/gcc-14.1.0 > -Icontrib/postgres_fdw/postgres_fdw.so.p -Isrc/include > -I../../Desktop/pg_src/src5/postgres/src/include > -Isrc/interfaces/libpq > -I../../Desktop/pg_src/src5/postgres/src/interfaces/libpq > -I/usr/include/libxml2 -fdiagnostics-color=always --coverage > -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Werror -g > -fno-strict-aliasing -fwrapv -fexcess-precision=standard -D_GNU_SOURCE > -Wmissing-prototypes -Wpointer-arith -Werror=vla -Wendif-labels > -Wmissing-format-attribute -Wimplicit-fallthrough=3 > -Wcast-function-type -Wshadow=compatible-local -Wformat-security > -Wdeclaration-after-statement -Wmissing-variable-declarations > -Wno-format-truncation -Wno-stringop-truncation -Wunused-variable > -Wuninitialized -Werror=maybe-uninitialized -Wreturn-type > -DWRITE_READ_PARSE_PLAN_TREES -DCOPY_PARSE_PLAN_TREES > -DREALLOCATE_BITMAPSETS -DLOCK_DEBUG -DRELCACHE_FORCE_RELEASE > -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS > -DRAW_EXPRESSION_COVERAGE_TEST -fno-omit-frame-pointer -fPIC -pthread > -fvisibility=hidden -MD -MQ > contrib/postgres_fdw/postgres_fdw.so.p/postgres_fdw.c.o -MF > contrib/postgres_fdw/postgres_fdw.so.p/postgres_fdw.c.o.d -o > contrib/postgres_fdw/postgres_fdw.so.p/postgres_fdw.c.o -c > ../../Desktop/pg_src/src5/postgres/contrib/postgres_fdw/postgres_fdw.c > In file included from > ../../Desktop/pg_src/src5/postgres/contrib/postgres_fdw/postgres_fdw.c:22: > ../../Desktop/pg_src/src5/postgres/src/include/commands/explain.h:86:9: > error: unknown type name ‘TimestampTz’ > 86 | TimestampTz last_explain; > | ^~~~~~~~~~~ > [58/188] Linking target contrib/sslinfo/sslinfo.so > ninja: build stopped: subcommand failed. > > compile failed. the above is the error message. >
v2-0001-Proposal-for-progressive-explains.patch
Description: Binary data