Hello all, Sending a new version of the patch that includes important changes addressing feedback provided by Greg and Tomas. So, including the previous version (v5) sent on Jan 29, these are the highlights of what has changed:
- Progressive plan printed on regular interval defined by progressive_explain_timeout now uses timeouts. GUC progressive_explain_sampe_rate is removed. - Objects allocated per plan print (Instrument and ExplainState) were replaced by reusable objects allocated at query start (during progressive explain setup phase). So currently we are allocating only 2 objects for the complete duration of the feature. With that, removed the temporary memory context that was being allocated per iteration. - progressive_explain GUC was changed from boolean to enum, accepting values 'off', 'explain' and 'analyze'. This allows using instrumented progressive explains for any query and not only the ones started via EXPLAIN ANALYZE. If GUC is set to 'explain' the plan will be printed only once at query start. If set to 'analyze' instrumentation will be enabled in QueryDesc and the detailed plan will be printed iteratively. Considering that now we can enable instrumentation for regular queries, added the following GUCs to control what instruments are enabled: progressive_explain_buffers, progressive_explain_timing and progressive_explain_wals. - better handling of shared memory space where plans are printed and shared with other backends. In previous version we had a shared hash with elements holding all data related to progressive explains, including the complete plan string: typedef struct explainHashEntry { explainHashKey key; /* hash key of entry - MUST BE FIRST */ int pid; TimestampTz last_explain; int explain_count; float explain_duration; char plan[]; } explainHashEntry; The allocated size per element used to be defined by progressive_explain_output_size, which would essentially control the space available for plan[]. Greg raised the concern of PG having to allocate too much shared memory at database start considering that we need enough space for max_connections + max_parallel_workers, and that is a totally valid point. So the new version takes advantage of DSAs. Each backend creates its own DSA at query start (if progressive explain is enabled) where the shared data is stored. That DSA is shared with other backends via hash structure through dsa_handle and dsa_pointer pointers: typedef struct progressiveExplainHashEntry { progressiveExplainHashKey key; /* hash key of entry - MUST BE FIRST */ dsa_handle h; dsa_pointer p; } progressiveExplainHashEntry; typedef struct progressiveExplainData { int pid; TimestampTz last_print; char plan[]; } progressiveExplainData; That allows us to allocate areas of custom sizes for plan[]. The strategy being used currently is to allocate an initial space with the size of the initial plan output + PROGRESSIVE_EXPLAIN_ALLOC_SIZE (4096 currently), which gives PG enough room for subsequent iterations where the new string may increase a bit, without having to reallocate space. The code checks sizes and will reallocate if needed. With that, GUC progressive_explain_output_size was removed. - Adjusted columns of pg_stat_progress_explain. Columns explain_count and total_explain_time were removed. Column last_explain was renamed to last_print. Column explain was renamed to query_plan, as this is the name used by PG when a plan is printed with EXPLAIN. Rafael.
v5-0001-Proposal-for-progressive-explains.patch
Description: Binary data