Re: Proposal: Progressive explain

Rafael Thofehrn Castro Tue, 18 Feb 2025 10:42:39 -0800

Hello all,

Sending a new version of the patch that includes important changes
addressing feedback provided by Greg and Tomas. Since the previous version
(v5), sent on Jan 29, these are the highlights of what has changed:

- The progressive plan, printed at the regular interval defined by
progressive_explain_timeout, is now driven by timeouts. GUC
progressive_explain_sample_rate was removed.
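To make the timeout-driven printing concrete, here is a minimal standalone
sketch, not the patch's code. It assumes the usual pattern where the timer
only raises a flag and the plan is rendered later, at a safe point in the
executor. Inside the backend this would presumably go through PostgreSQL's
timeout facility (RegisterTimeout() and enable_timeout_after() in
utils/timeout.h); the sketch swaps in POSIX setitimer() and SIGALRM instead.
All function names below are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

/* Raised asynchronously by the timer; only read/cleared at safe points. */
static volatile sig_atomic_t print_pending = 0;

static void on_print_timeout(int signo)
{
    (void) signo;
    print_pending = 1;          /* just flag it: printing is not signal-safe */
}

/* Arm a repeating wall-clock timer that fires every interval_ms. */
static int start_print_timer(int interval_ms)
{
    struct sigaction sa;
    struct itimerval it;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_print_timeout;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    it.it_interval.tv_sec = interval_ms / 1000;
    it.it_interval.tv_usec = (interval_ms % 1000) * 1000;
    it.it_value = it.it_interval;   /* first expiry after one interval */
    return setitimer(ITIMER_REAL, &it, NULL);
}

static void stop_print_timer(void)
{
    struct itimerval off;

    memset(&off, 0, sizeof(off));
    setitimer(ITIMER_REAL, &off, NULL);
}

/*
 * Hypothetical executor loop: "process" one tuple per iteration and print
 * the plan whenever the timer has fired. Returns the number of prints.
 */
static int run_query(int interval_ms, int max_prints, long max_tuples)
{
    struct timespec one_ms = {0, 1000000};
    int         prints = 0;

    if (start_print_timer(interval_ms) != 0)
        return -1;
    for (long t = 0; t < max_tuples && prints < max_prints; t++)
    {
        /* stand-in for per-tuple work; may return early on SIGALRM */
        nanosleep(&one_ms, NULL);
        if (print_pending)      /* safe point: check and clear the flag */
        {
            print_pending = 0;
            prints++;           /* a real backend would re-render the plan */
        }
    }
    stop_print_timer();
    return prints;
}
```

For example, run_query(10, 3, 100000) should return 3 after roughly 30 ms.
With this pattern the print frequency follows wall-clock time regardless of
how quickly tuples are produced, and the signal handler never touches the
plan itself.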

- Objects allocated on every plan print (Instrument and ExplainState) were
replaced by reusable objects allocated at query start (during the
progressive explain setup phase). So we now allocate only 2 objects for the
complete duration of the feature. With that, the temporary memory context
that was being allocated per iteration was removed.
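The allocate-once, reuse-per-print approach could look roughly like the
sketch below. DemoInstrument and DemoExplainState are hypothetical stand-ins
for Instrument and ExplainState, and calloc() replaces palloc() in a
query-lifetime memory context; the point is only that every print reuses the
same two objects instead of allocating new ones.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Counts allocations so the reuse property can be checked. */
static int alloc_count = 0;

static void *counted_alloc(size_t size)
{
    alloc_count++;
    return calloc(1, size);
}

/* Hypothetical stand-ins for Instrument and ExplainState. */
typedef struct DemoInstrument
{
    long        ntuples;
} DemoInstrument;

typedef struct DemoExplainState
{
    char        buf[256];       /* rendered plan text */
} DemoExplainState;

typedef struct ProgressiveState
{
    DemoInstrument   *instr;
    DemoExplainState *es;
} ProgressiveState;

/* Setup phase: allocate both objects once, at query start. */
static int progressive_setup(ProgressiveState *ps)
{
    ps->instr = counted_alloc(sizeof(DemoInstrument));
    ps->es = counted_alloc(sizeof(DemoExplainState));
    return (ps->instr != NULL && ps->es != NULL) ? 0 : -1;
}

/* Each print resets and refills the same objects; nothing is allocated. */
static const char *progressive_print(ProgressiveState *ps, long ntuples)
{
    memset(ps->instr, 0, sizeof(*ps->instr));
    ps->instr->ntuples = ntuples;
    snprintf(ps->es->buf, sizeof(ps->es->buf),
             "Seq Scan (actual rows=%ld)", ps->instr->ntuples);
    return ps->es->buf;
}

/* Query end: release the two objects. */
static void progressive_cleanup(ProgressiveState *ps)
{
    free(ps->instr);
    free(ps->es);
}
```

After progressive_setup(), any number of progressive_print() calls leave
alloc_count at 2, matching the "only 2 objects" property.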

- The progressive_explain GUC was changed from boolean to enum, accepting
the values 'off', 'explain' and 'analyze'. This allows instrumented
progressive explains for any query, not only those started via EXPLAIN
ANALYZE. If the GUC is set to 'explain', the plan is printed only once at
query start. If set to 'analyze', instrumentation is enabled in QueryDesc
and the detailed plan is printed periodically. Since instrumentation can
now be enabled for regular queries, the following GUCs were added to
control which instruments are enabled: progressive_explain_buffers,
progressive_explain_timing and progressive_explain_wals.
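One plausible way to turn these GUCs into executor instrumentation flags is
sketched below. The DEMO_INSTRUMENT_* bits mirror the values of PostgreSQL's
InstrumentOption enum (executor/instrument.h), but the mapping itself is an
assumption, including always requesting row counts in 'analyze' mode (as
EXPLAIN (ANALYZE, TIMING OFF) still reports rows). The config struct and
function names are hypothetical.

```c
#include <stdbool.h>

/* Bit values mirror PostgreSQL's InstrumentOption (executor/instrument.h). */
enum
{
    DEMO_INSTRUMENT_TIMER = 1 << 0,
    DEMO_INSTRUMENT_BUFFERS = 1 << 1,
    DEMO_INSTRUMENT_ROWS = 1 << 2,
    DEMO_INSTRUMENT_WAL = 1 << 3
};

/* Values accepted by the progressive_explain GUC. */
typedef enum ProgressiveExplainMode
{
    PROGRESSIVE_EXPLAIN_OFF,
    PROGRESSIVE_EXPLAIN_EXPLAIN,
    PROGRESSIVE_EXPLAIN_ANALYZE
} ProgressiveExplainMode;

/* Hypothetical bundle of the related GUC values. */
typedef struct ProgressiveExplainConfig
{
    ProgressiveExplainMode mode;
    bool        buffers;        /* progressive_explain_buffers */
    bool        timing;         /* progressive_explain_timing */
    bool        wals;           /* progressive_explain_wals */
} ProgressiveExplainConfig;

/* Instrumentation to request in QueryDesc; 0 means none. */
static int progressive_instrument_options(const ProgressiveExplainConfig *cfg)
{
    int         opts;

    if (cfg->mode != PROGRESSIVE_EXPLAIN_ANALYZE)
        return 0;               /* 'off' and 'explain' need no instruments */

    opts = DEMO_INSTRUMENT_ROWS;    /* assumption: row counts always on */
    if (cfg->timing)
        opts |= DEMO_INSTRUMENT_TIMER;
    if (cfg->buffers)
        opts |= DEMO_INSTRUMENT_BUFFERS;
    if (cfg->wals)
        opts |= DEMO_INSTRUMENT_WAL;
    return opts;
}

/* 'explain' prints once at query start; 'analyze' prints periodically. */
static bool progressive_prints_periodically(ProgressiveExplainMode mode)
{
    return mode == PROGRESSIVE_EXPLAIN_ANALYZE;
}
```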

- Better handling of the shared memory space where plans are printed and
shared with other backends. In the previous version we had a shared hash
whose elements held all data related to progressive explains, including
the complete plan string:

typedef struct explainHashEntry
{
    explainHashKey key;         /* hash key of entry - MUST BE FIRST */
    int         pid;
    TimestampTz last_explain;
    int         explain_count;
    float       explain_duration;
    char        plan[];
} explainHashEntry;

The allocated size per element used to be defined by
progressive_explain_output_size, which essentially controlled the space
available for plan[].

Greg raised the concern that PG has to allocate too much shared memory at
database start, considering that we need enough space for
max_connections + max_parallel_workers entries, and that is a totally
valid point.
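The concern is easy to quantify. Under the old fixed-size design, each hash
element reserved progressive_explain_output_size bytes for plan[] whether or
not its backend ever ran a progressive explain, so the startup reservation
grows linearly with the number of possible backends. The helper below is a
back-of-the-envelope sketch; entry_header is a hypothetical parameter for the
fixed part of explainHashEntry, and hash table overhead is ignored.

```c
#include <stddef.h>

/*
 * Shared memory reserved at startup by the old fixed-size design: one hash
 * element per possible backend, each with room for the full plan text.
 * entry_header is a hypothetical stand-in for the fixed part of
 * explainHashEntry; hash table overhead is ignored.
 */
static size_t fixed_reservation_bytes(int max_connections,
                                      int max_parallel_workers,
                                      size_t output_size,
                                      size_t entry_header)
{
    size_t      nentries = (size_t) max_connections +
        (size_t) max_parallel_workers;

    return nentries * (entry_header + output_size);
}
```

For example, with max_connections = 100, max_parallel_workers = 8, a 64-byte
header and a 4096-byte output size, the result is 108 * 4160 = 449280 bytes,
reserved even when no query uses the feature.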

So the new version takes advantage of DSAs (dynamic shared memory areas).
Each backend creates its own DSA at query start (if progressive explain is
enabled), where the shared data is stored. That DSA is shared with other
backends through a shared hash whose entries hold a dsa_handle (identifying
the area) and a dsa_pointer (locating the data inside it):

typedef struct progressiveExplainHashEntry
{
    progressiveExplainHashKey key;  /* hash key of entry - MUST BE FIRST */
    dsa_handle  h;
    dsa_pointer p;
} progressiveExplainHashEntry;

typedef struct progressiveExplainData
{
    int         pid;
    TimestampTz last_print;
    char        plan[];
} progressiveExplainData;
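The handle/pointer indirection can be illustrated with a toy, single-process
model. In the real patch a reader would presumably call dsa_attach() with the
stored dsa_handle and then dsa_get_address() on the dsa_pointer; here
ToyArea, toy_attach() and the fixed array standing in for the shared hash are
all hypothetical, and entries are keyed by pid purely as an assumption.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Toy, single-process model of DSA addressing: a handle names an area and
 * a pointer is an offset inside it. Real DSAs are mapped per process via
 * dsa_attach() and dsa_get_address(); these names are hypothetical.
 */
typedef uint32_t toy_handle;
typedef size_t toy_pointer;

typedef struct ToyArea
{
    toy_handle  handle;
    char       *base;
} ToyArea;

/* Same shape as progressiveExplainHashEntry, keyed by pid (assumption). */
typedef struct ToyEntry
{
    int         pid;
    toy_handle  h;
    toy_pointer p;
} ToyEntry;

/* Same shape as progressiveExplainData (timestamp simplified to long). */
typedef struct ToyData
{
    int         pid;
    long        last_print;
    char        plan[];
} ToyData;

#define MAX_BACKENDS 4
static ToyArea areas[MAX_BACKENDS];
static ToyEntry hash[MAX_BACKENDS];     /* stand-in for the shared hash */
static int nbackends = 0;

/* Writer: create an area, store the plan in it, publish handle + pointer. */
static int publish_plan(int pid, long now, const char *plan)
{
    size_t      len = strlen(plan) + 1;
    ToyArea    *a;
    ToyData    *d;

    if (nbackends == MAX_BACKENDS)
        return -1;
    a = &areas[nbackends];
    a->handle = 1000u + (toy_handle) nbackends;    /* arbitrary handle */
    a->base = malloc(sizeof(ToyData) + len);
    if (a->base == NULL)
        return -1;
    d = (ToyData *) a->base;                       /* data at offset 0 */
    d->pid = pid;
    d->last_print = now;
    memcpy(d->plan, plan, len);
    hash[nbackends].pid = pid;
    hash[nbackends].h = a->handle;
    hash[nbackends].p = 0;
    nbackends++;
    return 0;
}

/* Analogue of dsa_attach(): find the area named by a handle. */
static ToyArea *toy_attach(toy_handle h)
{
    for (int i = 0; i < nbackends; i++)
        if (areas[i].handle == h)
            return &areas[i];
    return NULL;
}

/* Reader: look up the entry, attach via the handle, resolve the pointer. */
static const char *read_plan(int pid)
{
    for (int i = 0; i < nbackends; i++)
    {
        ToyArea    *a;

        if (hash[i].pid != pid)
            continue;
        a = toy_attach(hash[i].h);
        if (a == NULL)
            return NULL;
        /* analogue of dsa_get_address() */
        return ((ToyData *) (a->base + hash[i].p))->plan;
    }
    return NULL;
}
```

A dsa_pointer is only meaningful relative to its area, which is why the hash
entry carries the handle as well: a reader in another process first maps the
area, then translates the pointer.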

Using DSAs allows us to allocate areas of custom size for plan[]. The
current strategy is to allocate initial space equal to the size of the
initial plan output plus PROGRESSIVE_EXPLAIN_ALLOC_SIZE (currently 4096
bytes), which gives PG enough room for subsequent iterations where the new
string may grow a bit, without having to reallocate. The code checks sizes
and reallocates if needed. With that, the progressive_explain_output_size
GUC was removed.
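The sizing strategy can be sketched as follows, with malloc()/free() standing
in for dsa_allocate()/dsa_free(). DEMO_ALLOC_SLACK plays the role of
PROGRESSIVE_EXPLAIN_ALLOC_SIZE, and PlanSlot is a hypothetical simplification
of progressiveExplainData. Note that DSA has no in-place realloc, so a new
chunk gets a new dsa_pointer; presumably the hash entry is updated
accordingly, though that detail is an assumption here.

```c
#include <stdlib.h>
#include <string.h>

#define DEMO_ALLOC_SLACK 4096   /* role of PROGRESSIVE_EXPLAIN_ALLOC_SIZE */

typedef struct PlanSlot
{
    char       *plan;           /* stand-in for the DSA chunk with plan[] */
    size_t      capacity;       /* bytes available, including the NUL */
    int         nallocs;        /* how many times space was (re)allocated */
} PlanSlot;

/* Store a new plan string, allocating only when it no longer fits. */
static int plan_slot_store(PlanSlot *slot, const char *text)
{
    size_t      need = strlen(text) + 1;

    if (need > slot->capacity)
    {
        size_t      newcap = need + DEMO_ALLOC_SLACK;
        char       *buf = malloc(newcap);  /* real code: dsa_allocate() */

        if (buf == NULL)
            return -1;
        free(slot->plan);                  /* real code: dsa_free() */
        slot->plan = buf;
        slot->capacity = newcap;
        slot->nallocs++;
    }
    memcpy(slot->plan, text, need);
    return 0;
}
```

With 4096 bytes of slack, a plan whose counters gain a few digits between
prints is rewritten in place; only growth beyond the remaining slack
triggers a new allocation.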

- Adjusted the columns of pg_stat_progress_explain. Columns explain_count
and total_explain_time were removed. Column last_explain was renamed to
last_print. Column explain was renamed to query_plan, mirroring the
QUERY PLAN header PG uses when a plan is printed with EXPLAIN.

Rafael.

v5-0001-Proposal-for-progressive-explains.patch