Re: Add parallel columns for pg_stat_statements

Michael Paquier Sun, 06 Oct 2024 17:19:00 -0700

On Sun, Oct 06, 2024 at 03:32:02PM +0200, Guillaume Lelarge wrote:
> I'm not sure I follow. That would mean that every time a query is executed,
> it always gets the same amount of workers. Which is not guaranteed to be
> true.
>
> I would agree, though, that parallelized_queries_launched is probably not
> that interesting. I could get rid of it if you think it should go away.


My point is that these stats are useful to know which action may have
to be taken when reaching a mean, and numbers in pg_stat_statements
offer hints that something is going wrong and that a closer lookup at
an EXPLAIN plan may be required, particularly if the total number of
workers planned and launched aggregated in the counters is unbalanced
across queries.  If the planned/launched ratio is balanced across most
queries queries, a GUC adjustment may be OK.  If the ratio is very
unbalanced in a lower set of queries, I'd also look at tweaking GUCs
instead like the per_gather.  These counters give information that one
or the other may be required.

> Well, I don't see this as an overlap. Rather more information.

Later versions of Benoit's patch have been accumulating this data in
the executor state.  v4 posted at [1] has the following diffs:
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -724,6 +724,9 @@ typedef struct EState
         */
        List       *es_insert_pending_result_relations;
        List       *es_insert_pending_modifytables;
+
+       int                     es_workers_launched;
+       int                     es_workers_planned;
 } EState;

Your v2 posted on this thread has that:
@@ -707,6 +707,12 @@ typedef struct EState
        struct EPQState *es_epq_active;
 
        bool            es_use_parallel_mode;   /* can we use parallel workers? 
*/
+       bool            es_used_parallel_mode;  /* was executed in parallel */
+       int                     es_parallelized_workers_launched;
+       int                     es_parallelized_workers_planned;
+       int                     es_parallelized_nodes; /* # of parallelized 
nodes */
+       int                     es_parallelized_nodes_all_workers; /* # of 
nodes with all workers launched */
+       int                     es_parallelized_nodes_no_worker; /* # of nodes 
with no workers launched */
 
es_parallelized_workers_launched and es_workers_launched are the same
thing in both.

> On this, I would agree with you. They are not that particularly useful to
> get better setting for parallel GUCs. I can drop them if you want.

Yep.  I would remove them for now.  This leads to more bloat.

> Did this on the v2 version of the patch (attached here).
> 
> Thanks for your review. If you want the parallelized_queries_launched
> column and the parallelized_nodes_* columns dropped, I can do that on a v3
> patch.

I'd recommend to split that into more independent patches:
- Introduce the two counters in EState with the incrementations done
in nodeGatherMerge.c and nodeGather.c (mentioned that at [2], you may
want to coordinate with Benoit to avoid duplicating the work). 
- Expand pg_stat_statements to use them for DMLs, SELECTs, well where
they matter. 
- Look at expanding that for utilities that can do parallel jobs:
CREATE INDEX and VACUUM, but this has lower priority to me, and this
can reuse the same counters as the ones added by patch 2.

[1]: 
https://www.postgresql.org/message-id/[email protected]
[2]: https://www.postgresql.org/message-id/[email protected]
--
Michael

signature.asc
Description: PGP signature

Re: Add parallel columns for pg_stat_statements

Reply via email to