On Wed, Oct 2, 2024 at 6:39 AM Michael Paquier <mich...@paquier.xyz> wrote: > FWIW, I've always found this case with EXPLAIN with two entries > confusing, so what's the advantage in trying to apply this rule for > the rest? We know that EXPLAIN, DECLARE and CTAS run a query attached > to their DDL, hence isn't it sufficient to register a single entry for > the top-level query, then nothing for the internal one. The > documentation tells about inner queries when pg_stat_statements.track > = all, like the ones in PL functions, DO blocks, because we have a > clear view of the query string, creating a unique one-one mapping > between a query string and its ID. This patch maps the same query > string to more than one query ID, spreading that. > > So it seems that there are arguments for not doing what this patch > proposes, but also make sure that EXPLAIN logs a single entry, not > two currently when using pg_stat_statements.track = all.
I agree that tracking 2 identical statements with different queryIds and nesting levels is very confusing. On the other hand, from an extension developer point of view (not necessarily limited to pg_stat_statements), I would like to have the queryId available and the post_parse hook called so the query can be normalised and tracked in a hashmap. However, the repeated statements did bug me a lot so I took a stab at trying to find a possible solution. I made an attempt in 0001 by tracking the statements' locations of explainable statements (Select, Insert, Update, Merge, Delete...) during parse and propagate them in the generated Query during transform. With the change, we now have the following result: SET pg_stat_statements.track = 'all'; explain (costs off) select 1; select 1; select queryid, calls, query, toplevel from pg_stat_statements where query ~'select \$1'; queryid | calls | query | toplevel ---------------------+-------+-------------------------------+---------- 2800308901962295548 | 1 | select $1 | t 2800308901962295548 | 1 | select $1; | f 3797185112479763582 | 1 | explain (costs off) select $1 | t The top level and nested select statement now share both the same queryId and query string. The additional ';' for the nested query is due to not having the statement length and taking the whole statement. > Side note. It looks like the patch is forgetting about CREATE VIEW > and CREATE MATERIALIZED VIEW, creating only a top-level entry when > running these utilities. I've updated 0002 to handle CREATE MATERIALIZED VIEW, CREATE VIEW doesn't generate nested statements. I've also realised that refresh materialized view has a similar problem to explain. The first refresh called when the matview is created will have the correct select substring. Subsequent refresh call will use the refresh utility statement as the query string: -- Create the view CREATE MATERIALIZED VIEW test AS SELECT * FROM generate_series(1, 1000) as id; -- Reset pgss and refresh select * from pg_stat_statements_reset(); REFRESH MATERIALIZED VIEW test; select queryid, calls, query, toplevel from pg_stat_statements; queryid | calls | query | toplevel ----------------------+-------+------------------------------------------+---------- 8227191975572355654 | 1 | REFRESH MATERIALIZED VIEW test | t -2908407163726309935 | 1 | REFRESH MATERIALIZED VIEW test; | f -1361326859153559975 | 1 | select * from pg_stat_statements_reset() | t I've tried to improve this behaviour in 0003 where the view's definition is used as query string instead of the refresh utility statement.
v5-0001-Track-location-to-extract-relevant-part-in-nested.patch
Description: Binary data
v5-0003-Use-view-s-definition-as-query-string-on-a-materi.patch
Description: Binary data
v5-0002-Set-query_id-for-queries-contained-in-utility-sta.patch
Description: Binary data