Re: Set query_id for query contained in utility statement

Anthonin Bonnefoy Fri, 04 Oct 2024 02:06:09 -0700

On Wed, Oct 2, 2024 at 6:39 AM Michael Paquier <mich...@paquier.xyz> wrote:
> FWIW, I've always found this case with EXPLAIN with two entries
> confusing, so what's the advantage in trying to apply this rule for
> the rest?  We know that EXPLAIN, DECLARE and CTAS run a query attached
> to their DDL, hence isn't it sufficient to register a single entry for
> the top-level query, then nothing for the internal one.  The
> documentation tells about inner queries when pg_stat_statements.track
> = all, like the ones in PL functions, DO blocks, because we have a
> clear view of the query string, creating a unique one-one mapping
> between a query string and its ID.  This patch maps the same query
> string to more than one query ID, spreading that.
>
> So it seems that there are arguments for not doing what this patch
> proposes, but also make sure that EXPLAIN logs a single entry, not
> two currently when using pg_stat_statements.track = all.


I agree that tracking 2 identical statements with different queryIds
and nesting levels is very confusing. On the other hand, from an
extension developer point of view (not necessarily limited to
pg_stat_statements), I would like to have the queryId available and
the post_parse hook called so the query can be normalised and tracked
in a hashmap.

However, the repeated statements did bug me a lot so I took a stab at
trying to find a possible solution. I made an attempt in 0001 by
tracking the statements' locations of explainable statements (Select,
Insert, Update, Merge, Delete...) during parse and propagate them in
the generated Query during transform. With the change, we now have the
following result:

SET pg_stat_statements.track = 'all';
explain (costs off) select 1;
select 1;
select queryid, calls, query, toplevel from pg_stat_statements
     where query ~'select \$1';
       queryid       | calls |             query             | toplevel
---------------------+-------+-------------------------------+----------
 2800308901962295548 |     1 | select $1                     | t
 2800308901962295548 |     1 | select $1;                    | f
 3797185112479763582 |     1 | explain (costs off) select $1 | t

The top level and nested select statement now share both the same
queryId and query string. The additional ';' for the nested query is
due to not having the statement length  and taking the whole
statement.

> Side note.  It looks like the patch is forgetting about CREATE VIEW
> and CREATE MATERIALIZED VIEW, creating only a top-level entry when
> running these utilities.

I've updated 0002 to handle CREATE MATERIALIZED VIEW, CREATE VIEW
doesn't generate nested statements. I've also realised that refresh
materialized view has a similar problem to explain. The first refresh
called when the matview is created will have the correct select
substring. Subsequent refresh call will use the refresh utility
statement as the query string:

-- Create the view
CREATE MATERIALIZED VIEW test AS SELECT * FROM generate_series(1, 1000) as id;
-- Reset pgss and refresh
select * from pg_stat_statements_reset();
REFRESH MATERIALIZED VIEW test;
select queryid, calls, query, toplevel from pg_stat_statements;
       queryid        | calls |                  query
  | toplevel
----------------------+-------+------------------------------------------+----------
  8227191975572355654 |     1 | REFRESH MATERIALIZED VIEW test           | t
 -2908407163726309935 |     1 | REFRESH MATERIALIZED VIEW test;          | f
 -1361326859153559975 |     1 | select * from pg_stat_statements_reset() | t

I've tried to improve this behaviour in 0003 where the view's
definition is used as query string instead of the refresh utility
statement.

v5-0001-Track-location-to-extract-relevant-part-in-nested.patch
Description: Binary data

v5-0003-Use-view-s-definition-as-query-string-on-a-materi.patch
Description: Binary data

v5-0002-Set-query_id-for-queries-contained-in-utility-sta.patch
Description: Binary data

Re: Set query_id for query contained in utility statement

Reply via email to