On Mon, Sep 02, 2024 at 10:11:43AM +0900, Michael Paquier wrote: > I need to spend a bit more time with my head down for this thread, but > couldn't we use these commands with various query patterns in > pg_stat_statements and look at the shmem counters reported through its > view?
My apologies for the time it took, but here you go with a patch set. I have looked at this thread overall, and there are two problems at hand regarding the lack of reporting of the query ID in backend entries for the extended query protocol: 1) ExecutorRun() misses the reports, which happens when a query does an ExecutorStart(), then a series of ExecutorRun() through a portal with bind messages. Robert has mentioned that separately a few days ago at [1]. But that's not everything. 2) A query executed through a portal with tuples to return in a tuplestore also miss the query ID report. For example, a DML RETURNING with the extended protocol would use an execute (with ExecutorStart and ExecutorRun) followed by a series of execute fetch. pg_stat_activity would report the query ID for the execute, not for the fetches, while pg_stat_activity has the query string. That's confusing. The patch series attached address these two in 0001 and 0003. 0001 should be backpatched (still need to wordsmith the comments), where I've come down to the approach of using a report in ExecutorRun() because it is simpler and it does the job. Perhaps also 0003, but nobody has complained about that, either. I have also looked at the tests proposed (isolation, TAP, custom module); all of them are a bit disappointing because they duplicate some patterns that are already tested in pg_stat_statements, while willing to check the contents of pg_stat_statements. I am afraid that it is not going to age well because we'd need to have the same query patterns in more than one place. We should have tests, definitely, but we can do an equivalent of pg_stat_activity lookups by calling pgstat_get_my_query_id() in strategic places, making sure that all dedicated paths always have the query ID reported: - Check pgstat_get_my_query_id() in the run, finish and end executor hooks. - In the parse-analyze hook, before the query ID is reported (except for a PREPARE), check that the ID in a Query is set. The test proposed by Robert on the other thread was fancy enough that I've added it. All that is in 0002, and that's enough to cause 0001 to fail, planning only these on HEAD. Tests in 0003 require fetch messages, and I don't have a trick in my sleeves except if we invent a new meta-command in psql. There are other problems mentioned on this thread, with plan caching for example. Let's deal with that separately, in separate threads. [1]: https://www.postgresql.org/message-id/CA+TgmoZxtnf_jZ=vqbsyau8hfukkwojcj6ufy4lgpxaunkr...@mail.gmail.com -- Michael
From d4d469da18d6442fed5a6fd5824f827e28cf5bb3 Mon Sep 17 00:00:00 2001 From: Michael Paquier <mich...@paquier.xyz> Date: Wed, 11 Sep 2024 14:12:24 +0900 Subject: [PATCH 1/3] Fix reporting of query ID with extended query protocol A query run through the extended query protocol would miss reporting its query ID to pg_stat_statements, as the value is reset when beginning the processing of a new execute message. ExecutorStart() was calling pgstat_report_query_id(), but it missed the fact that multiple ExecutorRun() calls could be issued for a single query with an initial ExecutorStart() across multiple execute messages, hence the query ID would be missing in the second execute. Backpatch-through: 14 --- src/backend/executor/execMain.c | 20 ++++++++++++++++---- 1 file changed, 16 insertions(+), 4 deletions(-) diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c index 29e186fa73..94e18f2e36 100644 --- a/src/backend/executor/execMain.c +++ b/src/backend/executor/execMain.c @@ -119,10 +119,11 @@ void ExecutorStart(QueryDesc *queryDesc, int eflags) { /* - * In some cases (e.g. an EXECUTE statement) a query execution will skip - * parse analysis, which means that the query_id won't be reported. Note - * that it's harmless to report the query_id multiple times, as the call - * will be ignored if the top level query_id has already been reported. + * In some cases (e.g. an EXECUTE statement or an execute message with + * the extended query protocol) the query_id won't be reported, so do it + * now. Note that it's harmless to report the query_id multiple times, + * as the call will be ignored if the top level query_id has already been + * reported. */ pgstat_report_query_id(queryDesc->plannedstmt->queryId, false); @@ -293,6 +294,17 @@ ExecutorRun(QueryDesc *queryDesc, ScanDirection direction, uint64 count, bool execute_once) { + /* + * When executing a query with the extended query protocol, + * ExecutorStart() may not be called, causing the query ID to not be + * reported. Hence, do it again here in case it was missed. + * + * Reporting multiple times the query ID is harmless. + * + * See also comments in ExecutorStart(). + */ + pgstat_report_query_id(queryDesc->plannedstmt->queryId, false); + if (ExecutorRun_hook) (*ExecutorRun_hook) (queryDesc, direction, count, execute_once); else -- 2.45.2
From 8408f9929af5cf35bf2dc888777782e476160c4b Mon Sep 17 00:00:00 2001 From: Michael Paquier <mich...@paquier.xyz> Date: Wed, 11 Sep 2024 14:15:03 +0900 Subject: [PATCH 2/3] Add sanity checks related to query ID reporting in pg_stat_statements This includes as well tests with the extended query protocol, checking that the query ID is correctly set when going through the executor hooks of the module. These are new tests, so no backpatch is done. --- contrib/pg_stat_statements/Makefile | 2 +- .../pg_stat_statements/expected/extended.out | 62 +++++++++++++++++++ contrib/pg_stat_statements/meson.build | 1 + .../pg_stat_statements/pg_stat_statements.c | 17 +++++ contrib/pg_stat_statements/sql/extended.sql | 17 +++++ 5 files changed, 98 insertions(+), 1 deletion(-) create mode 100644 contrib/pg_stat_statements/expected/extended.out create mode 100644 contrib/pg_stat_statements/sql/extended.sql diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile index c19ccad77e..1622b43ded 100644 --- a/contrib/pg_stat_statements/Makefile +++ b/contrib/pg_stat_statements/Makefile @@ -19,7 +19,7 @@ LDFLAGS_SL += $(filter -lm, $(LIBS)) REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/pg_stat_statements/pg_stat_statements.conf REGRESS = select dml cursors utility level_tracking planning \ - user_activity wal entry_timestamp privileges cleanup \ + user_activity wal entry_timestamp privileges extended cleanup \ oldextversions # Disabled because these tests require "shared_preload_libraries=pg_stat_statements", # which typical installcheck users do not have (e.g. buildfarm clients). diff --git a/contrib/pg_stat_statements/expected/extended.out b/contrib/pg_stat_statements/expected/extended.out new file mode 100644 index 0000000000..09c1f895e5 --- /dev/null +++ b/contrib/pg_stat_statements/expected/extended.out @@ -0,0 +1,62 @@ +-- Tests for extended query protocol +SELECT query_id IS NOT NULL AS query_id_set + FROM pg_stat_activity WHERE pid = pg_backend_pid() \bind \g + query_id_set +-------------- + t +(1 row) + +SELECT $1 \parse stmt1 +SELECT $1, $2 \parse stmt2 +SELECT $1, $2, $3 \parse stmt3 +SELECT pg_stat_statements_reset() IS NOT NULL AS t; + t +--- + t +(1 row) + +\bind_named stmt1 'stmt1_val1' \g + ?column? +------------ + stmt1_val1 +(1 row) + +\bind_named stmt2 'stmt2_val1' 'stmt2_val2' \g + ?column? | ?column? +------------+------------ + stmt2_val1 | stmt2_val2 +(1 row) + +\bind_named stmt3 'stmt3_val1' 'stmt3_val2' 'stmt3_val3' \g + ?column? | ?column? | ?column? +------------+------------+------------ + stmt3_val1 | stmt3_val2 | stmt3_val3 +(1 row) + +\bind_named stmt3 'stmt3_val4' 'stmt3_val5' 'stmt3_val6' \g + ?column? | ?column? | ?column? +------------+------------+------------ + stmt3_val4 | stmt3_val5 | stmt3_val6 +(1 row) + +\bind_named stmt2 'stmt2_val3' 'stmt2_val4' \g + ?column? | ?column? +------------+------------ + stmt2_val3 | stmt2_val4 +(1 row) + +\bind_named stmt1 'stmt1_val1' \g + ?column? +------------ + stmt1_val1 +(1 row) + +SELECT calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C"; + calls | rows | query +-------+------+---------------------------------------------------- + 2 | 2 | SELECT $1 + 2 | 2 | SELECT $1, $2 + 2 | 2 | SELECT $1, $2, $3 + 1 | 1 | SELECT pg_stat_statements_reset() IS NOT NULL AS t +(4 rows) + diff --git a/contrib/pg_stat_statements/meson.build b/contrib/pg_stat_statements/meson.build index 5cf926d1f8..e14669ca15 100644 --- a/contrib/pg_stat_statements/meson.build +++ b/contrib/pg_stat_statements/meson.build @@ -51,6 +51,7 @@ tests += { 'wal', 'entry_timestamp', 'privileges', + 'extended', 'cleanup', 'oldextversions', ], diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c index 362d222f63..1ae31b82a8 100644 --- a/contrib/pg_stat_statements/pg_stat_statements.c +++ b/contrib/pg_stat_statements/pg_stat_statements.c @@ -834,6 +834,14 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate) if (!pgss || !pgss_hash || !pgss_enabled(nesting_level)) return; + /* + * Query ID should be set in the query. + * + * Note that the query ID may be already reported (like in a PREPARE + * statement, hence there is no check based on pgstat_get_my_query_id). + */ + Assert(query->queryId != 0); + /* * If it's EXECUTE, clear the queryId so that stats will accumulate for * the underlying PREPARE. But don't do this if we're not tracking @@ -1022,6 +1030,9 @@ static void pgss_ExecutorRun(QueryDesc *queryDesc, ScanDirection direction, uint64 count, bool execute_once) { + /* Query ID should be reported. */ + Assert(pgstat_get_my_query_id() != 0); + nesting_level++; PG_TRY(); { @@ -1043,6 +1054,9 @@ pgss_ExecutorRun(QueryDesc *queryDesc, ScanDirection direction, uint64 count, static void pgss_ExecutorFinish(QueryDesc *queryDesc) { + /* Query ID should be reported. */ + Assert(pgstat_get_my_query_id() != 0); + nesting_level++; PG_TRY(); { @@ -1066,6 +1080,9 @@ pgss_ExecutorEnd(QueryDesc *queryDesc) { uint64 queryId = queryDesc->plannedstmt->queryId; + /* Query ID should be reported. */ + Assert(pgstat_get_my_query_id() != 0); + if (queryId != UINT64CONST(0) && queryDesc->totaltime && pgss_enabled(nesting_level)) { diff --git a/contrib/pg_stat_statements/sql/extended.sql b/contrib/pg_stat_statements/sql/extended.sql new file mode 100644 index 0000000000..c039c5de01 --- /dev/null +++ b/contrib/pg_stat_statements/sql/extended.sql @@ -0,0 +1,17 @@ +-- Tests for extended query protocol + +SELECT query_id IS NOT NULL AS query_id_set + FROM pg_stat_activity WHERE pid = pg_backend_pid() \bind \g + +SELECT $1 \parse stmt1 +SELECT $1, $2 \parse stmt2 +SELECT $1, $2, $3 \parse stmt3 +SELECT pg_stat_statements_reset() IS NOT NULL AS t; +\bind_named stmt1 'stmt1_val1' \g +\bind_named stmt2 'stmt2_val1' 'stmt2_val2' \g +\bind_named stmt3 'stmt3_val1' 'stmt3_val2' 'stmt3_val3' \g +\bind_named stmt3 'stmt3_val4' 'stmt3_val5' 'stmt3_val6' \g +\bind_named stmt2 'stmt2_val3' 'stmt2_val4' \g +\bind_named stmt1 'stmt1_val1' \g + +SELECT calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C"; -- 2.45.2
From 256ce554246cc459dc4efdab69a00d3503fca45e Mon Sep 17 00:00:00 2001 From: Michael Paquier <mich...@paquier.xyz> Date: Wed, 11 Sep 2024 14:16:24 +0900 Subject: [PATCH 3/3] Report query ID for execute fetch in extended query protocol This concerns for example queries that execute a query through a portal, then do *not* require going through an execution again because all the tuples to return are stored in a tuplestore part of a portal. For example, this would happen for a RETURNING query with a fetch_size small enough to require multiple fetch batch with a message sequence like: <execute> <fetch> <...> <fetch> <commit> The query ID would be set in the initial execute message, but not in any of the follow up execute fetch messages. Perhaps backpatch to 14? This would report new information in pg_stat_activity, and nobody has complained about that, either. --- src/backend/tcop/postgres.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c index 8bc6bea113..0f549468d2 100644 --- a/src/backend/tcop/postgres.c +++ b/src/backend/tcop/postgres.c @@ -2185,9 +2185,28 @@ exec_execute_message(const char *portal_name, long max_rows) * then we are only fetching more rows rather than completely re-executing * the query from the start. atStart is never reset for a v3 portal, so we * are safe to use this check. + * + * The query ID is lost in this case as of pgstat_report_activity() done + * above, so set it again when only fetching rows. */ execute_is_fetch = !portal->atStart; + if (execute_is_fetch) + { + ListCell *lc; + + foreach(lc, portal->stmts) + { + PlannedStmt *stmt = lfirst_node(PlannedStmt, lc); + + if (stmt->queryId != UINT64CONST(0)) + { + pgstat_report_query_id(stmt->queryId, false); + break; + } + } + } + /* Log immediately if dictated by log_statement */ if (check_log_statement(portal->stmts)) { -- 2.45.2
signature.asc
Description: PGP signature