On Tue, Mar 25, 2025 at 11:58:21AM -0500, Sami Imseih wrote:
> "Since the queryid hash value is computed on the post-parse-analysis
> representation of the queries, the opposite is also possible: queries with
> identical texts might appear as separate entries, if they have different
> meanings as a result of factors such as different search_path settings."
> 
> I think this text could remain as is, because search_path still
> matters for things like functions, etc.

Yeah, I think that's OK as-is.  I'm open to more improvements,
including more tests for these function patterns.  It's one of these
areas where we should be able to tweak RangeTblFunction and apply a
custom function to its funcexpr, and please note that I have no idea
how complex it could become as this is a Node expression.  :D

Functions in a temporary schema is not something as common as temp
tables, I guess, so these matter less, but they would still be a cause
of bloat for monitoring in very specific workloads.

> 2/
> "For example, pg_stat_statements will consider two apparently-identical
> queries to be distinct, if they reference a table that was dropped and
> recreated between the executions of the two queries."
> 
> This is no longer true for relations, but is still true for functions. I think
> we should mention the caveats in a bit more detail as this change
> will have impact on the most common case. What about something
> like this?
> 
> "For example, pg_stat_statements will consider two apparently-identical
> queries to be distinct, if they reference a function that was dropped and
> recreated between the executions of the two queries.

That's a bit larger than functions, but we could remain a bit more
evasive, with "if they reference *for example* a function that was
dropped and recreated between the executions of the two queries".

Note that for DDLs, like CREATE TABLE, we also group entries with
identical relation names, so we are kind of in line with the patch,
not with the current docs.

> Conversely, if a table is dropped and recreated between the
> executions of queries, two apparently-identical queries may be
> considered the same. However, if the alias for a table is different
> for semantically similar queries, these queries will be considered distinct"

This addition sounds like an improvement here.

As this thread has proved, we had little coverage these cases in pgss,
so I've applied the tests as an independent change.  It is also useful
to track how things change in the commit history depending on how the
computation is tweaked.  I've also included your doc suggestions.  I
feel that we could do better here, but that's a common statement
anyway when it comes to the documentation.
--
Michael
From 8ea61bb0d7d6c4dbbb40dbaedb5e751027163cfe Mon Sep 17 00:00:00 2001
From: Michael Paquier <mich...@paquier.xyz>
Date: Wed, 26 Mar 2025 08:07:59 +0900
Subject: [PATCH v6] Add custom query jumble function for RangeTblEntry.eref

---
 src/include/nodes/parsenodes.h                | 11 +++++++---
 src/backend/nodes/queryjumblefuncs.c          | 19 ++++++++++++++++++
 doc/src/sgml/pgstatstatements.sgml            |  9 +++++++--
 .../pg_stat_statements/expected/select.out    | 20 ++++++++-----------
 4 files changed, 42 insertions(+), 17 deletions(-)

diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 23c9e3c5abf2..a87f949b389e 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -1050,8 +1050,13 @@ typedef struct RangeTblEntry
 	 */
 	/* user-written alias clause, if any */
 	Alias	   *alias pg_node_attr(query_jumble_ignore);
-	/* expanded reference names */
-	Alias	   *eref pg_node_attr(query_jumble_ignore);
+
+	/*
+	 * Expanded reference names.  This uses a custom query jumble function so
+	 * as the table name is included in the computation, not its list of
+	 * columns.
+	 */
+	Alias	   *eref pg_node_attr(custom_query_jumble);
 
 	RTEKind		rtekind;		/* see above */
 
@@ -1094,7 +1099,7 @@ typedef struct RangeTblEntry
 	 * tables to be invalidated if the underlying table is altered.
 	 */
 	/* OID of the relation */
-	Oid			relid;
+	Oid			relid pg_node_attr(query_jumble_ignore);
 	/* inheritance requested? */
 	bool		inh;
 	/* relation kind (see pg_class.relkind) */
diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c
index f8b0f91704ba..62d6cfb7ac15 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -67,6 +67,9 @@ static void _jumbleElements(JumbleState *jstate, List *elements);
 static void _jumbleA_Const(JumbleState *jstate, Node *node);
 static void _jumbleList(JumbleState *jstate, Node *node);
 static void _jumbleVariableSetStmt(JumbleState *jstate, Node *node);
+static void _jumbleRangeTblEntry_eref(JumbleState *jstate,
+									  RangeTblEntry *rte,
+									  Alias *expr);
 
 /*
  * Given a possibly multi-statement source string, confine our attention to the
@@ -516,3 +519,19 @@ _jumbleVariableSetStmt(JumbleState *jstate, Node *node)
 	JUMBLE_FIELD(is_local);
 	JUMBLE_LOCATION(location);
 }
+
+/*
+ * Custom query jumble function for RangeTblEntry.eref.
+ */
+static void
+_jumbleRangeTblEntry_eref(JumbleState *jstate,
+						  RangeTblEntry *rte,
+						  Alias *expr)
+{
+	JUMBLE_FIELD(type);
+
+	/*
+	 * This includes only the table name, the list of column names is ignored.
+	 */
+	JUMBLE_STRING(aliasname);
+}
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index f4e384e95aea..679381f00607 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -675,8 +675,13 @@ calls | 2
    things, the internal object identifiers appearing in this representation.
    This has some counterintuitive implications.  For example,
    <filename>pg_stat_statements</filename> will consider two apparently-identical
-   queries to be distinct, if they reference a table that was dropped
-   and recreated between the executions of the two queries.
+   queries to be distinct, if they reference for example a function that was
+   dropped and recreated between the executions of the two queries.
+   Conversely, if a table is dropped and recreated between the
+   executions of queries, two apparently-identical queries may be
+   considered the same. However, if the alias for a table is different
+   for semantically similar queries, these queries will be considered
+   distinct.
    The hashing process is also sensitive to differences in
    machine architecture and other facets of the platform.
    Furthermore, it is not safe to assume that <structfield>queryid</structfield>
diff --git a/contrib/pg_stat_statements/expected/select.out b/contrib/pg_stat_statements/expected/select.out
index 708c6b0e9c41..1eebc2898ab9 100644
--- a/contrib/pg_stat_statements/expected/select.out
+++ b/contrib/pg_stat_statements/expected/select.out
@@ -433,11 +433,10 @@ COMMIT;
 SELECT calls, query FROM pg_stat_statements ORDER BY query COLLATE "C";
  calls |                                 query                                  
 -------+------------------------------------------------------------------------
-     1 | SELECT * FROM temp_t
-     1 | SELECT * FROM temp_t
+     2 | SELECT * FROM temp_t
      0 | SELECT calls, query FROM pg_stat_statements ORDER BY query COLLATE "C"
      1 | SELECT pg_stat_statements_reset() IS NOT NULL AS t
-(4 rows)
+(3 rows)
 
 SELECT pg_stat_statements_reset() IS NOT NULL AS t;
  t 
@@ -623,18 +622,15 @@ SELECT a AS a1 FROM pgss_schema_2.tab_search_diff_2;
 SELECT calls, query FROM pg_stat_statements ORDER BY query COLLATE "C";
  calls |                                 query                                  
 -------+------------------------------------------------------------------------
-     3 | SELECT a FROM pgss_schema_2.tab_search_diff_2 AS t1
-     9 | SELECT a FROM tab_search_diff_2 AS t1
-     1 | SELECT a, b FROM pgss_schema_1.tab_search_same
-     3 | SELECT a, b FROM tab_search_same
+     8 | SELECT a FROM tab_search_diff_2
+     4 | SELECT a FROM tab_search_diff_2 AS t1
+     4 | SELECT a, b FROM tab_search_same
      0 | SELECT calls, query FROM pg_stat_statements ORDER BY query COLLATE "C"
-     1 | SELECT count(*) FROM pgss_schema_1.tab_search_same
-     1 | SELECT count(*) FROM pgss_schema_2.tab_search_diff_1
-     3 | SELECT count(*) FROM tab_search_diff_1
+     4 | SELECT count(*) FROM tab_search_diff_1
      4 | SELECT count(*) FROM tab_search_diff_2
-     3 | SELECT count(*) FROM tab_search_same
+     4 | SELECT count(*) FROM tab_search_same
      1 | SELECT pg_stat_statements_reset() IS NOT NULL AS t
-(11 rows)
+(8 rows)
 
 DROP SCHEMA pgss_schema_1 CASCADE;
 NOTICE:  drop cascades to 3 other objects
-- 
2.49.0

Attachment: signature.asc
Description: PGP signature

Reply via email to