On 19.02.2025 22:11, Sami Imseih wrote:
I don't see v18 attached.


Ah, sorry, I attached v18.


But, I think it's wrong that you changed the design of this patch
(sample rate to duration based ) after it was ready for committer.



I've decided to reconsider how to better reduce the load on the spin-lock in pg_stat_statements.

If we look at the benchmark with 192 CPUs [0], we can see that even a small reduction in the frequency of incrementing counter of entries gives a significant performance boost. For example, at sample_rate = 0.75, performance almost doubles, while at sample_rate < 0.5, the spin-lock bottleneck is already eliminated. This suggests that even small adjustments to the frequency can significantly improve the situation.

But instead of blindly reducing the frequency via PRNG, we can take a more thoughtful approach with threshold by execute time:

 * Find the most frequent query by column 'calls' in pg_stat_statements;
 * In this query look at info about execution time: min_exec_time,
   max_exec_time, etc;
 * Gradually increase the threshold from min_exec_time to
   max_exec_time, limiting the tracking of this query.
 * Monitor performance: once the bottleneck is resolved, stop at the
   current threshold value.

This approach allows us to:

 * Eliminate the spin-lock bottleneck;
 * Preserve data about slow queries, which may be critical for
   performance analysis;
 * Reduce the load on the most frequent queries causing contention,
   instead of uniformly reducing the frequency for all queries.

BTW, I track queries in pgss_ProcessUtility without checking exec time because these queries unlikely can lead to bottleneck.

Any other thoughts or objections?



This CF [0] should go back to "Needs Review" if this is the case.


Done!

P.S. During the recent review, I noticed an interesting behavior: when we introduced \parse, changing the 'track' from 'none' to 'all' might lead to an undefined behavior. The steps are:

postgres=# SELECT pg_stat_statements_reset() IS NOT NULL AS t; t --- t (1 row) postgres=# SET pg_stat_statements.track = "none"; SET postgres=# SELECT 1 \parse stmt ?column? ---------- 1 (1 row) postgres=# \bind_named stmt \g ?column? ---------- 1 (1 row) postgres=# SELECT query, calls FROM pg_stat_statements; query | calls -------+------- (0 rows) postgres=# SET pg_stat_statements.track = "all"; SET postgres=# \bind_named stmt \g ?column? ---------- 1 (1 row) postgres=# SELECT query, calls FROM pg_stat_statements; query | calls ----------+------- SELECT 1 | 1 (1 row)

As you can see, the query has not been normalized. Is it a bug, expected or undefined behavior?

[0]: https://www.postgresql.org/message-id/CAA5RZ0vxn_UiUkUK5SdngObLZzw40RhafB7SHydxzBw2_xjjiA%40mail.gmail.com

-- Best regards, Ilia Evdokimov, Tantor Labs LLC.
From b8f88dcd4d44a11301776455cc133bb3e859d000 Mon Sep 17 00:00:00 2001
From: Ilia Evdokimov <ilya.evdoki...@tantorlabs.com>
Date: Thu, 20 Feb 2025 01:29:55 +0300
Subject: [PATCH v18] Allow setting execution time threshold for
 pg_stat_statements

New configuration parameter pg_stat_statements.track_min_duration makes it
possible to track statements whose execution time is greater than threshold,
to reduce the amount of tracking.
---
 contrib/pg_stat_statements/Makefile           |   2 +-
 .../expected/min_duration.out                 | 174 ++++++++++++++++++
 contrib/pg_stat_statements/meson.build        |   1 +
 .../pg_stat_statements/pg_stat_statements.c   |  59 ++++--
 .../pg_stat_statements/sql/min_duration.sql   |  50 +++++
 doc/src/sgml/pgstatstatements.sgml            |  22 +++
 6 files changed, 287 insertions(+), 21 deletions(-)
 create mode 100644 contrib/pg_stat_statements/expected/min_duration.out
 create mode 100644 contrib/pg_stat_statements/sql/min_duration.sql

diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index 241c02587b..df512e428e 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -20,7 +20,7 @@ LDFLAGS_SL += $(filter -lm, $(LIBS))
 REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/pg_stat_statements/pg_stat_statements.conf
 REGRESS = select dml cursors utility level_tracking planning \
 	user_activity wal entry_timestamp privileges extended \
-	parallel cleanup oldextversions
+	parallel min_duration cleanup oldextversions
 # Disabled because these tests require "shared_preload_libraries=pg_stat_statements",
 # which typical installcheck users do not have (e.g. buildfarm clients).
 NO_INSTALLCHECK = 1
diff --git a/contrib/pg_stat_statements/expected/min_duration.out b/contrib/pg_stat_statements/expected/min_duration.out
new file mode 100644
index 0000000000..ef6f2a4437
--- /dev/null
+++ b/contrib/pg_stat_statements/expected/min_duration.out
@@ -0,0 +1,174 @@
+--
+-- track_min_duration
+--
+-- top-level tracking - simple query protocol
+SHOW pg_stat_statements.track;
+ pg_stat_statements.track 
+--------------------------
+ top
+(1 row)
+
+SET pg_stat_statements.track_min_duration = -1;
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t 
+---
+ t
+(1 row)
+
+SELECT 1 AS "int";
+ int 
+-----
+   1
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls 
+-------+-------
+(0 rows)
+
+SET pg_stat_statements.track_min_duration = 0;
+SELECT 1 AS "int";
+ int 
+-----
+   1
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+       query        | calls 
+--------------------+-------
+ SELECT $1 AS "int" |     1
+(1 row)
+
+-- top-level tracking - extended query protocol
+SET pg_stat_statements.track_min_duration = -1;
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t 
+---
+ t
+(1 row)
+
+SELECT 1 \parse stmt
+\bind_named stmt \g
+ ?column? 
+----------
+        1
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls 
+-------+-------
+(0 rows)
+
+SET pg_stat_statements.track_min_duration = 0;
+\bind_named stmt \g
+ ?column? 
+----------
+        1
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+   query   | calls 
+-----------+-------
+ SELECT $1 |     1
+(1 row)
+
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t 
+---
+ t
+(1 row)
+
+DEALLOCATE stmt;
+-- nested tracking - simple query protocol
+SET pg_stat_statements.track = "all";
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t 
+---
+ t
+(1 row)
+
+SET pg_stat_statements.track_min_duration = 0;
+EXPLAIN (COSTS OFF) SELECT 1;
+ QUERY PLAN 
+------------
+ Result
+(1 row)
+
+EXPLAIN (COSTS OFF) SELECT 1;
+ QUERY PLAN 
+------------
+ Result
+(1 row)
+
+SET pg_stat_statements.track_min_duration = -1;
+EXPLAIN (COSTS OFF) SELECT 1;
+ QUERY PLAN 
+------------
+ Result
+(1 row)
+
+EXPLAIN (COSTS OFF) SELECT 1;
+ QUERY PLAN 
+------------
+ Result
+(1 row)
+
+SELECT toplevel, calls, query FROM pg_stat_statements
+  ORDER BY query COLLATE "C";
+ toplevel | calls |                       query                        
+----------+-------+----------------------------------------------------
+ t        |     2 | EXPLAIN (COSTS OFF) SELECT $1
+ f        |     2 | SELECT $1
+ t        |     1 | SELECT pg_stat_statements_reset() IS NOT NULL AS t
+ t        |     2 | SET pg_stat_statements.track_min_duration = $1
+(4 rows)
+
+-- nested tracking - extended query protocol
+SET pg_stat_statements.track = "all";
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t 
+---
+ t
+(1 row)
+
+SET pg_stat_statements.track_min_duration = 0;
+EXPLAIN (COSTS OFF) SELECT 1; \parse stmt
+ QUERY PLAN 
+------------
+ Result
+(1 row)
+
+\bind_named stmt \g
+ QUERY PLAN 
+------------
+ Result
+(1 row)
+
+\bind_named stmt \g
+ QUERY PLAN 
+------------
+ Result
+(1 row)
+
+SET pg_stat_statements.track_min_duration = -1;
+\bind_named stmt \g
+ QUERY PLAN 
+------------
+ Result
+(1 row)
+
+\bind_named stmt \g
+ QUERY PLAN 
+------------
+ Result
+(1 row)
+
+SELECT toplevel, calls, query FROM pg_stat_statements
+  ORDER BY query COLLATE "C";
+ toplevel | calls |                     query                      
+----------+-------+------------------------------------------------
+ t        |     2 | EXPLAIN (COSTS OFF) SELECT $1
+ f        |     3 | SELECT $1
+ t        |     1 | SET pg_stat_statements.track_min_duration = $1
+(3 rows)
+
diff --git a/contrib/pg_stat_statements/meson.build b/contrib/pg_stat_statements/meson.build
index 4446af58c5..6427665ee1 100644
--- a/contrib/pg_stat_statements/meson.build
+++ b/contrib/pg_stat_statements/meson.build
@@ -54,6 +54,7 @@ tests += {
       'privileges',
       'extended',
       'parallel',
+      'min_duration',
       'cleanup',
       'oldextversions',
     ],
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 4a3e855ce9..00ac11aed0 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -291,16 +291,18 @@ static const struct config_enum_entry track_options[] =
 
 static int	pgss_max = 5000;	/* max # statements to track */
 static int	pgss_track = PGSS_TRACK_TOP;	/* tracking level */
+static int	pgss_track_min_duration = 0; /* msec or -1 */
 static bool pgss_track_utility = true;	/* whether to track utility commands */
 static bool pgss_track_planning = false;	/* whether to track planning
 											 * duration */
 static bool pgss_save = true;	/* whether to save stats across shutdown */
 
 
-#define pgss_enabled(level) \
+#define pgss_enabled(level, skip_sample_check) \
 	(!IsParallelWorker() && \
 	(pgss_track == PGSS_TRACK_ALL || \
-	(pgss_track == PGSS_TRACK_TOP && (level) == 0)))
+	(pgss_track == PGSS_TRACK_TOP && (level) == 0)) && \
+	(skip_sample_check == PGSS_INVALID || pgss_track_min_duration >= 0))
 
 #define record_gc_qtexts() \
 	do { \
@@ -415,6 +417,18 @@ _PG_init(void)
 							NULL,
 							NULL);
 
+	DefineCustomIntVariable("pg_stat_statements.track_min_duration",
+							"Sets the minimum execution time above which statements will be tracked.",
+							" -1 turns this feature off. 0 tracks all statements.",
+							&pgss_track_min_duration,
+							0,
+							-1, INT_MAX,
+							PGC_SUSET,
+							GUC_UNIT_MS,
+							NULL,
+							NULL,
+							NULL);
+
 	DefineCustomEnumVariable("pg_stat_statements.track",
 							 "Selects which statements are tracked by pg_stat_statements.",
 							 NULL,
@@ -837,7 +851,7 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate)
 		prev_post_parse_analyze_hook(pstate, query, jstate);
 
 	/* Safety check... */
-	if (!pgss || !pgss_hash || !pgss_enabled(nesting_level))
+	if (!pgss || !pgss_hash || !pgss_enabled(nesting_level, PGSS_INVALID))
 		return;
 
 	/*
@@ -895,7 +909,7 @@ pgss_planner(Query *parse,
 	 * pgss_store needs it.  We also ignore query without queryid, as it would
 	 * be treated as a utility statement, which may not be the case.
 	 */
-	if (pgss_enabled(nesting_level)
+	if (pgss_enabled(nesting_level, PGSS_PLAN)
 		&& pgss_track_planning && query_string
 		&& parse->queryId != UINT64CONST(0))
 	{
@@ -1000,7 +1014,8 @@ pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
 	 * counting of optimizable statements that are directly contained in
 	 * utility statements.
 	 */
-	if (pgss_enabled(nesting_level) && queryDesc->plannedstmt->queryId != UINT64CONST(0))
+	if (pgss_enabled(nesting_level, PGSS_EXEC) &&
+		queryDesc->plannedstmt->queryId != UINT64CONST(0))
 	{
 		/*
 		 * Set up to track total elapsed time in ExecutorRun.  Make sure the
@@ -1069,27 +1084,31 @@ pgss_ExecutorEnd(QueryDesc *queryDesc)
 	uint64		queryId = queryDesc->plannedstmt->queryId;
 
 	if (queryId != UINT64CONST(0) && queryDesc->totaltime &&
-		pgss_enabled(nesting_level))
+		pgss_enabled(nesting_level, PGSS_EXEC))
 	{
+		double		msec;
 		/*
 		 * Make sure stats accumulation is done.  (Note: it's okay if several
 		 * levels of hook all do this.)
 		 */
 		InstrEndLoop(queryDesc->totaltime);
 
-		pgss_store(queryDesc->sourceText,
-				   queryId,
-				   queryDesc->plannedstmt->stmt_location,
-				   queryDesc->plannedstmt->stmt_len,
-				   PGSS_EXEC,
-				   queryDesc->totaltime->total * 1000.0,	/* convert to msec */
-				   queryDesc->estate->es_total_processed,
-				   &queryDesc->totaltime->bufusage,
-				   &queryDesc->totaltime->walusage,
-				   queryDesc->estate->es_jit ? &queryDesc->estate->es_jit->instr : NULL,
-				   NULL,
-				   queryDesc->estate->es_parallel_workers_to_launch,
-				   queryDesc->estate->es_parallel_workers_launched);
+		/* convert to msec */
+		msec = queryDesc->totaltime->total * 1000.0;
+		if (msec >= pgss_track_min_duration)
+			pgss_store(queryDesc->sourceText,
+						queryId,
+						queryDesc->plannedstmt->stmt_location,
+						queryDesc->plannedstmt->stmt_len,
+						PGSS_EXEC,
+						msec,
+						queryDesc->estate->es_total_processed,
+						&queryDesc->totaltime->bufusage,
+						&queryDesc->totaltime->walusage,
+						queryDesc->estate->es_jit ? &queryDesc->estate->es_jit->instr : NULL,
+						NULL,
+						queryDesc->estate->es_parallel_workers_to_launch,
+						queryDesc->estate->es_parallel_workers_launched);
 	}
 
 	if (prev_ExecutorEnd)
@@ -1112,7 +1131,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 	uint64		saved_queryId = pstmt->queryId;
 	int			saved_stmt_location = pstmt->stmt_location;
 	int			saved_stmt_len = pstmt->stmt_len;
-	bool		enabled = pgss_track_utility && pgss_enabled(nesting_level);
+	bool		enabled = pgss_track_utility && pgss_enabled(nesting_level, PGSS_EXEC);
 
 	/*
 	 * Force utility statements to get queryId zero.  We do this even in cases
diff --git a/contrib/pg_stat_statements/sql/min_duration.sql b/contrib/pg_stat_statements/sql/min_duration.sql
new file mode 100644
index 0000000000..63903cc388
--- /dev/null
+++ b/contrib/pg_stat_statements/sql/min_duration.sql
@@ -0,0 +1,50 @@
+--
+-- track_min_duration
+--
+
+-- top-level tracking - simple query protocol
+SHOW pg_stat_statements.track;
+SET pg_stat_statements.track_min_duration = -1;
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT 1 AS "int";
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+SET pg_stat_statements.track_min_duration = 0;
+SELECT 1 AS "int";
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- top-level tracking - extended query protocol
+SET pg_stat_statements.track_min_duration = -1;
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT 1 \parse stmt
+\bind_named stmt \g
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+SET pg_stat_statements.track_min_duration = 0;
+\bind_named stmt \g
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+DEALLOCATE stmt;
+
+-- nested tracking - simple query protocol
+SET pg_stat_statements.track = "all";
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SET pg_stat_statements.track_min_duration = 0;
+EXPLAIN (COSTS OFF) SELECT 1;
+EXPLAIN (COSTS OFF) SELECT 1;
+SET pg_stat_statements.track_min_duration = -1;
+EXPLAIN (COSTS OFF) SELECT 1;
+EXPLAIN (COSTS OFF) SELECT 1;
+SELECT toplevel, calls, query FROM pg_stat_statements
+  ORDER BY query COLLATE "C";
+
+-- nested tracking - extended query protocol
+SET pg_stat_statements.track = "all";
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SET pg_stat_statements.track_min_duration = 0;
+EXPLAIN (COSTS OFF) SELECT 1; \parse stmt
+\bind_named stmt \g
+\bind_named stmt \g
+SET pg_stat_statements.track_min_duration = -1;
+\bind_named stmt \g
+\bind_named stmt \g
+SELECT toplevel, calls, query FROM pg_stat_statements
+  ORDER BY query COLLATE "C";
\ No newline at end of file
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index e2ac1c2d50..1c5f57425f 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -945,6 +945,28 @@
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term>
+     <varname>pg_stat_statements.track_min_duration</varname> (<type>real</type>)
+     <indexterm>
+      <primary><varname>pg_stat_statements.track_min_duration</varname> configuration parameter</primary>
+     </indexterm>
+    </term>
+
+    <listitem>
+     <para>
+      <varname>pg_stat_statements.track_min_duration</varname> is the minimum statement
+      execution time, in milliseconds, that will cause the statement to
+      be tracked. Setting this to <literal>0</literal> tracks all statements.
+      <literal>-1</literal> (the default) disables tracking of statements.
+      The same as setting <varname>pg_stat_statements.track</varname> to <literal>none</literal>.
+      For example, if you set it to <literal>250ms</literal> then all statements
+      that run 250ms or longer will be tracked. Only superusers can change this
+      setting.
+     </para>
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term>
      <varname>pg_stat_statements.save</varname> (<type>boolean</type>)
-- 
2.34.1

Reply via email to