Hi hackers,

I've decided to explore a slightly different approach to reducing spinlock contention—by introducing a simple execution time threshold. If a query’s execution time exceeds this threshold, it is recorded in pg_stat_statements; otherwise, it is ignored. As Alexander [0] pointed out, this helps retain valuable queries for further analysis. A similar mechanism is already present in auto_explain and pg_store_plans. When pg_stat_statements.track_min_duration = -1, disable tracking. If pg_stat_statements.track_min_duration = -1, all statements are tracked.

I benchmarked this approach using -M prepared -S on my machine with 48 CPUs. However, I couldn’t reproduce spinlock contention because the machine isn’t large enough to create sufficient concurrency. Nevertheless, I’m sharing my results for reference and checking correct results of threshold.

Here’s the benchmarking procedure I followed:
    createdb pgbench
    pgbench -i -s 3000 pgbench
    psql -c 'SELECT pg_stat_statements_reset()'
    pgbench -c 46 -j 46 -T 120 -M prepared -S --progress=10 pgbench

select query, calls, min_exec_time, max_exec_time, mean_exec_time, stddev_exec_time from pg_stat_statements where query = 'SELECT abalance FROM pgbench_accounts WHERE aid = $1';

track_min_duration |     calls | min_exec_time | max_exec_time |       mean_exec_time | stddev_exec_time                  0 | 111282955 |       0.00365 |      15.56946 | 0.015042374707317802 | 0.06067634978916631                  5 |       458 |       5.00627 |      15.699129 |    5.962879746724887 | 1.1432124887616204                 10 |        14 |      10.538461 |      16.113204 |   12.415218999999999 | 1.5598854455354354                 20 |         - |              - |              - |                    - | -                 -1 |         - |              - |              - |                    - | -

I’d greatly appreciate any feedback on this alternative approach, as well as benchmarking on a pretty large machine to see its impact at scale.

[0]: https://www.postgresql.org/message-id/CAPpHfdsTKAQqC3A48-MGQhrhfEamXZPb64w%3Dutk7thQcOMNr7Q%40mail.gmail.com

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.
From aaf82d481aeb40458dfd7debfb3bcd254f33cdee Mon Sep 17 00:00:00 2001
From: Ilia Evdokimov <ilya.evdoki...@tantorlabs.ru>
Date: Fri, 14 Feb 2025 15:40:36 +0300
Subject: [PATCH v16] Allow setting execution time threshold for
 pg_stat_statements

New configuration parameter pg_stat_statements.track_min_duration makes it
possible to track statements whose execution time is greater than threshold,
to reduce the amount of tracking.
---
 contrib/pg_stat_statements/Makefile           |   2 +-
 .../pg_stat_statements/expected/duration.out  | 201 ++++++++++++++++++
 contrib/pg_stat_statements/meson.build        |   1 +
 .../pg_stat_statements/pg_stat_statements.c   |  79 ++++---
 contrib/pg_stat_statements/sql/duration.sql   |  58 +++++
 doc/src/sgml/pgstatstatements.sgml            |  21 ++
 6 files changed, 332 insertions(+), 30 deletions(-)
 create mode 100644 contrib/pg_stat_statements/expected/duration.out
 create mode 100644 contrib/pg_stat_statements/sql/duration.sql

diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index 241c02587b..4b2a59d7c3 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -20,7 +20,7 @@ LDFLAGS_SL += $(filter -lm, $(LIBS))
 REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/pg_stat_statements/pg_stat_statements.conf
 REGRESS = select dml cursors utility level_tracking planning \
 	user_activity wal entry_timestamp privileges extended \
-	parallel cleanup oldextversions
+	parallel duration cleanup oldextversions
 # Disabled because these tests require "shared_preload_libraries=pg_stat_statements",
 # which typical installcheck users do not have (e.g. buildfarm clients).
 NO_INSTALLCHECK = 1
diff --git a/contrib/pg_stat_statements/expected/duration.out b/contrib/pg_stat_statements/expected/duration.out
new file mode 100644
index 0000000000..8bbd4b0183
--- /dev/null
+++ b/contrib/pg_stat_statements/expected/duration.out
@@ -0,0 +1,201 @@
+--
+-- track_min_duration
+--
+-- top-level tracking - simple query protocol
+SHOW pg_stat_statements.track;
+ pg_stat_statements.track 
+--------------------------
+ top
+(1 row)
+
+SET pg_stat_statements.track_min_duration = -1;
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t 
+---
+ t
+(1 row)
+
+SELECT 1 AS "int";
+ int 
+-----
+   1
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls 
+-------+-------
+(0 rows)
+
+SET pg_stat_statements.track_min_duration = 0;
+SELECT 1 AS "int";
+ int 
+-----
+   1
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+       query        | calls 
+--------------------+-------
+ SELECT $1 AS "int" |     1
+(1 row)
+
+-- top-level tracking - extended query protocol
+SET pg_stat_statements.track_min_duration = -1;
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t 
+---
+ t
+(1 row)
+
+SELECT 1 \parse stmt
+\bind_named stmt \g
+ ?column? 
+----------
+        1
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls 
+-------+-------
+(0 rows)
+
+SET pg_stat_statements.track_min_duration = 0;
+\bind_named stmt \g
+ ?column? 
+----------
+        1
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+   query   | calls 
+-----------+-------
+ SELECT $1 |     1
+(1 row)
+
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t 
+---
+ t
+(1 row)
+
+DEALLOCATE stmt;
+-- nested tracking - simple query protocol
+SET pg_stat_statements.track = "all";
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t 
+---
+ t
+(1 row)
+
+SET pg_stat_statements.track_min_duration = 0;
+EXPLAIN (COSTS OFF) SELECT 1;
+ QUERY PLAN 
+------------
+ Result
+(1 row)
+
+EXPLAIN (COSTS OFF) SELECT 1;
+ QUERY PLAN 
+------------
+ Result
+(1 row)
+
+SET pg_stat_statements.track_min_duration = -1;
+EXPLAIN (COSTS OFF) SELECT 1;
+ QUERY PLAN 
+------------
+ Result
+(1 row)
+
+EXPLAIN (COSTS OFF) SELECT 1;
+ QUERY PLAN 
+------------
+ Result
+(1 row)
+
+SELECT toplevel, calls, query FROM pg_stat_statements
+  ORDER BY query COLLATE "C";
+ toplevel | calls |                       query                        
+----------+-------+----------------------------------------------------
+ t        |     2 | EXPLAIN (COSTS OFF) SELECT $1
+ f        |     2 | SELECT $1
+ t        |     1 | SELECT pg_stat_statements_reset() IS NOT NULL AS t
+ t        |     2 | SET pg_stat_statements.track_min_duration = $1
+(4 rows)
+
+-- nested tracking - extended query protocol
+SET pg_stat_statements.track = "all";
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t 
+---
+ t
+(1 row)
+
+SET pg_stat_statements.track_min_duration = 0;
+EXPLAIN (COSTS OFF) SELECT 1; \parse stmt
+ QUERY PLAN 
+------------
+ Result
+(1 row)
+
+\bind_named stmt \g
+ QUERY PLAN 
+------------
+ Result
+(1 row)
+
+\bind_named stmt \g
+ QUERY PLAN 
+------------
+ Result
+(1 row)
+
+SET pg_stat_statements.track_min_duration = -1;
+\bind_named stmt \g
+ QUERY PLAN 
+------------
+ Result
+(1 row)
+
+\bind_named stmt \g
+ QUERY PLAN 
+------------
+ Result
+(1 row)
+
+SELECT toplevel, calls, query FROM pg_stat_statements
+  ORDER BY query COLLATE "C";
+ toplevel | calls |                     query                      
+----------+-------+------------------------------------------------
+ t        |     2 | EXPLAIN (COSTS OFF) SELECT $1
+ f        |     3 | SELECT $1
+ t        |     1 | SET pg_stat_statements.track_min_duration = $1
+(3 rows)
+
+-- threshold check
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t 
+---
+ t
+(1 row)
+
+SET pg_stat_statements.track_min_duration = 100;
+SELECT pg_sleep(0.001);	-- 1   ms < pg_stat_statements.track_min_duration
+ pg_sleep 
+----------
+ 
+(1 row)
+
+SELECT pg_sleep(0.5);	-- 500 ms > pg_stat_statements.track_min_duration
+ pg_sleep 
+----------
+ 
+(1 row)
+
+SELECT toplevel, calls, query FROM pg_stat_statements
+  ORDER BY query COLLATE "C";
+ toplevel | calls |        query        
+----------+-------+---------------------
+ t        |     1 | SELECT pg_sleep($1)
+(1 row)
+
diff --git a/contrib/pg_stat_statements/meson.build b/contrib/pg_stat_statements/meson.build
index 4446af58c5..2bc918bcd3 100644
--- a/contrib/pg_stat_statements/meson.build
+++ b/contrib/pg_stat_statements/meson.build
@@ -54,6 +54,7 @@ tests += {
       'privileges',
       'extended',
       'parallel',
+      'duration',
       'cleanup',
       'oldextversions',
     ],
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index bebf8134eb..0eb509d7ca 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -290,6 +290,7 @@ static const struct config_enum_entry track_options[] =
 
 static int	pgss_max = 5000;	/* max # statements to track */
 static int	pgss_track = PGSS_TRACK_TOP;	/* tracking level */
+static int	pgss_track_min_duration = 0; /* msec or -1 */
 static bool pgss_track_utility = true;	/* whether to track utility commands */
 static bool pgss_track_planning = false;	/* whether to track planning
 											 * duration */
@@ -414,6 +415,18 @@ _PG_init(void)
 							NULL,
 							NULL);
 
+	DefineCustomIntVariable("pg_stat_statements.track_min_duration",
+							"Sets the minimum execution time above which plans will be tracked.",
+							"Zero tracks all plans. -1 turns this feature off.",
+							&pgss_track_min_duration,
+							0,
+							-1, INT_MAX,
+							PGC_SUSET,
+							GUC_UNIT_MS,
+							NULL,
+							NULL,
+							NULL);
+
 	DefineCustomEnumVariable("pg_stat_statements.track",
 							 "Selects which statements are tracked by pg_stat_statements.",
 							 NULL,
@@ -999,7 +1012,8 @@ pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
 	 * counting of optimizable statements that are directly contained in
 	 * utility statements.
 	 */
-	if (pgss_enabled(nesting_level) && queryDesc->plannedstmt->queryId != UINT64CONST(0))
+	if (pgss_enabled(nesting_level) && pgss_track_min_duration >= 0 &&
+		queryDesc->plannedstmt->queryId != UINT64CONST(0))
 	{
 		/*
 		 * Set up to track total elapsed time in ExecutorRun.  Make sure the
@@ -1068,27 +1082,30 @@ pgss_ExecutorEnd(QueryDesc *queryDesc)
 	uint64		queryId = queryDesc->plannedstmt->queryId;
 
 	if (queryId != UINT64CONST(0) && queryDesc->totaltime &&
-		pgss_enabled(nesting_level))
+		pgss_enabled(nesting_level) && pgss_track_min_duration >= 0)
 	{
+		double		msec;
 		/*
 		 * Make sure stats accumulation is done.  (Note: it's okay if several
 		 * levels of hook all do this.)
 		 */
 		InstrEndLoop(queryDesc->totaltime);
 
-		pgss_store(queryDesc->sourceText,
-				   queryId,
-				   queryDesc->plannedstmt->stmt_location,
-				   queryDesc->plannedstmt->stmt_len,
-				   PGSS_EXEC,
-				   queryDesc->totaltime->total * 1000.0,	/* convert to msec */
-				   queryDesc->estate->es_total_processed,
-				   &queryDesc->totaltime->bufusage,
-				   &queryDesc->totaltime->walusage,
-				   queryDesc->estate->es_jit ? &queryDesc->estate->es_jit->instr : NULL,
-				   NULL,
-				   queryDesc->estate->es_parallel_workers_to_launch,
-				   queryDesc->estate->es_parallel_workers_launched);
+		msec = queryDesc->totaltime->total * 1000.0;
+		if (msec >= pgss_track_min_duration)
+			pgss_store(queryDesc->sourceText,
+						queryId,
+						queryDesc->plannedstmt->stmt_location,
+						queryDesc->plannedstmt->stmt_len,
+						PGSS_EXEC,
+						msec,
+						queryDesc->estate->es_total_processed,
+						&queryDesc->totaltime->bufusage,
+						&queryDesc->totaltime->walusage,
+						queryDesc->estate->es_jit ? &queryDesc->estate->es_jit->instr : NULL,
+						NULL,
+						queryDesc->estate->es_parallel_workers_to_launch,
+						queryDesc->estate->es_parallel_workers_launched);
 	}
 
 	if (prev_ExecutorEnd)
@@ -1111,7 +1128,8 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 	uint64		saved_queryId = pstmt->queryId;
 	int			saved_stmt_location = pstmt->stmt_location;
 	int			saved_stmt_len = pstmt->stmt_len;
-	bool		enabled = pgss_track_utility && pgss_enabled(nesting_level);
+	bool		enabled = pgss_track_utility && pgss_enabled(nesting_level) &&
+						  pgss_track_min_duration >= 0;
 
 	/*
 	 * Force utility statements to get queryId zero.  We do this even in cases
@@ -1154,6 +1172,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 					bufusage;
 		WalUsage	walusage_start,
 					walusage;
+		double		msec;
 
 		bufusage_start = pgBufferUsage;
 		walusage_start = pgWalUsage;
@@ -1209,19 +1228,21 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 		memset(&walusage, 0, sizeof(WalUsage));
 		WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
 
-		pgss_store(queryString,
-				   saved_queryId,
-				   saved_stmt_location,
-				   saved_stmt_len,
-				   PGSS_EXEC,
-				   INSTR_TIME_GET_MILLISEC(duration),
-				   rows,
-				   &bufusage,
-				   &walusage,
-				   NULL,
-				   NULL,
-				   0,
-				   0);
+		msec = INSTR_TIME_GET_MILLISEC(duration);
+		if (msec >= pgss_track_min_duration)
+			pgss_store(queryString,
+						saved_queryId,
+						saved_stmt_location,
+						saved_stmt_len,
+						PGSS_EXEC,
+						msec,
+						rows,
+						&bufusage,
+						&walusage,
+						NULL,
+						NULL,
+						0,
+						0);
 	}
 	else
 	{
diff --git a/contrib/pg_stat_statements/sql/duration.sql b/contrib/pg_stat_statements/sql/duration.sql
new file mode 100644
index 0000000000..b99c8288bb
--- /dev/null
+++ b/contrib/pg_stat_statements/sql/duration.sql
@@ -0,0 +1,58 @@
+--
+-- track_min_duration
+--
+
+-- top-level tracking - simple query protocol
+SHOW pg_stat_statements.track;
+SET pg_stat_statements.track_min_duration = -1;
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT 1 AS "int";
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+SET pg_stat_statements.track_min_duration = 0;
+SELECT 1 AS "int";
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- top-level tracking - extended query protocol
+SET pg_stat_statements.track_min_duration = -1;
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT 1 \parse stmt
+\bind_named stmt \g
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+SET pg_stat_statements.track_min_duration = 0;
+\bind_named stmt \g
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+DEALLOCATE stmt;
+
+-- nested tracking - simple query protocol
+SET pg_stat_statements.track = "all";
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SET pg_stat_statements.track_min_duration = 0;
+EXPLAIN (COSTS OFF) SELECT 1;
+EXPLAIN (COSTS OFF) SELECT 1;
+SET pg_stat_statements.track_min_duration = -1;
+EXPLAIN (COSTS OFF) SELECT 1;
+EXPLAIN (COSTS OFF) SELECT 1;
+SELECT toplevel, calls, query FROM pg_stat_statements
+  ORDER BY query COLLATE "C";
+
+-- nested tracking - extended query protocol
+SET pg_stat_statements.track = "all";
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SET pg_stat_statements.track_min_duration = 0;
+EXPLAIN (COSTS OFF) SELECT 1; \parse stmt
+\bind_named stmt \g
+\bind_named stmt \g
+SET pg_stat_statements.track_min_duration = -1;
+\bind_named stmt \g
+\bind_named stmt \g
+SELECT toplevel, calls, query FROM pg_stat_statements
+  ORDER BY query COLLATE "C";
+
+-- threshold check
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SET pg_stat_statements.track_min_duration = 100;
+SELECT pg_sleep(0.001);	-- 1   ms < pg_stat_statements.track_min_duration
+SELECT pg_sleep(0.5);	-- 500 ms > pg_stat_statements.track_min_duration
+SELECT toplevel, calls, query FROM pg_stat_statements
+  ORDER BY query COLLATE "C";
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index 501b468e9a..21047f6a7d 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -936,6 +936,27 @@
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term>
+     <varname>pg_stat_statements.track_min_duration</varname> (<type>real</type>)
+     <indexterm>
+      <primary><varname>pg_stat_statements.track_min_duration</varname> configuration parameter</primary>
+     </indexterm>
+    </term>
+
+    <listitem>
+     <para>
+      <varname>pg_stat_statements.track_min_duration</varname> is the minimum statement
+      execution time, in milliseconds, that will cause the statement to
+      be tracked. Setting this to <literal>0</literal> tracks all statements.
+      <literal>-1</literal> (the default) disables tracking of statements. For
+      example, if you set it to <literal>250ms</literal> then all statements
+      that run 250ms or longer will be tracked. Only superusers can change this
+      setting.
+     </para>
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term>
      <varname>pg_stat_statements.save</varname> (<type>boolean</type>)
-- 
2.34.1

Reply via email to