Hi hackers,
I've decided to explore a slightly different approach to reducing
spinlock contention—by introducing a simple execution time threshold. If
a query’s execution time exceeds this threshold, it is recorded in
pg_stat_statements; otherwise, it is ignored. As Alexander [0] pointed
out, this helps retain valuable queries for further analysis. A similar
mechanism is already present in auto_explain and pg_store_plans. When
pg_stat_statements.track_min_duration = -1, disable tracking. If
pg_stat_statements.track_min_duration = -1, all statements are tracked.
I benchmarked this approach using -M prepared -S on my machine with 48
CPUs. However, I couldn’t reproduce spinlock contention because the
machine isn’t large enough to create sufficient concurrency.
Nevertheless, I’m sharing my results for reference and checking correct
results of threshold.
Here’s the benchmarking procedure I followed:
createdb pgbench
pgbench -i -s 3000 pgbench
psql -c 'SELECT pg_stat_statements_reset()'
pgbench -c 46 -j 46 -T 120 -M prepared -S --progress=10 pgbench
select query, calls, min_exec_time, max_exec_time, mean_exec_time,
stddev_exec_time from pg_stat_statements where query = 'SELECT abalance
FROM pgbench_accounts WHERE aid = $1';
track_min_duration | calls | min_exec_time | max_exec_time |
mean_exec_time | stddev_exec_time
0 | 111282955 | 0.00365 | 15.56946 |
0.015042374707317802 | 0.06067634978916631
5 | 458 | 5.00627 | 15.699129 |
5.962879746724887 | 1.1432124887616204
10 | 14 | 10.538461 | 16.113204 |
12.415218999999999 | 1.5598854455354354
20 | - | - | - |
- | -
-1 | - | - | - |
- | -
I’d greatly appreciate any feedback on this alternative approach, as
well as benchmarking on a pretty large machine to see its impact at scale.
[0]:
https://www.postgresql.org/message-id/CAPpHfdsTKAQqC3A48-MGQhrhfEamXZPb64w%3Dutk7thQcOMNr7Q%40mail.gmail.com
--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.
From aaf82d481aeb40458dfd7debfb3bcd254f33cdee Mon Sep 17 00:00:00 2001
From: Ilia Evdokimov <ilya.evdoki...@tantorlabs.ru>
Date: Fri, 14 Feb 2025 15:40:36 +0300
Subject: [PATCH v16] Allow setting execution time threshold for
pg_stat_statements
New configuration parameter pg_stat_statements.track_min_duration makes it
possible to track statements whose execution time is greater than threshold,
to reduce the amount of tracking.
---
contrib/pg_stat_statements/Makefile | 2 +-
.../pg_stat_statements/expected/duration.out | 201 ++++++++++++++++++
contrib/pg_stat_statements/meson.build | 1 +
.../pg_stat_statements/pg_stat_statements.c | 79 ++++---
contrib/pg_stat_statements/sql/duration.sql | 58 +++++
doc/src/sgml/pgstatstatements.sgml | 21 ++
6 files changed, 332 insertions(+), 30 deletions(-)
create mode 100644 contrib/pg_stat_statements/expected/duration.out
create mode 100644 contrib/pg_stat_statements/sql/duration.sql
diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index 241c02587b..4b2a59d7c3 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -20,7 +20,7 @@ LDFLAGS_SL += $(filter -lm, $(LIBS))
REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/pg_stat_statements/pg_stat_statements.conf
REGRESS = select dml cursors utility level_tracking planning \
user_activity wal entry_timestamp privileges extended \
- parallel cleanup oldextversions
+ parallel duration cleanup oldextversions
# Disabled because these tests require "shared_preload_libraries=pg_stat_statements",
# which typical installcheck users do not have (e.g. buildfarm clients).
NO_INSTALLCHECK = 1
diff --git a/contrib/pg_stat_statements/expected/duration.out b/contrib/pg_stat_statements/expected/duration.out
new file mode 100644
index 0000000000..8bbd4b0183
--- /dev/null
+++ b/contrib/pg_stat_statements/expected/duration.out
@@ -0,0 +1,201 @@
+--
+-- track_min_duration
+--
+-- top-level tracking - simple query protocol
+SHOW pg_stat_statements.track;
+ pg_stat_statements.track
+--------------------------
+ top
+(1 row)
+
+SET pg_stat_statements.track_min_duration = -1;
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT 1 AS "int";
+ int
+-----
+ 1
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+-------+-------
+(0 rows)
+
+SET pg_stat_statements.track_min_duration = 0;
+SELECT 1 AS "int";
+ int
+-----
+ 1
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+--------------------+-------
+ SELECT $1 AS "int" | 1
+(1 row)
+
+-- top-level tracking - extended query protocol
+SET pg_stat_statements.track_min_duration = -1;
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT 1 \parse stmt
+\bind_named stmt \g
+ ?column?
+----------
+ 1
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+-------+-------
+(0 rows)
+
+SET pg_stat_statements.track_min_duration = 0;
+\bind_named stmt \g
+ ?column?
+----------
+ 1
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls
+-----------+-------
+ SELECT $1 | 1
+(1 row)
+
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+DEALLOCATE stmt;
+-- nested tracking - simple query protocol
+SET pg_stat_statements.track = "all";
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SET pg_stat_statements.track_min_duration = 0;
+EXPLAIN (COSTS OFF) SELECT 1;
+ QUERY PLAN
+------------
+ Result
+(1 row)
+
+EXPLAIN (COSTS OFF) SELECT 1;
+ QUERY PLAN
+------------
+ Result
+(1 row)
+
+SET pg_stat_statements.track_min_duration = -1;
+EXPLAIN (COSTS OFF) SELECT 1;
+ QUERY PLAN
+------------
+ Result
+(1 row)
+
+EXPLAIN (COSTS OFF) SELECT 1;
+ QUERY PLAN
+------------
+ Result
+(1 row)
+
+SELECT toplevel, calls, query FROM pg_stat_statements
+ ORDER BY query COLLATE "C";
+ toplevel | calls | query
+----------+-------+----------------------------------------------------
+ t | 2 | EXPLAIN (COSTS OFF) SELECT $1
+ f | 2 | SELECT $1
+ t | 1 | SELECT pg_stat_statements_reset() IS NOT NULL AS t
+ t | 2 | SET pg_stat_statements.track_min_duration = $1
+(4 rows)
+
+-- nested tracking - extended query protocol
+SET pg_stat_statements.track = "all";
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SET pg_stat_statements.track_min_duration = 0;
+EXPLAIN (COSTS OFF) SELECT 1; \parse stmt
+ QUERY PLAN
+------------
+ Result
+(1 row)
+
+\bind_named stmt \g
+ QUERY PLAN
+------------
+ Result
+(1 row)
+
+\bind_named stmt \g
+ QUERY PLAN
+------------
+ Result
+(1 row)
+
+SET pg_stat_statements.track_min_duration = -1;
+\bind_named stmt \g
+ QUERY PLAN
+------------
+ Result
+(1 row)
+
+\bind_named stmt \g
+ QUERY PLAN
+------------
+ Result
+(1 row)
+
+SELECT toplevel, calls, query FROM pg_stat_statements
+ ORDER BY query COLLATE "C";
+ toplevel | calls | query
+----------+-------+------------------------------------------------
+ t | 2 | EXPLAIN (COSTS OFF) SELECT $1
+ f | 3 | SELECT $1
+ t | 1 | SET pg_stat_statements.track_min_duration = $1
+(3 rows)
+
+-- threshold check
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SET pg_stat_statements.track_min_duration = 100;
+SELECT pg_sleep(0.001); -- 1 ms < pg_stat_statements.track_min_duration
+ pg_sleep
+----------
+
+(1 row)
+
+SELECT pg_sleep(0.5); -- 500 ms > pg_stat_statements.track_min_duration
+ pg_sleep
+----------
+
+(1 row)
+
+SELECT toplevel, calls, query FROM pg_stat_statements
+ ORDER BY query COLLATE "C";
+ toplevel | calls | query
+----------+-------+---------------------
+ t | 1 | SELECT pg_sleep($1)
+(1 row)
+
diff --git a/contrib/pg_stat_statements/meson.build b/contrib/pg_stat_statements/meson.build
index 4446af58c5..2bc918bcd3 100644
--- a/contrib/pg_stat_statements/meson.build
+++ b/contrib/pg_stat_statements/meson.build
@@ -54,6 +54,7 @@ tests += {
'privileges',
'extended',
'parallel',
+ 'duration',
'cleanup',
'oldextversions',
],
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index bebf8134eb..0eb509d7ca 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -290,6 +290,7 @@ static const struct config_enum_entry track_options[] =
static int pgss_max = 5000; /* max # statements to track */
static int pgss_track = PGSS_TRACK_TOP; /* tracking level */
+static int pgss_track_min_duration = 0; /* msec or -1 */
static bool pgss_track_utility = true; /* whether to track utility commands */
static bool pgss_track_planning = false; /* whether to track planning
* duration */
@@ -414,6 +415,18 @@ _PG_init(void)
NULL,
NULL);
+ DefineCustomIntVariable("pg_stat_statements.track_min_duration",
+ "Sets the minimum execution time above which plans will be tracked.",
+ "Zero tracks all plans. -1 turns this feature off.",
+ &pgss_track_min_duration,
+ 0,
+ -1, INT_MAX,
+ PGC_SUSET,
+ GUC_UNIT_MS,
+ NULL,
+ NULL,
+ NULL);
+
DefineCustomEnumVariable("pg_stat_statements.track",
"Selects which statements are tracked by pg_stat_statements.",
NULL,
@@ -999,7 +1012,8 @@ pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
* counting of optimizable statements that are directly contained in
* utility statements.
*/
- if (pgss_enabled(nesting_level) && queryDesc->plannedstmt->queryId != UINT64CONST(0))
+ if (pgss_enabled(nesting_level) && pgss_track_min_duration >= 0 &&
+ queryDesc->plannedstmt->queryId != UINT64CONST(0))
{
/*
* Set up to track total elapsed time in ExecutorRun. Make sure the
@@ -1068,27 +1082,30 @@ pgss_ExecutorEnd(QueryDesc *queryDesc)
uint64 queryId = queryDesc->plannedstmt->queryId;
if (queryId != UINT64CONST(0) && queryDesc->totaltime &&
- pgss_enabled(nesting_level))
+ pgss_enabled(nesting_level) && pgss_track_min_duration >= 0)
{
+ double msec;
/*
* Make sure stats accumulation is done. (Note: it's okay if several
* levels of hook all do this.)
*/
InstrEndLoop(queryDesc->totaltime);
- pgss_store(queryDesc->sourceText,
- queryId,
- queryDesc->plannedstmt->stmt_location,
- queryDesc->plannedstmt->stmt_len,
- PGSS_EXEC,
- queryDesc->totaltime->total * 1000.0, /* convert to msec */
- queryDesc->estate->es_total_processed,
- &queryDesc->totaltime->bufusage,
- &queryDesc->totaltime->walusage,
- queryDesc->estate->es_jit ? &queryDesc->estate->es_jit->instr : NULL,
- NULL,
- queryDesc->estate->es_parallel_workers_to_launch,
- queryDesc->estate->es_parallel_workers_launched);
+ msec = queryDesc->totaltime->total * 1000.0;
+ if (msec >= pgss_track_min_duration)
+ pgss_store(queryDesc->sourceText,
+ queryId,
+ queryDesc->plannedstmt->stmt_location,
+ queryDesc->plannedstmt->stmt_len,
+ PGSS_EXEC,
+ msec,
+ queryDesc->estate->es_total_processed,
+ &queryDesc->totaltime->bufusage,
+ &queryDesc->totaltime->walusage,
+ queryDesc->estate->es_jit ? &queryDesc->estate->es_jit->instr : NULL,
+ NULL,
+ queryDesc->estate->es_parallel_workers_to_launch,
+ queryDesc->estate->es_parallel_workers_launched);
}
if (prev_ExecutorEnd)
@@ -1111,7 +1128,8 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
uint64 saved_queryId = pstmt->queryId;
int saved_stmt_location = pstmt->stmt_location;
int saved_stmt_len = pstmt->stmt_len;
- bool enabled = pgss_track_utility && pgss_enabled(nesting_level);
+ bool enabled = pgss_track_utility && pgss_enabled(nesting_level) &&
+ pgss_track_min_duration >= 0;
/*
* Force utility statements to get queryId zero. We do this even in cases
@@ -1154,6 +1172,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
bufusage;
WalUsage walusage_start,
walusage;
+ double msec;
bufusage_start = pgBufferUsage;
walusage_start = pgWalUsage;
@@ -1209,19 +1228,21 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
memset(&walusage, 0, sizeof(WalUsage));
WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
- pgss_store(queryString,
- saved_queryId,
- saved_stmt_location,
- saved_stmt_len,
- PGSS_EXEC,
- INSTR_TIME_GET_MILLISEC(duration),
- rows,
- &bufusage,
- &walusage,
- NULL,
- NULL,
- 0,
- 0);
+ msec = INSTR_TIME_GET_MILLISEC(duration);
+ if (msec >= pgss_track_min_duration)
+ pgss_store(queryString,
+ saved_queryId,
+ saved_stmt_location,
+ saved_stmt_len,
+ PGSS_EXEC,
+ msec,
+ rows,
+ &bufusage,
+ &walusage,
+ NULL,
+ NULL,
+ 0,
+ 0);
}
else
{
diff --git a/contrib/pg_stat_statements/sql/duration.sql b/contrib/pg_stat_statements/sql/duration.sql
new file mode 100644
index 0000000000..b99c8288bb
--- /dev/null
+++ b/contrib/pg_stat_statements/sql/duration.sql
@@ -0,0 +1,58 @@
+--
+-- track_min_duration
+--
+
+-- top-level tracking - simple query protocol
+SHOW pg_stat_statements.track;
+SET pg_stat_statements.track_min_duration = -1;
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT 1 AS "int";
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+SET pg_stat_statements.track_min_duration = 0;
+SELECT 1 AS "int";
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- top-level tracking - extended query protocol
+SET pg_stat_statements.track_min_duration = -1;
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT 1 \parse stmt
+\bind_named stmt \g
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+SET pg_stat_statements.track_min_duration = 0;
+\bind_named stmt \g
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+DEALLOCATE stmt;
+
+-- nested tracking - simple query protocol
+SET pg_stat_statements.track = "all";
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SET pg_stat_statements.track_min_duration = 0;
+EXPLAIN (COSTS OFF) SELECT 1;
+EXPLAIN (COSTS OFF) SELECT 1;
+SET pg_stat_statements.track_min_duration = -1;
+EXPLAIN (COSTS OFF) SELECT 1;
+EXPLAIN (COSTS OFF) SELECT 1;
+SELECT toplevel, calls, query FROM pg_stat_statements
+ ORDER BY query COLLATE "C";
+
+-- nested tracking - extended query protocol
+SET pg_stat_statements.track = "all";
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SET pg_stat_statements.track_min_duration = 0;
+EXPLAIN (COSTS OFF) SELECT 1; \parse stmt
+\bind_named stmt \g
+\bind_named stmt \g
+SET pg_stat_statements.track_min_duration = -1;
+\bind_named stmt \g
+\bind_named stmt \g
+SELECT toplevel, calls, query FROM pg_stat_statements
+ ORDER BY query COLLATE "C";
+
+-- threshold check
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SET pg_stat_statements.track_min_duration = 100;
+SELECT pg_sleep(0.001); -- 1 ms < pg_stat_statements.track_min_duration
+SELECT pg_sleep(0.5); -- 500 ms > pg_stat_statements.track_min_duration
+SELECT toplevel, calls, query FROM pg_stat_statements
+ ORDER BY query COLLATE "C";
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index 501b468e9a..21047f6a7d 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -936,6 +936,27 @@
</listitem>
</varlistentry>
+ <varlistentry>
+ <term>
+ <varname>pg_stat_statements.track_min_duration</varname> (<type>real</type>)
+ <indexterm>
+ <primary><varname>pg_stat_statements.track_min_duration</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+
+ <listitem>
+ <para>
+ <varname>pg_stat_statements.track_min_duration</varname> is the minimum statement
+ execution time, in milliseconds, that will cause the statement to
+ be tracked. Setting this to <literal>0</literal> tracks all statements.
+ <literal>-1</literal> (the default) disables tracking of statements. For
+ example, if you set it to <literal>250ms</literal> then all statements
+ that run 250ms or longer will be tracked. Only superusers can change this
+ setting.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term>
<varname>pg_stat_statements.save</varname> (<type>boolean</type>)
--
2.34.1