Hi hackers,Under high-load scenarios with a significant number of transactions per second, pg_stat_statements introduces substantial overhead due to the collection and storage of statistics. Currently, we are sometimes forced to disable pg_stat_statements or adjust the size of the statistics using pg_stat_statements.max, which is not always optimal. One potential solution to this issue could be query sampling in pg_stat_statements.
A similar approach has been implemented in extensions like auto_explain and pg_store_plans, and it has proven very useful in high-load systems. However, this approach has its trade-offs, as it sacrifices statistical accuracy for improved performance. This patch introduces a new configuration parameter, pg_stat_statements.sample_rate for the pg_stat_statements extension. The patch provides the ability to control the sampling of query statistics in pg_stat_statements.
This patch serves as a proof of concept (POC), and I would like to hear your thoughts on whether such an approach is viable and applicable.
-- Best regards, Ilia Evdokimov, Tantor Labs LLC.
From b19081bd257014f0c4046570097e4bc7b28a3126 Mon Sep 17 00:00:00 2001 From: Ilia Evdokimov <ilya.evdoki...@tantorlabs.com> Date: Mon, 18 Nov 2024 21:05:43 +0300 Subject: [PATCH] [PATCH] Allow setting sample ratio for pg_stat_statements New configuration parameter pg_stat_statements.sample_ratio makes it possible to control just a fraction of the queries meeting the configured threshold, to reduce the amount of controlling. --- .../pg_stat_statements/pg_stat_statements.c | 22 ++++++++++++++++++- doc/src/sgml/pgstatstatements.sgml | 17 ++++++++++++++ 2 files changed, 38 insertions(+), 1 deletion(-) diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c index 49c657b3e0..51d1a48252 100644 --- a/contrib/pg_stat_statements/pg_stat_statements.c +++ b/contrib/pg_stat_statements/pg_stat_statements.c @@ -49,6 +49,7 @@ #include "access/parallel.h" #include "catalog/pg_authid.h" +#include "common/pg_prng.h" #include "common/int.h" #include "executor/instrument.h" #include "funcapi.h" @@ -73,6 +74,8 @@ PG_MODULE_MAGIC; +static double pgss_sample_rate = 1; + /* Location of permanent stats file (valid when database is shut down) */ #define PGSS_DUMP_FILE PGSTAT_STAT_PERMANENT_DIRECTORY "/pg_stat_statements.stat" @@ -295,6 +298,8 @@ static bool pgss_track_planning = false; /* whether to track planning * duration */ static bool pgss_save = true; /* whether to save stats across shutdown */ +/* Is the current top-level query to be sampled? */ +static bool current_query_sampled = false; #define pgss_enabled(level) \ (!IsParallelWorker() && \ @@ -414,6 +419,19 @@ _PG_init(void) NULL, NULL); + DefineCustomRealVariable("pg_stat_statements.sample_rate", + "Fraction of queries to process.", + NULL, + &pgss_sample_rate, + 1.0, + 0.0, + 1.0, + PGC_SUSET, + 0, + NULL, + NULL, + NULL); + DefineCustomEnumVariable("pg_stat_statements.track", "Selects which statements are tracked by pg_stat_statements.", NULL, @@ -989,6 +1007,8 @@ pgss_planner(Query *parse, static void pgss_ExecutorStart(QueryDesc *queryDesc, int eflags) { + current_query_sampled = (pg_prng_double(&pg_global_prng_state) < pgss_sample_rate); + if (prev_ExecutorStart) prev_ExecutorStart(queryDesc, eflags); else @@ -999,7 +1019,7 @@ pgss_ExecutorStart(QueryDesc *queryDesc, int eflags) * counting of optimizable statements that are directly contained in * utility statements. */ - if (pgss_enabled(nesting_level) && queryDesc->plannedstmt->queryId != UINT64CONST(0)) + if (current_query_sampled && pgss_enabled(nesting_level) && queryDesc->plannedstmt->queryId != UINT64CONST(0)) { /* * Set up to track total elapsed time in ExecutorRun. Make sure the diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml index 501b468e9a..1e2533a802 100644 --- a/doc/src/sgml/pgstatstatements.sgml +++ b/doc/src/sgml/pgstatstatements.sgml @@ -936,6 +936,23 @@ </listitem> </varlistentry> + <varlistentry> + <term> + <varname>pg_stat_statements.sample_rate</varname> (<type>real</type>) + <indexterm> + <primary><varname>pg_stat_statements.sample_rate</varname> configuration parameter</primary> + </indexterm> + </term> + + <listitem> + <para> + <varname>pg_stat_statements.sample_rate</varname> causes pg_stat_statements to only + control a fraction of the statements in each session. The default is 1, + meaning control all the queries. Only superusers can change this setting. + </para> + </listitem> + </varlistentry> + <varlistentry> <term> <varname>pg_stat_statements.save</varname> (<type>boolean</type>) -- 2.34.1