On 19.11.2024 15:11, Andrey M. Borodin wrote:


On 18 Nov 2024, at 23:33, Ilia Evdokimov<ilya.evdoki...@tantorlabs.com>  wrote:

Hi hackers,

Under high-load scenarios with a significant number of transactions per second, 
pg_stat_statements introduces substantial overhead due to the collection and 
storage of statistics. Currently, we are sometimes forced to disable 
pg_stat_statements or adjust the size of the statistics using 
pg_stat_statements.max, which is not always optimal. One potential solution to 
this issue could be query sampling in pg_stat_statements.

A similar approach has been implemented in extensions like auto_explain and 
pg_store_plans, and it has proven very useful in high-load systems. However, 
this approach has its trade-offs, as it sacrifices statistical accuracy for 
improved performance. This patch introduces a new configuration parameter, 
pg_stat_statements.sample_rate for the pg_stat_statements extension. The patch 
provides the ability to control the sampling of query statistics in 
pg_stat_statements.

This patch serves as a proof of concept (POC), and I would like to hear your 
thoughts on whether such an approach is viable and applicable.
+1 for the idea. I heard a lot of complaints about that pgss is costly. Most of 
them were using it wrong though. But at least it could give an easy way to rule 
out performance impact of pgss.
Thank you for review.
On 19 Nov 2024, at 15:09, Ilia Evdokimov<ilya.evdoki...@tantorlabs.com>  wrote:

I believe we should also include this check in the pgss_ExecutorEnd() function 
because sampling in pgss_ExecutorEnd() ensures that a query not initially 
sampled in pgss_ExecutorStart() can still be logged if it meets the 
pg_stat_statements.sample_rate criteria. This approach adds flexibility by 
allowing critical queries to be captured while maintaining efficient sampling.
Is there a reason why pgss_ProcessUtility is excluded?


Best regards, Andrey Borodin.


Ah, you’re right! Moreover, this check is needed not only in pgss_ProcessUtility() but in all places where pgss_enabled() is called. Therefore, it’s better to move the sampling check directly into pgss_enabled().

However, another issue arises with the initialization of 'current_query_sample' variable that contains whether query is sampling or not. Initializing it in pgss_ExecutorStart()||is not sufficient, because pgss_post_parse_analyze() or pgss_ProcessUtility() might be called earlier. This could lead to different values of 'current_query_sample' being used in these functions, which is undesirable. Run the regression tests for pg_stat_statements with initializing in pgss_ExecutorStart(), and you'll see this.

To avoid this, we need to find a function that is called earlier than all the others. In my opinion, pgss_post_parse_analyze() is a good candidate for this purpose. If you have objections or better suggestions, feel free to share them. I attached the patch with fixes.

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.
From 725d0a428040d46cfedac0ca707d7d7a27db8721 Mon Sep 17 00:00:00 2001
From: Ilia Evdokimov <ilya.evdoki...@tantorlabs.com>
Date: Tue, 19 Nov 2024 16:58:28 +0300
Subject: [PATCH] Allow setting sample ratio for pg_stat_statements

New configuration parameter pg_stat_statements.sample_ratio makes it
possible to control just a fraction of the queries meeting the configured
threshold, to reduce the amount of controlling.
---
 .../pg_stat_statements/pg_stat_statements.c   | 22 ++++++++++++++++++-
 doc/src/sgml/pgstatstatements.sgml            | 17 ++++++++++++++
 2 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 49c657b3e0..01586c32b0 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -49,6 +49,7 @@
 
 #include "access/parallel.h"
 #include "catalog/pg_authid.h"
+#include "common/pg_prng.h"
 #include "common/int.h"
 #include "executor/instrument.h"
 #include "funcapi.h"
@@ -294,12 +295,16 @@ static bool pgss_track_utility = true;	/* whether to track utility commands */
 static bool pgss_track_planning = false;	/* whether to track planning
 											 * duration */
 static bool pgss_save = true;	/* whether to save stats across shutdown */
+static double pgss_sample_rate = 1;
 
+/* Is the current top-level query to be sampled? */
+static bool current_query_sampled = false;
 
 #define pgss_enabled(level) \
 	(!IsParallelWorker() && \
 	(pgss_track == PGSS_TRACK_ALL || \
-	(pgss_track == PGSS_TRACK_TOP && (level) == 0)))
+	(pgss_track == PGSS_TRACK_TOP && (level) == 0)) && \
+	current_query_sampled)
 
 #define record_gc_qtexts() \
 	do { \
@@ -414,6 +419,19 @@ _PG_init(void)
 							NULL,
 							NULL);
 
+	DefineCustomRealVariable("pg_stat_statements.sample_rate",
+							 "Fraction of queries to process.",
+							 NULL,
+							 &pgss_sample_rate,
+							 1.0,
+							 0.0,
+							 1.0,
+							 PGC_SUSET,
+							 0,
+							 NULL,
+							 NULL,
+							 NULL);
+
 	DefineCustomEnumVariable("pg_stat_statements.track",
 							 "Selects which statements are tracked by pg_stat_statements.",
 							 NULL,
@@ -835,6 +853,8 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate)
 	if (prev_post_parse_analyze_hook)
 		prev_post_parse_analyze_hook(pstate, query, jstate);
 
+	current_query_sampled = (pg_prng_double(&pg_global_prng_state) < pgss_sample_rate);
+
 	/* Safety check... */
 	if (!pgss || !pgss_hash || !pgss_enabled(nesting_level))
 		return;
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index 501b468e9a..1e2533a802 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -936,6 +936,23 @@
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term>
+     <varname>pg_stat_statements.sample_rate</varname> (<type>real</type>)
+     <indexterm>
+      <primary><varname>pg_stat_statements.sample_rate</varname> configuration parameter</primary>
+     </indexterm>
+    </term>
+
+    <listitem>
+     <para>
+      <varname>pg_stat_statements.sample_rate</varname> causes pg_stat_statements to only
+      control a fraction of the statements in each session.  The default is 1,
+      meaning control all the queries. Only superusers can change this setting.
+     </para>
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term>
      <varname>pg_stat_statements.save</varname> (<type>boolean</type>)
-- 
2.34.1

Reply via email to