Re: pg_stat_advisor extension
Hi hackers,

I'm reaching out again regarding the patch that adds the new extension 'pg_stat_advisor', aimed at enhancing query plan efficiency through suggestions to create statistics. I understand the community is busy, but I would greatly value any feedback or thoughts on this extension. Thank you for your time and consideration.

Best regards,
Ilia Evdokimov,
Tantor Labs LLC.
Re: pg_stat_advisor extension
Hi hackers, I've encountered and addressed errors in the "0001-pg_stat_advisor-extension.patch" when applying it to the main branch, specifically trailing whitespace issues at lines 117 and 118: ``` 0001-pg_stat_advisor-extension.patch:117: trailing whitespace. QUERY PLAN 0001-pg_stat_advisor-extension.patch:118: trailing whitespace. warning: 2 lines add whitespace errors. ``` An updated patch is attached for review I welcome your insights, feedback, and evaluations regarding the necessity of integrating this new extension into PostgreSQL. Kind regards, Ilia Evdokimov, Tantor Labs LLC. From 6316706c42996219e507bb6ded9dd1e872180e38 Mon Sep 17 00:00:00 2001 From: Ilia Evdokimov Date: Tue, 6 Feb 2024 18:11:04 +0300 Subject: [PATCH] pg_stat_advisor extension --- contrib/Makefile | 1 + contrib/meson.build | 1 + contrib/pg_stat_advisor/.gitignore| 3 + contrib/pg_stat_advisor/Makefile | 20 + contrib/pg_stat_advisor/README.md | 85 .../expected/pg_stat_advisor.out | 96 contrib/pg_stat_advisor/meson.build | 30 ++ contrib/pg_stat_advisor/pg_stat_advisor.c | 477 ++ .../pg_stat_advisor/sql/pg_stat_advisor.sql | 50 ++ 9 files changed, 763 insertions(+) create mode 100644 contrib/pg_stat_advisor/.gitignore create mode 100644 contrib/pg_stat_advisor/Makefile create mode 100644 contrib/pg_stat_advisor/README.md create mode 100644 contrib/pg_stat_advisor/expected/pg_stat_advisor.out create mode 100644 contrib/pg_stat_advisor/meson.build create mode 100644 contrib/pg_stat_advisor/pg_stat_advisor.c create mode 100644 contrib/pg_stat_advisor/sql/pg_stat_advisor.sql diff --git a/contrib/Makefile b/contrib/Makefile index da4e2316a3..da9a4ceeaa 100644 --- a/contrib/Makefile +++ b/contrib/Makefile @@ -34,6 +34,7 @@ SUBDIRS = \ pg_buffercache \ pg_freespacemap \ pg_prewarm \ + pg_stat_advisor \ pg_stat_statements \ pg_surgery \ pg_trgm \ diff --git a/contrib/meson.build b/contrib/meson.build index c12dc906ca..a20d99443b 100644 --- a/contrib/meson.build +++ b/contrib/meson.build @@ -49,6 +49,7 @@ subdir('pgcrypto') subdir('pg_freespacemap') subdir('pg_prewarm') subdir('pgrowlocks') +subdir('pg_stat_advisor') subdir('pg_stat_statements') subdir('pgstattuple') subdir('pg_surgery') diff --git a/contrib/pg_stat_advisor/.gitignore b/contrib/pg_stat_advisor/.gitignore new file mode 100644 index 00..913175ff6e --- /dev/null +++ b/contrib/pg_stat_advisor/.gitignore @@ -0,0 +1,3 @@ +/log/ +/results/ +/tmp_check/ diff --git a/contrib/pg_stat_advisor/Makefile b/contrib/pg_stat_advisor/Makefile new file mode 100644 index 00..f31b939e8a --- /dev/null +++ b/contrib/pg_stat_advisor/Makefile @@ -0,0 +1,20 @@ +# contrib/pg_stat_advisor/Makefile + +MODULE_big = pg_stat_advisor +OBJS = \ + $(WIN32RES) \ + pg_stat_advisor.o +PGFILEDESC = "pg_stat_advisor - analyze query performance and recommend the creation of additional statistics" + +REGRESS = pg_stat_advisor + +ifdef USE_PGXS +PG_CONFIG = pg_config +PGXS := $(shell $(PG_CONFIG) --pgxs) +include $(PGXS) +else +subdir = contrib/pg_stat_advisor +top_builddir = ../.. +include $(top_builddir)/src/Makefile.global +include $(top_srcdir)/contrib/contrib-global.mk +endif diff --git a/contrib/pg_stat_advisor/README.md b/contrib/pg_stat_advisor/README.md new file mode 100644 index 00..f9610f2ed5 --- /dev/null +++ b/contrib/pg_stat_advisor/README.md @@ -0,0 +1,85 @@ +## pg_stat_advisor - PostgreSQL advisor to create statistics + +pg_stat_advisor is a PostgreSQL extension designed to analyze query performance and recommend the creation of additional statistics to improve query plan. 
+ +Append pg_stat_advisor to shared_preload_libraries configuration parameter in your postgresql.conf file then restart the PostgreSQL database to apply the changes. Or you can use "LOAD 'pg_stat_advisor';"command +``` +LOAD 'pg_stat_advisor'; +``` + +There is the pg_stat_advisor.suggest_statistics_threshold GUC that can be used to set a suggest_statistics_threshold. It is the the ratio of total tuples produced compared to the planned rows. If parameter is set by 0, the printing switches off. + +For example: +``` +SET pg_stat_advisor.suggest_statistics_threshold = 1.0; +``` + +Examples: + + +``` +postgres=# create table t (i int, j int); +CREATE TABLE +postgres=# insert into t select i/10, i/100 from generate_series(1, 100) i; +INSERT 0 100 +postgres=# analyze t; +ANALYZE +postgres=# explain analyze select * from t where i = 100 and j = 10; + QUERY PLAN + +-- +
Re: pg_stat_advisor extension
Dear Team,

Firstly, I would like to extend my sincere apologies for the confusion and technical oversights in our previous discussions regarding the 'pg_stat_advisor' extension. To address this and facilitate a clearer, more focused dialogue, I have initiated a new thread to consolidate our discussions on this matter. For context, our previous conversation can be found here: https://www.postgresql.org/message-id/flat/4681151706615977%40mail.yandex.ru.

The 'pg_stat_advisor' extension is designed to improve query plans. It operates by suggesting when to create extended statistics, particularly in queries where the current selectivity estimates fall short. This is controlled through the GUC parameter 'pg_stat_advisor.suggest_statistics_threshold', which assesses the ratio of total tuples to planned rows. This feature is instrumental in identifying scenarios where the planner's estimates could be improved.

You can install the extension by:
```
LOAD 'pg_stat_advisor';
SET pg_stat_advisor.suggest_statistics_threshold = 1.0;
```

Example:
```
EXPLAIN ANALYZE SELECT * FROM t WHERE i = 100 AND j = 10;
NOTICE: pg_stat_advisor suggestion: CREATE STATISTICS t_i_j ON i, j FROM t
                         QUERY PLAN
```

After the EXPLAIN ANALYZE command you can see a message suggesting the creation of statistics named 't_i_j' on columns 'i' and 'j' of table 't'.

Thank you for your understanding, patience, and continued support.

Best regards,
Ilia Evdokimov,
Tantor Labs LLC.
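P.S. For completeness, acting on the suggestion is then just a copy-and-paste away (the statistics name below is the one proposed by the extension; any name would work, and the name can even be omitted):
```
CREATE STATISTICS t_i_j ON i, j FROM t;
ANALYZE t;
EXPLAIN ANALYZE SELECT * FROM t WHERE i = 100 AND j = 10;
```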
Re: pg_stat_advisor extension
On Feb 8 2024 at 00:00:00 jian he >INT MAX >should be 1.0? I don’t know why Konstantin Knizhnik used the ratio of actual tuples to the planned ones, but most who start testing my extension expect that it will be a coefficient from 0 to 1, which will be the ratio of the estimated tuples to the actual ones. Therefore, I changed the value of this coefficient the other way around and now the value can be from 0 to 1. The patch with changes has been attached. > now CREATE STATISTICS, the statistics name is optional I constructed the name of the statistics so that the user could copy the line with 'CREATE STATISTICS' with the mouse and execute this command faster. But if the user wants ITS name, he can do it manually. > here you can explicitly mention the statistics kind would be great I agree with you. That would be my next step. That's why I'm doing it now. > Also since the documentation is limited, more comments explainingSuggestMultiColumnStatisticsForNode would be great. overall the comments are very little, it should be more (that's my opinion). Yes, certainly. I'll do it in the next patch. I'm looking forward to your thoughts and feedback. Regards, Ilia Evdokimov, Tantor Labs LLC. From f87f4a57e532d57f43dab4764d08ddf83d9f3d8f Mon Sep 17 00:00:00 2001 From: Ilia Evdokimov Date: Thu, 8 Feb 2024 16:00:57 +0300 Subject: [PATCH] 'pg_stat_advisor' extension. This serves as a hook into the executor for determining total rows and planned rows. The process starts by checking the `pg_stat_advisor.suggest_statistics_threshold` GUC parameter. If it's set to 0.0, the extension does not proceed further. When the parameter is greater than 0.0, the extension evaluates the accuracy of planned rows. A suggestion for creating statistics is made if the ratio of total rows to planned rows is greater than or equal to this threshold. Only then does it extract the relation and columns from the query. The extension checks pg_statistic_ext for existing relevant statistics. If no statistics are found, it prints a notice suggesting the creation of statistics, using the naming format 'relationName_columns'. 
Author: Ilia Evdokimov --- contrib/Makefile | 1 + contrib/meson.build | 1 + contrib/pg_stat_advisor/.gitignore| 3 + contrib/pg_stat_advisor/Makefile | 20 + contrib/pg_stat_advisor/README.md | 85 +++ .../expected/pg_stat_advisor.out | 96 contrib/pg_stat_advisor/meson.build | 30 ++ contrib/pg_stat_advisor/pg_stat_advisor.c | 482 ++ .../pg_stat_advisor/sql/pg_stat_advisor.sql | 50 ++ 9 files changed, 768 insertions(+) create mode 100644 contrib/pg_stat_advisor/.gitignore create mode 100644 contrib/pg_stat_advisor/Makefile create mode 100644 contrib/pg_stat_advisor/README.md create mode 100644 contrib/pg_stat_advisor/expected/pg_stat_advisor.out create mode 100644 contrib/pg_stat_advisor/meson.build create mode 100644 contrib/pg_stat_advisor/pg_stat_advisor.c create mode 100644 contrib/pg_stat_advisor/sql/pg_stat_advisor.sql diff --git a/contrib/Makefile b/contrib/Makefile index da4e2316a3..da9a4ceeaa 100644 --- a/contrib/Makefile +++ b/contrib/Makefile @@ -34,6 +34,7 @@ SUBDIRS = \ pg_buffercache \ pg_freespacemap \ pg_prewarm \ + pg_stat_advisor \ pg_stat_statements \ pg_surgery \ pg_trgm \ diff --git a/contrib/meson.build b/contrib/meson.build index c12dc906ca..a20d99443b 100644 --- a/contrib/meson.build +++ b/contrib/meson.build @@ -49,6 +49,7 @@ subdir('pgcrypto') subdir('pg_freespacemap') subdir('pg_prewarm') subdir('pgrowlocks') +subdir('pg_stat_advisor') subdir('pg_stat_statements') subdir('pgstattuple') subdir('pg_surgery') diff --git a/contrib/pg_stat_advisor/.gitignore b/contrib/pg_stat_advisor/.gitignore new file mode 100644 index 00..913175ff6e --- /dev/null +++ b/contrib/pg_stat_advisor/.gitignore @@ -0,0 +1,3 @@ +/log/ +/results/ +/tmp_check/ diff --git a/contrib/pg_stat_advisor/Makefile b/contrib/pg_stat_advisor/Makefile new file mode 100644 index 00..f31b939e8a --- /dev/null +++ b/contrib/pg_stat_advisor/Makefile @@ -0,0 +1,20 @@ +# contrib/pg_stat_advisor/Makefile + +MODULE_big = pg_stat_advisor +OBJS = \ + $(WIN32RES) \ + pg_stat_advisor.o +PGFILEDESC = "pg_stat_advisor - analyze query performance and recommend the creation of additional statistics" + +REGRESS = pg_stat_advisor + +ifdef USE_PGXS +PG_CONFIG = pg_config +PGXS := $(shell $(PG_CONFIG) --pgxs) +include $(PGXS) +else +subdir = contrib/pg_stat_advisor +top_builddir = ../.. +include $(top_builddir)/src/Makefile.global +include $(top_srcdir)/contrib/contrib-global.mk +endif diff --git a/contrib/pg_stat_advisor/README.md b/contrib/pg_stat_advisor/README
Re: pg_stat_advisor extension
Our further discussion of this new extension takes place in this thread: https://www.postgresql.org/message-id/flat/f822b674-9697-43b9-931b-4d69729a26ff%40tantorlabs.com. Due to technical difficulties in the current thread, I would only be able to continue the conversation in HTML format, which would make the messages inconvenient for everyone to read. I apologize for the inconvenience caused.

Regards,
Ilia Evdokimov,
Tantor Labs LLC.
Re: pg_stat_advisor extension
On Feb 08 2024 at 07:14:18, Andrei Lepikhov wrote:

> 1. In the case of parallel workers the plan_rows value has a different semantics than the number of rows predicted. Just explore get_parallel_divisor().

Yes, this is a very weighty and important issue. I need to think about it very carefully.

> 2. The extension recommends new statistics immediately upon finding an error. But what if the reason for the error is stale statistics? Or this error may be raised for only one specific set of constants, and estimation will be done well in another 99% of cases for the same expression. According to No.2, it might make sense to collect and track clause combinations and cardinality errors found and let the DBA make decisions on their own.

Your proposal is very interesting. In my opinion, it is worth considering updating the extended statistics if they are truly stale, and reporting that in a separate message suggesting a statistics update. If I succeed, then in the next patch I will add the kind of extended statistics to the message, deal with the parallel workers, and update statistics if necessary. If you have additional suggestions and thoughts, feel free to write them in this thread.

Regards,
Ilia Evdokimov,
Tantor Labs LLC.
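P.S. A quick way to observe the per-worker semantics, as I understand them (a sketch; it assumes a table t large enough for the planner to pick a parallel plan): in the resulting output, the row estimate of the Parallel Seq Scan node has already been divided by the parallel divisor, so comparing it directly with the total number of rows produced at the Gather node would overstate the estimation error.
```
SET max_parallel_workers_per_gather = 2;
EXPLAIN (ANALYZE) SELECT * FROM t WHERE i = 100 AND j = 10;
```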
list_free in addRangeTableEntryForJoin
Hi Hackers I have identified a potential memory leak in the `addRangeTableEntryForJoin()` function. The second parameter of `addRangeTableEntryForJoin()`, `colnames`, is a `List*` that is concatenated with another `List*`, `eref->colnames`, using `list_concat()`. We need to pass only the last `numaliases` elements of the list, for which we use `list_copy_tail`. This function creates a copy of the `colnames` list, resulting in `colnames` pointing to the current list that will not be freed. Consequently, a new list is already concatenated. To address this issue, I have invoked `list_free(colnames)` afterwards. If anyone is aware of where the list is being freed or has any suggestions for improvement, I would greatly appreciate your input. Best Regards, Ilia Evdokimov, TantorLabs LCC From cd7aa7ac5904501085a980944dc3c18f42721c06 Mon Sep 17 00:00:00 2001 From: Ilia Evdokimov Date: Mon, 10 Jun 2024 13:10:31 +0300 Subject: [PATCH] After concatenation two lists where the second one is from list_copy_tail do not free it --- src/backend/parser/parse_relation.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/backend/parser/parse_relation.c b/src/backend/parser/parse_relation.c index 2f64eaf0e3..8230cf27cc 100644 --- a/src/backend/parser/parse_relation.c +++ b/src/backend/parser/parse_relation.c @@ -2258,8 +2258,11 @@ addRangeTableEntryForJoin(ParseState *pstate, /* fill in any unspecified alias columns */ if (numaliases < list_length(colnames)) + { eref->colnames = list_concat(eref->colnames, list_copy_tail(colnames, numaliases)); + list_free(colnames); + } if (numaliases > list_length(colnames)) ereport(ERROR, -- 2.34.1
Re: list_free in addRangeTableEntryForJoin
>But callers of addRangeTableEntryForJoin(), expects to handle a list or NIL, if we free the memory I've thoroughly reviewed all callers of the `addRangeTableEntryForJoin()` function and confirmed that the list is not used after this function is called. Since `addRangeTableEntryForJoin()` is the last function to use this list, it would be more efficient to free the `List` at the point of its declaration. I'll attach new patch where I free the lists. Regards, Ilia Evdokimov, Tantor Labs LCC From 853c5bc854bcc03e791cf32cc8cad7b257ec558f Mon Sep 17 00:00:00 2001 From: Ilia Evdokimov Date: Mon, 10 Jun 2024 15:09:12 +0300 Subject: [PATCH] After concatenation two lists where the second one is from list_copy_tail do not free it --- src/backend/parser/analyze.c | 2 ++ src/backend/parser/parse_clause.c | 2 ++ 2 files changed, 4 insertions(+) diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c index 28fed9d87f..c0708a6e3e 100644 --- a/src/backend/parser/analyze.c +++ b/src/backend/parser/analyze.c @@ -1943,6 +1943,8 @@ transformSetOperationStmt(ParseState *pstate, SelectStmt *stmt) if (pstate->p_hasAggs || qry->groupClause || qry->groupingSets || qry->havingQual) parseCheckAggregates(pstate, qry); + list_free(targetnames); + return qry; } diff --git a/src/backend/parser/parse_clause.c b/src/backend/parser/parse_clause.c index 8118036495..25f8c84a9f 100644 --- a/src/backend/parser/parse_clause.c +++ b/src/backend/parser/parse_clause.c @@ -1620,6 +1620,8 @@ transformFromClauseItem(ParseState *pstate, Node *n, *top_nsitem = nsitem; *namespace = lappend(my_namespace, nsitem); + list_free(res_colnames); + return (Node *) j; } else -- 2.34.1
Re: list_free in addRangeTableEntryForJoin
>Now you need to analyze whether the memory in question is not managed by a Context I've already analyzed. Let's explain details: 1. analyze.c 1718: List* targetnames; 1815: targetnames = NIL; 1848: targetnames = lappend(targetnames, makeString(colName)); 1871: addRangeTableEntryForJoin(...); => list_free(targetnames); 2. parse_clause.c 1163: List* res_colnames; 1289: res_colnames = NIL; 1324: foreach(col, res_colnames); 1396: res_colnames = lappend(res_colnames, lfirst(ucol)); 1520, 1525: extractRemainingColumns(...); | 290: *res_colnames = lappend(*res_colnames, lfirst(lc)); 1543: addRangeTableEntryForJoin(...); => list_free(res_colnames); As you can see, there are no other pointers working with this block of memory, and all operations above are either read-only or append nodes to the lists. If I am mistaken, please correct me. Furthermore, I will call `list_free` immediately after `addRangeTableEntryForJoin()`. The new patch is attached. Thanks for reviewing. I'm looking forward to new suggestions. Regards, Ilia Evdokimov, Tantor Labs LCC From b545770d57ac3ca137e9ad97c004576e77213648 Mon Sep 17 00:00:00 2001 From: Ilia Evdokimov Date: Mon, 10 Jun 2024 16:39:14 +0300 Subject: [PATCH] After concatenation two lists where the second one is from list_copy_tail do not free it --- src/backend/parser/analyze.c | 2 ++ src/backend/parser/parse_clause.c | 2 ++ 2 files changed, 4 insertions(+) diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c index 28fed9d87f..50c47a64ed 100644 --- a/src/backend/parser/analyze.c +++ b/src/backend/parser/analyze.c @@ -1880,6 +1880,8 @@ transformSetOperationStmt(ParseState *pstate, SelectStmt *stmt) NULL, false); + list_free(targetnames); + sv_namespace = pstate->p_namespace; pstate->p_namespace = NIL; diff --git a/src/backend/parser/parse_clause.c b/src/backend/parser/parse_clause.c index 8118036495..2fa6c03be7 100644 --- a/src/backend/parser/parse_clause.c +++ b/src/backend/parser/parse_clause.c @@ -1552,6 +1552,8 @@ transformFromClauseItem(ParseState *pstate, Node *n, j->alias, true); + list_free(res_colnames); + /* Verify that we correctly predicted the join's RT index */ Assert(j->rtindex == nsitem->p_rtindex); /* Cross-check number of columns, too */ -- 2.34.1
Re: pg_stat_advisor extension
1. In the case of parallel workers the plan_rows value has a different semantics than the number of rows predicted. Just explore get_parallel_divisor(). 2. The extension recommends new statistics immediately upon an error finding. But what if the reason for the error is stale statistics? Or this error may be raised for only one specific set of constants, and estimation will be done well in another 99.% of cases for the same expression. The new parameter, `pg_stat_advisor.analyze_scale_factor`, can suggest the execution of the ANALYZE command on specific tables. The extension now evaluates the ratio of `n_live_tup` (number of live tuples) to `n_mod_since_analyze` (number of modifications since last analyze) in the `pg_stat_all_tables` catalog. If this ratio exceeds the value specified in `analyze_scale_factor`, the extension will suggest an update to the table's statistics. There are a lot of parameters that influences on estimated rows. Statistics might not help improve estimated rows. This feature is designed to provide users with data-driven insights to decide whether updating statistics via the ANALYZE command could potentially improve query performance. By suggesting rather than automatically executing statistics updates, we empower you to make informed decisions based on the specific needs and conditions of your database environment. I've developed an extension that provides suggestions on whether to update or create statistics for your PostgreSQL database, without executing any changes. This approach allows you to consider various parameters that influence row estimates and make informed decisions about optimizing your database's performance. Your feedback is invaluable, and we look forward to hearing about your experiences and any improvements you might suggest. Best regards, Ilia Evdokimov Tantor Labs LLC. From eb998bea96a3640d240afa63e08cc8cf98925bf7 Mon Sep 17 00:00:00 2001 From: Ilia Evdokimov Date: Mon, 13 May 2024 14:41:59 +0300 Subject: [PATCH] 'pg_stat_advisor' extension This service as a hook into executor. It has two GUC-parameters. pg_stat_advisor.analyze_scale_factor: if ratio of pg_stat_all_tables.n_live_tup to pg_stat_all_tables.n_mod_since_analyze is greater than pg_stat_advisor.analyze_scale_factor extension prints suggestion executing ANALYZE command. 
pg_stat_advisor.suggest_statistics_threshold: the ratio of total rows to planned rows is greater than or equal to this threshold the extension prints suggestion executing the creation of statistics, using the naming format 'relationName_columns' --- contrib/Makefile | 1 + contrib/pg_stat_advisor/Makefile | 20 + .../expected/pg_stat_advisor.out | 52 ++ contrib/pg_stat_advisor/meson.build | 30 + contrib/pg_stat_advisor/pg_stat_advisor.c | 560 ++ .../pg_stat_advisor/sql/pg_stat_advisor.sql | 24 + 6 files changed, 687 insertions(+) create mode 100644 contrib/pg_stat_advisor/Makefile create mode 100644 contrib/pg_stat_advisor/expected/pg_stat_advisor.out create mode 100644 contrib/pg_stat_advisor/meson.build create mode 100644 contrib/pg_stat_advisor/pg_stat_advisor.c create mode 100644 contrib/pg_stat_advisor/sql/pg_stat_advisor.sql diff --git a/contrib/Makefile b/contrib/Makefile index abd780f277..d6ce2fe562 100644 --- a/contrib/Makefile +++ b/contrib/Makefile @@ -33,6 +33,7 @@ SUBDIRS = \ pg_buffercache \ pg_freespacemap \ pg_prewarm \ + pg_stat_advisor \ pg_stat_statements \ pg_surgery \ pg_trgm \ diff --git a/contrib/pg_stat_advisor/Makefile b/contrib/pg_stat_advisor/Makefile new file mode 100644 index 00..f31b939e8a --- /dev/null +++ b/contrib/pg_stat_advisor/Makefile @@ -0,0 +1,20 @@ +# contrib/pg_stat_advisor/Makefile + +MODULE_big = pg_stat_advisor +OBJS = \ + $(WIN32RES) \ + pg_stat_advisor.o +PGFILEDESC = "pg_stat_advisor - analyze query performance and recommend the creation of additional statistics" + +REGRESS = pg_stat_advisor + +ifdef USE_PGXS +PG_CONFIG = pg_config +PGXS := $(shell $(PG_CONFIG) --pgxs) +include $(PGXS) +else +subdir = contrib/pg_stat_advisor +top_builddir = ../.. +include $(top_builddir)/src/Makefile.global +include $(top_srcdir)/contrib/contrib-global.mk +endif diff --git a/contrib/pg_stat_advisor/expected/pg_stat_advisor.out b/contrib/pg_stat_advisor/expected/pg_stat_advisor.out new file mode 100644 index 00..8f3dab2c2f --- /dev/null +++ b/contrib/pg_stat_advisor/expected/pg_stat_advisor.out @@ -0,0 +1,52 @@ +LOAD 'pg_stat_advisor'; +SET pg_stat_advisor.analyze_scale_factor = 0.4; +SET pg_stat_advisor.suggest_statistics_threshold = 0.11; +CREATE TABLE my_tbl(fld_1 INTEGER, fld_2 BIGINT) WITH (autovacuum_enabled = false); +INSERT INTO my_tbl (fld_1, fld_2) +SELECT + i/100 as fld_1, + i/500 as fld_2 +FROM generate_series(1, 1000) s(i); +ANALYZE my_tbl; +INSERT
Adding NOTICE for differences between estimated and actual rows
Hi hackers, It's quite common that poor query performance can be attributed to inaccurate row estimations by the planner. To make it easier to detect these discrepancies, rather than scrutinizing the estimates manually, it would be helpful to output a dedicated |NOTICE| message. In the current patch, I've introduced a new GUC parameter called 'estimated_rows_scale_factor'. If the ratio of the estimated rows to the actual rows is less than this factor, or if the estimated rows significantly exceed the actual rows (when the ratio is greater than this factor), a NOTICE message will be printed. The message reads: "Estimated rows (%.0f) less(greater) than actual rows (%.0f)." Here is an example: CREATE TABLE t(a int, b int); INSERT INTO t SELECT x/10, x FROM generate_series(1,1000) g(x); ANALYZE; SET estimated_rows_scale_factor = 0.9; EXPLAIN ANALYZE SELECT * FROM t WHERE a > 10 AND b <= 200; NOTICE: Estimated rows (1000) greater than actual rows (91). QUERY PLAN - Gather (cost=1000.00..107848.00 rows=1000 width=8) (actual time=0.446..122.476 rows=91 loops=1) Workers Planned: 2 Workers Launched: 2 -> Parallel Seq Scan on t (cost=0.00..106748.00 rows=417 width=8) (actual time=77.657..118.000 rows=30 loops=3) Filter: ((a > 10) AND (b <= 200)) Rows Removed by Filter: 303 Planning Time: 0.097 ms Execution Time: 122.502 ms (8 rows) EXPLAIN ANALYZE SELECT * FROM t WHERE a = 10 AND b <= 200; NOTICE: Estimated rows (1) less than actual rows (10). QUERY PLAN -- Gather (cost=1000.00..107748.10 rows=1 width=8) (actual time=0.280..104.752 rows=10 loops=1) Workers Planned: 2 Workers Launched: 2 -> Parallel Seq Scan on t (cost=0.00..106748.00 rows=1 width=8) (actual time=66.493..101.102 rows=3 loops=3) Filter: ((b <= 200) AND (a = 10)) Rows Removed by Filter: 330 Planning Time: 0.129 ms Execution Time: 104.768 ms (8 rows) If you have any suggestions regarding the wording of the message, its placement, or if you'd like to see a different criterion used, I would greatly appreciate your feedback. Looking forward to your thoughts and suggestions. -- Regards, Ilia Evdokimov, Tantor Labs LCC. From 8e038f16d45d5f8c39c7204fc661ed4b7f504a63 Mon Sep 17 00:00:00 2001 From: Ilia Evdokimov Date: Thu, 5 Sep 2024 18:15:01 +0300 Subject: [PATCH v1] Add message when estimated rows differ from actual rows If the ratio of estimated rows to actual rows is less than the 'estimated_rows_scale_factor', or if the estimated rows exceed the actual rows, a NOTICE message will be printed. This helps to identify discrepancies between the planner's estimations and the actual query execution, enabling better analysis and optimization. --- src/backend/commands/explain.c | 31 + src/backend/utils/misc/guc_tables.c | 11 ++ src/include/utils/guc.h | 1 + 3 files changed, 43 insertions(+) diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index 11df4a04d4..da435cfee1 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -419,6 +419,35 @@ ExplainResultDesc(ExplainStmt *stmt) return tupdesc; } +/* + * Notice whether estimated rows is bad. 
+ */ +static void +ExplainPrintEstimatedRowsAttention(ExplainState *es, QueryDesc *queryDesc) +{ + double rows; + double nloops; + PlanState *planstate = queryDesc->planstate; + + if (es->analyze && + planstate->instrument && planstate->instrument->nloops > 0) + { + nloops = planstate->instrument->nloops; + rows = planstate->instrument->ntuples / nloops; + + if(queryDesc->planstate->plan->plan_rows / rows < estimated_rows_scale_factor) + ereport(NOTICE, + (errmsg("Estimated rows (%.0f) less than actual rows (%.0f).", +queryDesc->planstate->plan->plan_rows, +rows), errhidestmt(true))); + else if (rows / queryDesc->planstate->plan->plan_rows < estimated_rows_scale_factor) + ereport(NOTICE, + (errmsg("Estimated rows (%.0f) greater than actual rows (%.0f).", +queryDesc->planstate->plan->plan_rows, +rows), errhidestmt(true))); + } +} + /* * ExplainOneQuery - * print out the execution plan for one Query @@ -721,6 +750,8 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es, /* Create textual dump of plan tree */ ExplainPrintPlan(es, queryDesc); + ExplainPrintEstimatedRowsAttention(es, queryDesc); +
Re: Adding NOTICE for differences between estimated and actual rows
On 05.09.2024 23:32, Euler Taveira wrote: On Thu, Sep 5, 2024, at 1:05 PM, Ilia Evdokimov wrote: It's quite common that poor query performance can be attributed to inaccurate row estimations by the planner. To make it easier to detect these discrepancies, rather than scrutinizing the estimates manually, it would be helpful to output a dedicated |NOTICE| message. I don't know if NOTICE is a good UI for an inaccurate estimation. The main issue with your proposal is that it does not indicate where it is. It is easier to inspect small query plans but what if you have a plan with hundreds of lines? IMO the client should provide this feature. The shell provides a way to change the color and/or style from the output. I have a perl script that reads an EXPLAIN output and mark with different colors (red, yellow) if the estimations are off. psql could do the same. In your case if the output was changed to something like: \033[0;1;31mGather (cost=1000.00..107848.00 rows=1000 width=8) (actual time=0.446..122.476 rows=91 loops=1)\033[0m Workers Planned: 2 Workers Launched: 2 -> \033[0;1;31mParallel Seq Scan on t (cost=0.00..106748.00 rows=417 width=8) (actual time=77.657..118.000 rows=30 loops=3)\033[0m Filter: ((a > 10) AND (b <= 200)) Rows Removed by Filter: 303 Planning Time: 0.097 ms Execution Time: 122.502 ms (8 rows) Note "\033[0;1;31m" and "\033[0m" that means foreground bold red and default, respectively. Another alternative if you don't want to modify psql is to use the pager. Create a script that contains your logic to apply color and/or style to the desired (sub)string(s). The following example that I extracted from [1] can apply colors to psql output. $ cat /tmp/pcc.pl #!/usr/bin/perl -n print "\033[1m\033[35m$1\033[36m$2\033[32m$3\033[33m$4\033[m" while /([|+-]+)|([0-9]+)|([a-zA-Z_]+)|([^\w])/g; and then you can start psql as: $ PAGER="/c/mypager.pl" psql [1] https://stackoverflow.com/questions/5947742/how-to-change-the-output-color-of-echo-in-linux/28938235#28938235 -- Euler Taveira EDB https://www.enterprisedb.com/ Yes, you are right. It probably doesn't make sense to implement such a notification on the server side. It makes more sense to handle this on the client side, where there are many different tools, including your suggestion, to highlight inaccurate estimates. Thank you very much for the review! -- Regards, Ilia Evdokimov, Tantor Labs LCC.
Re: Add support for (Var op Var) clause in extended MCV statistics
Hi everyone, I've taken a closer look at the patch and realized that we don't need the new function 'mcv_clause_selectivity_var_op_var()' we can use 'mcv_clause_selectivity()' instead. I'm attaching the updated patch and test generator. -- Regards, Ilia Evdokimov, Tantor Labs LCC. #!/usr/bin/python3 import psycopg2 import random import select import hashlib def generate_conditions(nclauses, attributes = ['a', 'b', 'c', 'd'], operators = ['<', '<=', '=', '!=', '>=', '>']): if nclauses == 1: cols = [random.choice(attributes), random.choice(attributes)] oper = ' ' + random.choice(operators) + ' ' clause = oper.join(cols) if random.randint(0,100) < 50: clause = 'NOT ' + clause return clause nparts = random.randint(2, nclauses) # distribute the clauses between query parts nclauses_parts = [1 for p in range(0, nparts)] for x in range(0, nclauses - nparts): nclauses_parts[random.randint(0, nparts) - 1] += 1 parts = [] for p in range(0, nparts): parts.append('(' + generate_conditions(nclauses_parts[p], attributes, operators) + ')') c = random.choice([' AND ', ' OR ']) return c.join(parts) def generate_data(nrows, attributes = ['a', 'b', 'c', 'd']): sql = 'insert into t (' + ','.join(attributes) + ') select ' attrs = [] for attr in attributes: x = random.choice([-1, 2, 5, 10, 20, 30]) if x == -1: x = random.randint(5, 20) expr = '(random() * ' + str(x) + ')::int' else: expr = 'mod(i,' + str(x) + ')' if random.randint(0,100) < 50: x = random.choice([2, 5, 10, 20, 30]) attrs.append('case when mod(i,' + str(x) + ') = 0 then null else ' + expr + ' end') else: attrs.append(expr) sql += ', '.join(attrs) + ' from generate_series(1,' + str(nrows) + ') s(i)' return sql def wait(conn): while True: state = conn.poll() if state == psycopg2.extensions.POLL_OK: break elif state == psycopg2.extensions.POLL_WRITE: select.select([], [conn.fileno()], []) elif state == psycopg2.extensions.POLL_READ: select.select([conn.fileno()], [], []) else: raise psycopg2.OperationalError("poll() returned %s" % state) def run_everywhere(conns, queries): curs = [conn.cursor() for conn in conns] for q in queries: [cur.execute(q) for cur in curs] [wait(conn) for conn in conns] [cur.close() for cur in curs] if __name__ == '__main__': conns = [ psycopg2.connect('host=localhost port=5001 user=postgres dbname=postgres', async_=True), psycopg2.connect('host=localhost port=5002 user=postgres dbname=postgres', async_=True), psycopg2.connect('host=localhost port=5003 user=postgres dbname=postgres', async_=True), psycopg2.connect('host=localhost port=5004 user=postgres dbname=postgres', async_=True)] [wait(conn) for conn in conns] curs = [conn.cursor() for conn in conns] # 100 data sets for cnt in [3, 10, 100]: for d in range(1,100): # generate the data on all versions data_sql = generate_data(cnt) data_md5 = hashlib.md5(data_sql.encode('utf-8')).hexdigest() with open('data.csv', 'a') as f: f.write('%s\t%s\n' % (data_md5, data_sql)) run_everywhere(conns, ['drop table if exists t', 'create table t (a int, b int, c int, d int)', data_sql, 'commit', 'analyze']) # generate the clauses conditions = [] for c in range(1, 6): for q in range(1,100): conditions.append({'clauses' : c, 'conditions' : generate_conditions(c)}) with open('results.csv', 'a') as f: for conds in conditions: sql = "select * from check_estimated_rows('select * from t where " + conds['conditions'] + "')" [cur.execute(sql) for cur in curs] [wait(conn) for conn in conns] r = [cur.fetchone() for cur in curs] actual_rows = r[0][1] estimated_rows = [str(x[0]) for x in r] 
f.write(('%s\t%s\t%s\t%s\t%s\t%s\t%s\n') % (data_md5, cnt, conds['clauses'], conds['conditions'], 'no', actual_rows, '\t'.join(estimated_rows))) run_everywhere(conns, ['create statistics s (mcv) on a, b, c, d from t', 'commit', 'analyze']) with open('results.csv', 'a') as f: for conds in conditions: sql = "select * from check_estimated_rows('select * from t where " + conds['conditions'] + "')" [cur.execute(sql) for cur in curs]
Re: refactor the CopyOneRowTo
Hi, hackers,

I'm sure this refactoring is useful, because it makes it more readable whether a datum value is handled as binary or not. However, I can see one small improvement: we can declare the variable 'bytea *outputbytes' inside the 'else' branch, because it is not used anywhere else.

Regards,
Ilia Evdokimov,
Tantor Labs LLC.
Re: Vacuum statistics
Hello, everyone!

Thank you for your interesting patch with extended statistics about autovacuum. Have you considered not creating a new table in pg_catalog, but instead saving the statistics in an existing one? I mean pg_class, or pg_stat_progress_analyze and pg_stat_progress_vacuum.

P.S. If I sent this mail twice, I'm sorry :)

Regards,
Ilia Evdokimov,
Tantor Labs.
Add support for (Var op Var) clause in extended MCV statistics
Hi hackers, I'd like to submit a patch that improves the estimated rows for queries containing (Var op Var) clauses by applying extended MCV statistics. *New functions:* * mcv_clauselist_selectivity_var_op_var() - calculates the selectivity for (Var op Var) clauses. * is_opclause_var_op_var() - Checks whether a clause is of the (Var op Var) form. *Implementation Details:* * A new 'if' statement was added to the 'clause_selectivity_ext()' function to handle (Var op Var) clauses. This allows the process to locate matching MCV extended statistics and calculate selectivity using the newly introduced function. * Additionally, I added 'if' statement in statext_is_compatible_clause_internal() function to determine which columns are included in the clause, find matching extended statistics, and then calculate selectivity through the new function. I did the same in mcv_get_match_bitmap() to check what values are true for (Var op Var). * To support this, I created a new enum type to differentiate between OR/AND and (Var op Var) clauses. *Examples:* create table t (a int, b int); insert into t select mod(i,10), mod(i,10)+1 from generate_series(1,10) s(i); analyze t; explain select * from t where a < b; ` Estimated: 3 Actual: 10 explain select * from t where a > b; ` Estimated: 3 Actual: 10 create statistics s (mcv) on a,b from t; analyze t; explain select * from t where a < b; ` Estimated without patch: 3 Estimated with patch: 10 Actual: 10 explain select * from t where a > b; ` Estimated without patch: 3 Estimated with patch: 10 Actual: 10 If you want to see more examples, see regress tests in the patch. *Previous thread:* This feature was originally developed two years ago in [1], and at that time, the approach was almost the same. My implementation uses dedicated functions and 'if' statements directly for better readability and maintainability. Additionally, there was a bug in the previous approach that has been resolved with my patch. Here’s an example of the bug and its fix: CREATE TABLE foo (a int, b int); INSERT INTO foo SELECT x/10+1, x FROM generate_series(1,1) g(x); ANALYZE foo; EXPLAIN ANALYZE SELECT * FROM foo WHERE a = 1 OR (b > 0 AND b < 10); ` Estimated: 18 Actual: 9 CREATE STATISTICS foo_s (mcv) ON a,b FROM foo; ANALYZE foo; EXPLAIN ANALYZE SELECT * FROM foo WHERE a = 1 OR (b > 0 AND b < 10); ` Estimated previous patch: 18 Estimated current patch: 9 Actual: 9 [1]: https://www.postgresql.org/message-id/flat/9e0a12e0-c05f-b193-ed3d-fe88bc1e5fe1%40enterprisedb.com I look forward to any feedback or suggestions from the community. Best regars, Ilia Evdokimov Tantor Labs LLC. From fd468972f0ce27291523a28fdf0d9966c2fdf6e1 Mon Sep 17 00:00:00 2001 From: Ilia Evdokimov Date: Sat, 10 Aug 2024 14:35:25 +0300 Subject: [PATCH] Add support for (Var op Var) clause in extended MCV statistics Added a new leaf to the existing clauses tree, allowing the calculation of selectivity specifically for (Var op Var) clauses. The new function for this selectivity calculation has been integratedinto the extended statistics mechanism, ensuring accurate application during query planning. 
--- src/backend/optimizer/path/clausesel.c| 25 +- src/backend/statistics/README | 6 +- src/backend/statistics/README.mcv | 6 +- src/backend/statistics/extended_stats.c | 108 +++- src/backend/statistics/mcv.c | 184 -- .../statistics/extended_stats_internal.h | 6 + src/include/statistics/statistics.h | 11 +- src/test/regress/expected/stats_ext.out | 553 ++ src/test/regress/sql/stats_ext.sql| 166 ++ 9 files changed, 982 insertions(+), 83 deletions(-) diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c index 0ab021c1e8..eec6c6182c 100644 --- a/src/backend/optimizer/path/clausesel.c +++ b/src/backend/optimizer/path/clausesel.c @@ -152,7 +152,7 @@ clauselist_selectivity_ext(PlannerInfo *root, */ s1 = statext_clauselist_selectivity(root, clauses, varRelid, jointype, sjinfo, rel, - &estimatedclauses, false); + &estimatedclauses, AND_TYPE); } /* @@ -384,7 +384,7 @@ clauselist_selectivity_or(PlannerInfo *root, */ s1 = statext_clauselist_selectivity(root, clauses, varRelid, jointype, sjinfo, rel, - &estimatedclauses, true); + &estimatedclauses, OR_TYPE); } /* @@ -691,6 +691,7 @@ clause_selectivity_ext(PlannerInfo *root, Selectivity s1 = 0.5; /* default for any unhandled clause type */ RestrictI
Re: Add support for (Var op Var) clause in extended MCV statistics
Another issue mentioned in [1] involves cases where the clause is of the form (A op A). In my view, this isn't related to the current patch, as it can be addressed by rewriting the clause, similar to transforming A = A into A IS NOT NULL. This adjustment would result in more accurate estimation.

[1]: https://www.postgresql.org/message-id/7C0F91B5-8A43-428B-8D31-556458720305%40enterprisedb.com

Ilia Evdokimov,
Tantor Labs LLC.
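P.S. A small illustration of the existing rewrite, as far as I understand the current behaviour (so treat the comments as expectations, not guarantees): a mergejoinable self-equality is already turned into a NULL test, while other self-comparisons are kept as-is.
```
EXPLAIN SELECT * FROM t WHERE a = a;   -- expected: Filter: (a IS NOT NULL)
EXPLAIN SELECT * FROM t WHERE a < a;   -- expected: Filter: (a < a)
```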
Re: Add support for (Var op Var) clause in extended MCV statistics
On 12.8.24 14:53, Tomas Vondra wrote: I agree, and I'm grateful someone picked up the original patch. I'll try to help to keep it moving forward. If the thread gets stuck, feel free to ping me to take a look. Good. Thank you! I started reviewing it and want to suggest some changes to better code: I think we should consider the case where the expression is not neither an OpExpr and VarOpVar expression. Do you have some specific type of clauses in mind? Most of the extended statistics only really handles this type of clauses, so I'm not sure it's feasible to extend that - at least not in this patch. I agree with Alena that we need to consider the following clauses: (Expr op Var), (Var op Expr) and (Expr op Expr). And we need to return false in these cases because we did it before my patch in /* Check if the expression has the right shape */ if (!examine_opclause_args(expr->args, &clause_expr, NULL, NULL)) return false; In is_opclause_var_op_var() function it is really useless local Node *expr_left, *expr_right variables. However, we can't assign them NULL at the begin because if I passed not-null pointers I have to return the values. Otherwise remain them NULL. Nevertheless, thank you for review, Alena. Have you tested this code with any benchmarks? FWIW I think we need to test two things - that it (a) improves the estimates and (b) does not have significant overhead. Yes, but only TPC-B. And the performance did not drop. In general, it'd be better to do more tests and those listed by Tomas with new attached patch.From fa67b0fa34408c0f1b0c9f079b84e7c71f3b5599 Mon Sep 17 00:00:00 2001 From: Ilia Evdokimov Date: Sat, 10 Aug 2024 14:35:25 +0300 Subject: [PATCH] Add support for (Var op Var) clause in extended MCV statistics Added a new leaf to the existing clauses tree, allowing the calculation of selectivity specifically for (Var op Var) clauses. The new function for this selectivity calculation has been integratedinto the extended statistics mechanism, ensuring accurate application during query planning. --- src/backend/optimizer/path/clausesel.c| 25 +- src/backend/statistics/README | 6 +- src/backend/statistics/README.mcv | 6 +- src/backend/statistics/extended_stats.c | 108 +++- src/backend/statistics/mcv.c | 186 -- .../statistics/extended_stats_internal.h | 6 + src/include/statistics/statistics.h | 11 +- src/test/regress/expected/stats_ext.out | 553 ++ src/test/regress/sql/stats_ext.sql| 166 ++ 9 files changed, 983 insertions(+), 84 deletions(-) diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c index 0ab021c1e8..cb39bd 100644 --- a/src/backend/optimizer/path/clausesel.c +++ b/src/backend/optimizer/path/clausesel.c @@ -152,7 +152,7 @@ clauselist_selectivity_ext(PlannerInfo *root, */ s1 = statext_clauselist_selectivity(root, clauses, varRelid, jointype, sjinfo, rel, - &estimatedclauses, false); + &estimatedclauses, AND_CLAUSE); } /* @@ -384,7 +384,7 @@ clauselist_selectivity_or(PlannerInfo *root, */ s1 = statext_clauselist_selectivity(root, clauses, varRelid, jointype, sjinfo, rel, - &estimatedclauses, true); + &estimatedclauses, OR_CLAUSE); } /* @@ -691,6 +691,7 @@ clause_selectivity_ext(PlannerInfo *root, Selectivity s1 = 0.5; /* default for any unhandled clause type */ RestrictInfo *rinfo = NULL; bool cacheable = false; + Node *src = clause; if (clause == NULL) /* can this still happen? 
*/ return s1; @@ -832,6 +833,7 @@ clause_selectivity_ext(PlannerInfo *root, { OpExpr *opclause = (OpExpr *) clause; Oid opno = opclause->opno; + List *clauses = list_make1(src); if (treat_as_join_clause(root, clause, rinfo, varRelid, sjinfo)) { @@ -842,6 +844,25 @@ clause_selectivity_ext(PlannerInfo *root, jointype, sjinfo); } + else if(use_extended_stats) + { + /* Check whether clauses are from one relation */ + RelOptInfo *rel = find_single_rel_for_clauses(root, clauses); + Bitmapset *estimatedclauses = NULL; + if (rel && rel->rtekind == RTE_RELATION && rel->statlist != NIL) +s1 = statext_clauselist_selectivity(root, clauses, varRelid, + jointype, sjinfo, rel, + &estimatedclauses, VAR_OP_VAR_CLAUSE); + + if (bms_num_members(estimatedclauses) != 1) + { +/* If there is no multi-column MCV statistics */ +s1 = restriction_selectivity(root, opno, + opclause->args, + opclause->inputcollid, + varRelid); + } + } else { /* Estimate selectivity for a restriction clause. */ diff --git a/src/backend/statistics/README b/src/backend/statistics/README index 13a97a
Re: change regexp_substr first argument make tests more easier to understand.
Hi everybody,

The current tests for regexp_instr() and regexp_substr() that use the string 'abcabcabc' are really hard to read: you have to spend time figuring out what happens in these tests and whether they are actually correct. I'd rather change the string to 'abcdefghi', just as in the query SELECT regexp_substr('abcdefghi', 'd.q') IS NULL AS t;

Regards,
Ilia Evdokimov,
Tantor Labs LLC.
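P.S. A couple of illustrative calls; the expected results in the comments follow from the documented behaviour of these functions:
```
SELECT regexp_substr('abcdefghi', 'c.e');               -- 'cde'
SELECT regexp_instr('abcdefghi', 'fgh');                 -- 6
SELECT regexp_substr('abcdefghi', 'd.q') IS NULL AS t;   -- t
```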
Re: Vacuum statistics
On 10.8.24 22:37, Andrei Zubkov wrote:

> Hi, Ilia!
>
>> Do you consider not to create new table in pg_catalog but to save statistics in existing table? I mean pg_class or pg_stat_progress_analyze, pg_stat_progress_vacuum?
>
> Thank you for your interest in our patch! The *_progress views are not our case. They hold online statistics while a vacuum is in progress; once work on a table is done, the entry is gone from those views. The idea of this patch is the opposite - it doesn't provide online statistics but accumulates statistics about resources consumed by all vacuum passes over all relations. It's much closer to pg_stat_all_tables than to pg_stat_progress_vacuum. It seems pg_class is not the right place because it is not a statistics view - it holds the current relation state and nothing about the relation workload. Maybe pg_stat_all_tables is the right place, but I have several thoughts about why it is not:
> - Some statistics provided by this patch are really vacuum specific. I don't think we want them in the relation statistics view.
> - Postgres is extremely extensible. I'm sure someday there will be table AMs that do not need vacuum at all.
> Right now vacuum-specific workload views seem the optimal choice to me.
>
> Regards,

Agreed. They are not good places to store such statistics.

I have some suggestions:

1. In pgstatfuncs.c, in the functions tuplestore_put_for_database() and tuplestore_put_for_relation(), you can remove the 'nulls' array if you're sure that the columns cannot be NULL.

2. These functions are almost the same, so I would think of writing one function that takes the type 'ExtVacReportType' as a parameter.
Re: Vacuum statistics
And I have one suggestion for pg_stat_vacuum_database: I suppose we should add a column with the database's name after the 'dboid' column, because it is difficult to read the statistics without it. We could call it 'datname', just like in the 'pg_stat_database' view.

Regards,
Ilia Evdokimov,
Tantor Labs LLC.
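P.S. In the meantime the name can be obtained with an explicit join (a sketch, assuming the view keeps the 'dboid' column):
```
SELECT d.datname, v.*
FROM pg_stat_vacuum_database v
JOIN pg_database d ON d.oid = v.dboid;
```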
Re: Vacuum statistics
On 15.8.24 11:49, Alena Rybakina wrote:

>> I have some suggestions:
>> 1. pgstatfuncs.c in functions tuplestore_put_for_database() and tuplestore_put_for_relation you can remove 'nulls' array if you're sure that columns cannot be NULL.
>
> We need to use this for the tuplestore_putvalues function. With this function, we fill the table with the values of the statistics.

Ah, right! I'm sorry.

>> 2. These functions are almost the same and I would think of writing one function depending of type 'ExtVacReportType'
>
> I'm not sure that I fully understand what you mean. Can you explain it more clearly, please?

Ah, I didn't notice that the size of all three tables is different. Therefore, it won't be possible to write one function instead of two to avoid code duplication. My mistake.
Remove redundant NULL check in clause_selectivity_ext() function
Hi everyone, In the `clause_selectivity_ext()` function, there’s a comment regarding a NULL clause check that asks, "can this still happen?". I decided to investigate whether this scenario is still valid. Here’s what I found after reviewing the relevant cases: - approx_tuple_count(): The function iterates over a loop of other clauses. - get_foreign_key_join_selectivity(): The function is invoked within an `if (clause)` condition. - consider_new_or_clause(): The clause is generated by `make_restrictinfo()`, which never returns NULL. - clauselist_selectivity_ext(): This function either checks `list_length(clauses) == 1` before being called, or it is called within a loop of clauses. In other cases, the function is called recursively from `clause_selectivity_ext()`. If you are aware of any situations where a NULL clause could be passed or if I’ve missed anything, please let me know. I’m also open to any other suggestions. -- Regards, Ilia Evdokimov, Tantor Labs LCC. From fab0575ef1350cb700b6fc079230397ecb5ca19d Mon Sep 17 00:00:00 2001 From: Ilia Evdokimov Date: Mon, 19 Aug 2024 12:08:53 +0300 Subject: [PATCH] Remove redundant NULL check for clause during selectivity calculation The function clause_selectivity_ext() takes a Node *clause parameter, which is either iterated over in a loop with other clauses or passed as a RestrictInfo from other functions. Since these functions guarantee that clause cannot be NULL, the NULL check is unnecessary and has been removed. --- src/backend/optimizer/path/clausesel.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c index 0ab021c1e8..5992f96ed2 100644 --- a/src/backend/optimizer/path/clausesel.c +++ b/src/backend/optimizer/path/clausesel.c @@ -692,9 +692,6 @@ clause_selectivity_ext(PlannerInfo *root, RestrictInfo *rinfo = NULL; bool cacheable = false; - if (clause == NULL) /* can this still happen? */ - return s1; - if (IsA(clause, RestrictInfo)) { rinfo = (RestrictInfo *) clause; -- 2.34.1
Re: Remove redundant NULL check in clause_selectivity_ext() function
On 19.8.24 18:02, Tom Lane wrote:

> Ilia Evdokimov writes:
>> In the `clause_selectivity_ext()` function, there's a comment regarding a NULL clause check that asks, "can this still happen?". I decided to investigate whether this scenario is still valid. Here's what I found after reviewing the relevant cases:
>> - approx_tuple_count(): The function iterates over a loop of other clauses.
>> - get_foreign_key_join_selectivity(): The function is invoked within an `if (clause)` condition.
>> - consider_new_or_clause(): The clause is generated by `make_restrictinfo()`, which never returns NULL.
>> - clauselist_selectivity_ext(): This function either checks `list_length(clauses) == 1` before being called, or it is called within a loop of clauses.
>
> That list_length check doesn't prove anything about whether the list element is NULL, though. While I suspect that you may be right that the case doesn't occur anymore (if it ever did), I'm inclined to leave this test in place. It's cheap enough compared to what the rest of the function will do, and more importantly we can't assume that all interesting call sites are within Postgres core. There are definitely extensions calling clauselist_selectivity and related functions. It's possible that some of them rely on clause_selectivity not crashing on a NULL. Certainly, such an assumption could be argued to be a bug they should fix; but I'm disinclined to make them jump through that hoop for a vanishingly small performance improvement.
>
> Also, there are boatloads of other places where the planner has possibly-redundant checks for null clause pointers. It's likely that some of the other ones are more performance-critical than this. But I wouldn't be in favor of retail removal of the others either. Maybe with a more holistic approach we could remove a whole lot of them and make a measurable improvement; but it would require some careful thought about what invariants we want to assume. There's not really any design principles about this right now, and where we've ended up is that most functions operating on expression trees assume they've got to defend against NULL inputs. To remove those checks, we'd need a clear understanding of where caller-side checks need to be added instead.
>
> regards, tom lane

Let's assume that this check needs to remain, and the length check doesn't guarantee anything. However, I'm a bit concerned that there's a NULL check here, but it's missing in the clauselist_selectivity_ext() function. For the reasons mentioned above, I would suggest the following: either we perform the NULL check in both places, or we don't perform it in either.

-- 
Regards,
Ilia Evdokimov,
Tantor Labs LLC.
Re: Add support for (Var op Var) clause in extended MCV statistics
On 12.8.24 19:25, Tomas Vondra wrote: Is TPC-B really interesting/useful for this patch? The queries are super simple, with only a single clause (so it may not even get to the code handling extended statistics). Did you create any extended stats? No, it's not the case. I simply wanted to verify that other queries are not slowed down after applying my patch. I think you'll need to construct a custom test, with queries that have multiple (var op var) clauses, extended stats created, etc. And benchmark that. I used the test generator from a previous thread [1] and ran it with |default_statistics_target = 1000| to achieve more accurate estimates for 3000 rows. It would also be beneficial to run tests with 10,000 and 100,000 rows for a broader perspective. I've attached the python test. Here’s a breakdown of the issues: 1. (A op A) Clause: Before applying my patch, there were poor estimates for expressions like |(A op A)|. Currently, we only have correct estimates for the |(A = A)| clause, which transforms into |A IS NOT NULL|. Should I address this in this thread? I believe we should extend the same correction to clauses like |(A != A)|, |(A < A)|, and similar conditions. However, this issue is not for current thread. 2. AND Clauses: The estimates for AND clauses were inaccurate before my patch. I noticed code segments where I could add something specific for the |(Var op Var)| clause, but I'm unsure if I'm missing anything crucial. If my understanding is incorrect, I'd appreciate any guidance or corrections. FWIW I don't think it makes sense to benchmark the query execution - if the estimate improves, it's possible to get arbitrary speedup, but that's expected and mostly mostly irrelevant I think. What I'd focus on is benchmarking just the query planning - we need the overhead to be negligible (or at least small) so that it does not hurt people with already good plans. BTW can you elaborate why you are interested in this patch? Do you just think it's interesting/useful, or do you have a workload where it would actually help? I'm asking because me being uncertain how beneficial this is in practice (not just being nice in theory) was one of the reasons why I didn't do more work on this in 2021. regards I have two reasons for pursuing this. Firstly, I've encountered some of these queries in practice, although they are quite rare. While it might be easy to dismiss these cases due to their infrequency, I believe that we shouldn't overlook the opportunity to develop better handling for them, regardless of how seldom they occur. Secondly, I see that you're working on improving estimates for JOIN clauses in thread [2]. I believe that enhancing estimates for these rare cases could also benefit future work on JOIN queries, particularly those with multiple |ON (T1.column = T2.column)| conditions, which are essentially |(Var op Var)| clauses. My idea is to start with non-JOIN queries, and then apply the same approach to improve JOIN estimates. Of course, I might be wrong, but I think this approach has potential. P.S. If I sent this mail twice I'm sorry. I wanted to sent results of the test, and it was not sent to hackers because of big size of attached file. Now I sent only test. [1]: https://www.postgresql.org/message-id/ecc0b08a-518d-7ad6-17ed-a5e962fc4f5f%40enterprisedb.com [2]: https://www.postgresql.org/message-id/flat/c8c0ff31-3a8a-7562-bbd3-78b2ec65f16c%40enterprisedb.com -- Regards, Ilia Evdokimov, Tantor Labs LCC. 
#!/usr/bin/python3 import psycopg2 import random import select import hashlib def generate_conditions(nclauses, attributes = ['a', 'b', 'c', 'd'], operators = ['<', '<=', '=', '!=', '>=', '>']): if nclauses == 1: cols = [random.choice(attributes), random.choice(attributes)] oper = ' ' + random.choice(operators) + ' ' clause = oper.join(cols) if random.randint(0,100) < 50: clause = 'NOT ' + clause return clause nparts = random.randint(2, nclauses) # distribute the clauses between query parts nclauses_parts = [1 for p in range(0, nparts)] for x in range(0, nclauses - nparts): nclauses_parts[random.randint(0, nparts) - 1] += 1 parts = [] for p in range(0, nparts): parts.append('(' + generate_conditions(nclauses_parts[p], attributes, operators) + ')') c = random.choice([' AND ', ' OR ']) return c.join(parts) def generate_data(nrows, attributes = ['a', 'b', 'c', 'd']): sql = 'insert into t (' + ','.join(attributes) + ') select ' attrs = [] for attr in attributes: x = random.choice([-1, 2, 5, 10, 20, 30]) if x == -1: x = random.randint(5,
Re: Vacuum statistics
Are you certain that all tables are included in `pg_stat_vacuum_tables`? I'm asking because of the following:

SELECT count(*) FROM pg_stat_all_tables;
 count
-------
   108
(1 row)

SELECT count(*) FROM pg_stat_vacuum_tables;
 count
-------
    20
(1 row)

-- 
Regards,
Ilia Evdokimov,
Tantor Labs LLC.
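P.S. The missing relations can be listed with something like the following (a rough check; it assumes both views expose relname and ignores possible name collisions across schemas):
```
SELECT s.schemaname, s.relname
FROM pg_stat_all_tables s
LEFT JOIN pg_stat_vacuum_tables v ON v.relname = s.relname
WHERE v.relname IS NULL;
```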
Re: ANALYZE ONLY
On 20.8.24 10:42, Jelte Fennema-Nio wrote:

On Tue, 20 Aug 2024 at 07:52, Michael Harris wrote:

1. Would such a feature be welcomed? Are there any traps I might not have thought of?

I think this sounds like a reasonable feature.

2. The existing ANALYZE command has the following structure:

ANALYZE [ ( option [, ...] ) ] [ table_and_columns [, ...] ]

It would be easiest to add ONLY as another option, but that doesn't look quite right to me - surely the ONLY should be attached to the table name? An alternative would be:

ANALYZE [ ( option [, ...] ) ] [ONLY] [ table_and_columns [, ...] ]

Any feedback or advice would be great.

Personally I'd prefer a new option to be added. But I agree ONLY isn't a good name then. Maybe something like SKIP_PARTITIONS.

Hi everyone,

Your proposal is indeed interesting, but I have a question: can't your issue be resolved by properly configuring |autovacuum| instead of developing a new feature for |ANALYZE|? From my perspective, |ANALYZE| is intended to forcibly gather statistics from all partitioned tables. If the goal is to ensure that statistics are updated at the right moment, adjusting the |autovacuum_analyze_threshold| and |autovacuum_analyze_scale_factor| parameters might be the solution.

-- Regards, Ilia Evdokimov, Tantor Labs LLC.
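For reference, the per-table trigger autovacuum uses to decide when a table is due for ANALYZE is essentially a simple threshold check; here is a rough sketch of that decision (illustrative names, not the exact code from autovacuum.c):

```c
#include <stdbool.h>

/*
 * Rough sketch of autovacuum's "is this table due for ANALYZE?" check.
 * ANALYZE is triggered once enough rows have changed since the last ANALYZE.
 */
static bool
table_needs_autoanalyze(double reltuples,             /* rows in the table */
                        double changes_since_analyze, /* ins+upd+del since last ANALYZE */
                        double analyze_threshold,     /* autovacuum_analyze_threshold */
                        double analyze_scale_factor)  /* autovacuum_analyze_scale_factor */
{
    double anlthresh = analyze_threshold + analyze_scale_factor * reltuples;

    return changes_since_analyze > anlthresh;
}
```

Both parameters can also be set per table via storage parameters, so individual partitions can be tuned separately.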
Re: Use function smgrclose() to replace the loop
On 14.08.2024 09:32, Steven Niu wrote:

Hi, Kirill, Junwang,

I made this patch to address the refactor issue in our previous email discussion. https://www.postgresql.org/message-id/flat/CABBtG=cdtcbdcbk7mcsy6bjr3s5xutog0vsffuw8olduyyc...@mail.gmail.com

That is, the for loop in the functions smgrdestroy() and smgrdounlinkall() can be replaced with smgrclose():

for (forknum = 0; forknum <= MAX_FORKNUM; forknum++)
    smgrsw[reln->smgr_which].smgr_close(reln, forknum);

-->

smgrclose(rels[i]);

Please let me know if you have any questions.

Best Regards, Steven from Highgo.com

Hello,

Are you sure we can replace the loop with smgrclose()? I see it is not quite the same.

Regards, Ilia Evdokimov, Tantor Labs LLC.
Re: Vacuum statistics
On 08.10.2024 19:18, Alena Rybakina wrote:

Made a rebase on a fresh master branch.

-- Regards, Alena Rybakina Postgres Professional

Thank you for rebasing.

I have noticed that when I create a table or an index on this table, there is no information about the table or index in pg_stat_vacuum_tables and pg_stat_vacuum_indexes until we perform a VACUUM. Example:

CREATE TABLE t (i INT, j INT);
INSERT INTO t SELECT i/10, i/100 FROM GENERATE_SERIES(1,100) i;
SELECT * FROM pg_stat_vacuum_tables WHERE relname = 't';
(0 rows)

CREATE INDEX ON t (i);
SELECT * FROM pg_stat_vacuum_indexes WHERE relname = 't_i_idx';
...
(0 rows)

I can see the entries only after VACUUM has been run or autovacuum has executed. I would suggest adding a line about the relation even if it has not yet been processed by vacuum.

Interestingly, this issue does not occur with pg_stat_vacuum_database:

CREATE DATABASE example_db;
SELECT * FROM pg_stat_vacuum_database WHERE dbname = 'example_db';
 dboid | dbname     | ...
-------+------------+-----
 ...   | example_db | ...
(1 row)

BTW, I recommend renaming the view pg_stat_vacuum_database to pg_stat_vacuum_databases for consistency with pg_stat_vacuum_tables and pg_stat_vacuum_indexes.

-- Regards, Ilia Evdokimov, Tantor Labs LLC.
Re: Check for tuplestorestate nullness before dereferencing
On 14.10.2024 12:25, Alexander Kuznetsov wrote:

Hello everyone,

I'd like to propose adding a check for the nullness of tuplestorestate before dereferencing it in src/backend/executor/nodeMaterial.c. The patch is attached. I am proposing this fix based on the assumption that tuplestorestate could be NULL, since there is a check for it when calculating eof_tuplestore at line 85. However, since this code hasn't been changed since 2006 and hasn't caused any issues, it is possible that the check for (tuplestorestate == NULL) is redundant when calculating eof_tuplestore.

Hi Alexander,

The 'tuplestorestate' variable may be initialized at line 64 if it is NULL. You should consider initializing this variable earlier.

Regards, Ilia Evdokimov, Tantor Labs LLC.
Re: Vacuum statistics
On 08.11.2024 22:23, Alena Rybakina wrote:

Hi! Thank you for the review!

On 07.11.2024 17:49, Ilia Evdokimov wrote:

Thank you for fixing it.

1) I have found some typos in the test output files (out-files) when running 'make check' and 'make check-world'. These typos might cause minor discrepancies in test results. You may already be aware of them, but I wanted to bring them to your attention in case they haven't been noticed. I believe these can be fixed quickly.

Yes, I'll fix it)

2) Additionally, I observed that when we create a table and insert some rows, executing the VACUUM FULL command does not update the information in 'pg_stat_get_vacuum_tables'. However, running the VACUUM command does update this information as expected. This seems inconsistent, and it might be a bug. Example:

CREATE TABLE t (i INT, j INT) WITH (autovacuum_enabled = false);
INSERT INTO t SELECT i/10, i/100 FROM GENERATE_SERIES(1,100) i;

SELECT * FROM pg_stat_get_vacuum_tables WHERE relname = 't';
 schema | relname | relid | total_blks_read | ...
--------+---------+-------+-----------------+----
 public | t       | 21416 |               0 | ...
(1 row)

VACUUM FULL;
SELECT * FROM pg_stat_get_vacuum_tables WHERE relname = 't';
 schema | relname | relid | total_blks_read | ...
--------+---------+-------+-----------------+----
 public | t       | 21416 |               0 | ...
(1 row)

VACUUM;
SELECT * FROM pg_stat_get_vacuum_tables WHERE relname = 't';
 schema | relname | relid | total_blks_read | ...
--------+---------+-------+-----------------+----
 public | t       | 21416 |            4425 | ...
(1 row)

A vacuum full operation doesn't call a vacuum operation, so we can't collect statistics for it. Furthermore, it is a different operation than vacuum because it completely rebuilds the table and indexes, so it looks like your previous table and its indexes were completely removed. To sum up, I think it isn't a bug that the statistics aren't showing here.

Ah, you're right. This table does contain the statistics for it. Everything is okay then. Sorry for the confusion.

Regards, Ilia Evdokimov, Tantor Labs LLC.
Re: Vacuum statistics
On 22.10.2024 22:30, Alena Rybakina wrote:

Hi!

On 16.10.2024 14:01, Alena Rybakina wrote:

Thank you for rebasing.

I have noticed that when I create a table or an index on this table, there is no information about the table or index in pg_stat_vacuum_tables and pg_stat_vacuum_indexes until we perform a VACUUM. Example:

CREATE TABLE t (i INT, j INT);
INSERT INTO t SELECT i/10, i/100 FROM GENERATE_SERIES(1,100) i;
SELECT * FROM pg_stat_vacuum_tables WHERE relname = 't';
(0 rows)

CREATE INDEX ON t (i);
SELECT * FROM pg_stat_vacuum_indexes WHERE relname = 't_i_idx';
...
(0 rows)

I can see the entries only after VACUUM has been run or autovacuum has executed. I would suggest adding a line about the relation even if it has not yet been processed by vacuum.

Interestingly, this issue does not occur with pg_stat_vacuum_database:

CREATE DATABASE example_db;
SELECT * FROM pg_stat_vacuum_database WHERE dbname = 'example_db';
 dboid | dbname     | ...
-------+------------+-----
 ...   | example_db | ...
(1 row)

BTW, I recommend renaming the view pg_stat_vacuum_database to pg_stat_vacuum_databases for consistency with pg_stat_vacuum_tables and pg_stat_vacuum_indexes.

Thanks for the review. I'm investigating this. I agree with the renaming, I will do it in the next version of the patch.

I fixed it. I added a left outer join to the vacuum views and a coalesce call for converting NULL values. I also fixed the code for getting database statistics - we can get it through the existing pgstat_fetch_stat_dbentry function - and fixed a couple of comments. I attached a diff file, as well as new versions of patches.

-- Regards, Alena Rybakina Postgres Professional

Thank you for fixing it.

1) I have found some typos in the test output files (out-files) when running 'make check' and 'make check-world'. These typos might cause minor discrepancies in test results. You may already be aware of them, but I wanted to bring them to your attention in case they haven't been noticed. I believe these can be fixed quickly.

2) Additionally, I observed that when we create a table and insert some rows, executing the VACUUM FULL command does not update the information in 'pg_stat_get_vacuum_tables'. However, running the VACUUM command does update this information as expected. This seems inconsistent, and it might be a bug. Example:

CREATE TABLE t (i INT, j INT) WITH (autovacuum_enabled = false);
INSERT INTO t SELECT i/10, i/100 FROM GENERATE_SERIES(1,100) i;

SELECT * FROM pg_stat_get_vacuum_tables WHERE relname = 't';
 schema | relname | relid | total_blks_read | ...
--------+---------+-------+-----------------+----
 public | t       | 21416 |               0 | ...
(1 row)

VACUUM FULL;
SELECT * FROM pg_stat_get_vacuum_tables WHERE relname = 't';
 schema | relname | relid | total_blks_read | ...
--------+---------+-------+-----------------+----
 public | t       | 21416 |               0 | ...
(1 row)

VACUUM;
SELECT * FROM pg_stat_get_vacuum_tables WHERE relname = 't';
 schema | relname | relid | total_blks_read | ...
--------+---------+-------+-----------------+----
 public | t       | 21416 |            4425 | ...
(1 row)

Regards, Ilia Evdokimov, Tantor Labs LLC.
Re: Sample rate added to pg_stat_statements
On 22.11.2024 09:08, Alexander Korotkov wrote:

On Wed, Nov 20, 2024 at 12:07 AM Michael Paquier wrote:

On Tue, Nov 19, 2024 at 09:39:21AM -0500, Greg Sabino Mullane wrote:

Oh, and a +1 in general to the patch, OP, although it would also be nice to start finding the bottlenecks that cause such performance issues.

FWIW, I'm not eager to integrate this proposal without looking at this exact argument in depth. One piece of it would be to see how much of such "bottlenecks" we would be able to get rid of by integrating pg_stat_statements into the central pgstats with the custom APIs, without pushing the module into core. This means that we would combine the existing hash of pgss to shrink to 8 bytes for objid rather than 13 bytes now as the current code relies on (toplevel, userid, queryid) for the entry lookup (entry removal is sniped with these three values as well, or dshash seq scans). The odds of conflicts still play in our favor even if we have a few million entries, or even ten times that.

If you run "pgbench -S -M prepared" on a pretty large machine with high concurrency, then the spin lock in pgss_store() could become pretty much of a bottleneck. And I'm not sure switching all counters to atomics could somehow improve the situation given we already have pretty many counters. I'm generally +1 for the approach taken in this thread. But I would suggest introducing a threshold value for a query execution time, and sample just everything below that threshold. The slower query shouldn't be sampled, because it can't be too frequent, and also it could be more valuable to be counted individually (while very fast queries probably only matter "in average").

-- Regards, Alexander Korotkov Supabase

I really liked your idea, and I'd like to propose an enhancement that I believe improves it further. Yes, if a query's execution time exceeds the threshold, it should always be tracked without sampling. However, for queries with execution times below the threshold, the sampling logic should prioritize shorter queries over those closer to the threshold. In my view, the ideal approach is for shorter queries to have the highest probability of being sampled, while queries closer to the threshold are less likely to be sampled. This behavior can be achieved with the following logic:

pg_stat_statements.sample_exectime_threshold * random(0, 1) < total_time

Here's how it works:

* As a query's execution time approaches zero, the probability of it being sampled approaches one.
* Conversely, as a query's execution time approaches the threshold, the probability of it being sampled approaches zero.

In other words, the sampling probability decreases linearly from 1 to 0 as the execution time gets closer to the threshold. I believe this approach offers an ideal user experience.

I have attached a new patch implementing this logic. Please let me know if you have any feedback regarding the comments in the code, the naming of variables, or the documentation. I'm always open to discussion.

-- Best regards, Ilia Evdokimov, Tantor Labs LLC.

From edba1bcefa85e995ec3e9df6c9e8d30adcd940b9 Mon Sep 17 00:00:00 2001
From: Ilia Evdokimov
Date: Tue, 26 Nov 2024 00:58:24 +0300
Subject: [PATCH] Add time-based sampling to pg_stat_statements

New configuration parameter pg_stat_statements.sample_exectime_threshold, which allows tracking only queries that meet a specified execution time threshold. Queries with execution times above the threshold are always tracked, while shorter queries are sampled probabilistically.
This helps reduce the overhead of tracking frequent, fast queries while preserving data for longer-running ones. --- .../pg_stat_statements/pg_stat_statements.c | 57 ++- doc/src/sgml/pgstatstatements.sgml| 20 +++ 2 files changed, 64 insertions(+), 13 deletions(-) diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c index 49c657b3e0..6ec1e3e7ed 100644 --- a/contrib/pg_stat_statements/pg_stat_statements.c +++ b/contrib/pg_stat_statements/pg_stat_statements.c @@ -49,6 +49,7 @@ #include "access/parallel.h" #include "catalog/pg_authid.h" +#include "common/pg_prng.h" #include "common/int.h" #include "executor/instrument.h" #include "funcapi.h" @@ -289,6 +290,7 @@ static const struct config_enum_entry track_options[] = }; static int pgss_max = 5000; /* max # statements to track */ +static int pgss_sample_exectime_threshold = 0; /* Threshold for query execution sampling (msec) */ static int pgss_track = PGSS_TRACK_TOP; /* tracking level */ static bool pgss_track_utility = true; /* whether to track utility commands */ static bool pgss_track_planning = false; /* whether to track planning @@ -414,6 +416,19 @@ _PG_init(void) NULL, NULL); + DefineCustomIntV
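For illustration, here is a minimal standalone sketch of the decision rule described above (hypothetical names and units; this is not the code from the attached patch). It assumes total_time and the threshold are expressed in the same unit and that random01 is uniform in [0, 1):

```c
#include <stdbool.h>

/* Sketch: implements "sample_exectime_threshold * random(0, 1) < total_time". */
static bool
exectime_threshold_passed(double threshold_ms, double total_time_ms,
                          double random01)
{
    /* Any query at or above the threshold always passes the check. */
    return threshold_ms * random01 < total_time_ms;
}
```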
Re: Sample rate added to pg_stat_statements
Hi everyone, I believe we should also include this check in the pgss_ExecutorEnd() function because sampling in pgss_ExecutorEnd() ensures that a query not initially sampled in pgss_ExecutorStart() can still be logged if it meets the pg_stat_statements.sample_rate criteria. This approach adds flexibility by allowing critical queries to be captured while maintaining efficient sampling. I attached new version of patch. -- Best regards, Ilia Evdokimov, Tantor Labs LLC. From c33e1ae07e8eb4f797b47e7deb07e6322b1375a3 Mon Sep 17 00:00:00 2001 From: Ilia Evdokimov Date: Tue, 19 Nov 2024 12:53:52 +0300 Subject: [PATCH] Allow setting sample ratio for pg_stat_statements New configuration parameter pg_stat_statements.sample_ratio makes it possible to control just a fraction of the queries meeting the configured threshold, to reduce the amount of controlling. --- .../pg_stat_statements/pg_stat_statements.c | 25 --- doc/src/sgml/pgstatstatements.sgml| 17 + 2 files changed, 39 insertions(+), 3 deletions(-) diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c index 49c657b3e0..42b3fee815 100644 --- a/contrib/pg_stat_statements/pg_stat_statements.c +++ b/contrib/pg_stat_statements/pg_stat_statements.c @@ -49,6 +49,7 @@ #include "access/parallel.h" #include "catalog/pg_authid.h" +#include "common/pg_prng.h" #include "common/int.h" #include "executor/instrument.h" #include "funcapi.h" @@ -294,7 +295,10 @@ static bool pgss_track_utility = true; /* whether to track utility commands */ static bool pgss_track_planning = false; /* whether to track planning * duration */ static bool pgss_save = true; /* whether to save stats across shutdown */ +static double pgss_sample_rate = 1; +/* Is the current top-level query to be sampled? */ +static bool current_query_sampled = false; #define pgss_enabled(level) \ (!IsParallelWorker() && \ @@ -414,6 +418,19 @@ _PG_init(void) NULL, NULL); + DefineCustomRealVariable("pg_stat_statements.sample_rate", + "Fraction of queries to process.", + NULL, + &pgss_sample_rate, + 1.0, + 0.0, + 1.0, + PGC_SUSET, + 0, + NULL, + NULL, + NULL); + DefineCustomEnumVariable("pg_stat_statements.track", "Selects which statements are tracked by pg_stat_statements.", NULL, @@ -989,6 +1006,8 @@ pgss_planner(Query *parse, static void pgss_ExecutorStart(QueryDesc *queryDesc, int eflags) { + current_query_sampled = (pg_prng_double(&pg_global_prng_state) < pgss_sample_rate); + if (prev_ExecutorStart) prev_ExecutorStart(queryDesc, eflags); else @@ -999,7 +1018,7 @@ pgss_ExecutorStart(QueryDesc *queryDesc, int eflags) * counting of optimizable statements that are directly contained in * utility statements. */ - if (pgss_enabled(nesting_level) && queryDesc->plannedstmt->queryId != UINT64CONST(0)) + if (current_query_sampled && pgss_enabled(nesting_level) && queryDesc->plannedstmt->queryId != UINT64CONST(0)) { /* * Set up to track total elapsed time in ExecutorRun. Make sure the @@ -1068,8 +1087,8 @@ pgss_ExecutorEnd(QueryDesc *queryDesc) { uint64 queryId = queryDesc->plannedstmt->queryId; - if (queryId != UINT64CONST(0) && queryDesc->totaltime && - pgss_enabled(nesting_level)) + if (current_query_sampled && queryId != UINT64CONST(0) && + queryDesc->totaltime && pgss_enabled(nesting_level)) { /* * Make sure stats accumulation is done. 
(Note: it's okay if several diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml index 501b468e9a..1e2533a802 100644 --- a/doc/src/sgml/pgstatstatements.sgml +++ b/doc/src/sgml/pgstatstatements.sgml @@ -936,6 +936,23 @@ + + + pg_stat_statements.sample_rate (real) + + pg_stat_statements.sample_rate configuration parameter + + + + + + pg_stat_statements.sample_rate causes pg_stat_statements to only + control a fraction of the statements in each session. The default is 1, + meaning control all the queries. Only superusers can change this setting. + + + + pg_stat_statements.save (boolean) -- 2.34.1
Sample rate added to pg_stat_statements
Hi hackers, Under high-load scenarios with a significant number of transactions per second, pg_stat_statements introduces substantial overhead due to the collection and storage of statistics. Currently, we are sometimes forced to disable pg_stat_statements or adjust the size of the statistics using pg_stat_statements.max, which is not always optimal. One potential solution to this issue could be query sampling in pg_stat_statements. A similar approach has been implemented in extensions like auto_explain and pg_store_plans, and it has proven very useful in high-load systems. However, this approach has its trade-offs, as it sacrifices statistical accuracy for improved performance. This patch introduces a new configuration parameter, pg_stat_statements.sample_rate for the pg_stat_statements extension. The patch provides the ability to control the sampling of query statistics in pg_stat_statements. This patch serves as a proof of concept (POC), and I would like to hear your thoughts on whether such an approach is viable and applicable. -- Best regards, Ilia Evdokimov, Tantor Labs LLC.From b19081bd257014f0c4046570097e4bc7b28a3126 Mon Sep 17 00:00:00 2001 From: Ilia Evdokimov Date: Mon, 18 Nov 2024 21:05:43 +0300 Subject: [PATCH] [PATCH] Allow setting sample ratio for pg_stat_statements New configuration parameter pg_stat_statements.sample_ratio makes it possible to control just a fraction of the queries meeting the configured threshold, to reduce the amount of controlling. --- .../pg_stat_statements/pg_stat_statements.c | 22 ++- doc/src/sgml/pgstatstatements.sgml| 17 ++ 2 files changed, 38 insertions(+), 1 deletion(-) diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c index 49c657b3e0..51d1a48252 100644 --- a/contrib/pg_stat_statements/pg_stat_statements.c +++ b/contrib/pg_stat_statements/pg_stat_statements.c @@ -49,6 +49,7 @@ #include "access/parallel.h" #include "catalog/pg_authid.h" +#include "common/pg_prng.h" #include "common/int.h" #include "executor/instrument.h" #include "funcapi.h" @@ -73,6 +74,8 @@ PG_MODULE_MAGIC; +static double pgss_sample_rate = 1; + /* Location of permanent stats file (valid when database is shut down) */ #define PGSS_DUMP_FILE PGSTAT_STAT_PERMANENT_DIRECTORY "/pg_stat_statements.stat" @@ -295,6 +298,8 @@ static bool pgss_track_planning = false; /* whether to track planning * duration */ static bool pgss_save = true; /* whether to save stats across shutdown */ +/* Is the current top-level query to be sampled? */ +static bool current_query_sampled = false; #define pgss_enabled(level) \ (!IsParallelWorker() && \ @@ -414,6 +419,19 @@ _PG_init(void) NULL, NULL); + DefineCustomRealVariable("pg_stat_statements.sample_rate", + "Fraction of queries to process.", + NULL, + &pgss_sample_rate, + 1.0, + 0.0, + 1.0, + PGC_SUSET, + 0, + NULL, + NULL, + NULL); + DefineCustomEnumVariable("pg_stat_statements.track", "Selects which statements are tracked by pg_stat_statements.", NULL, @@ -989,6 +1007,8 @@ pgss_planner(Query *parse, static void pgss_ExecutorStart(QueryDesc *queryDesc, int eflags) { + current_query_sampled = (pg_prng_double(&pg_global_prng_state) < pgss_sample_rate); + if (prev_ExecutorStart) prev_ExecutorStart(queryDesc, eflags); else @@ -999,7 +1019,7 @@ pgss_ExecutorStart(QueryDesc *queryDesc, int eflags) * counting of optimizable statements that are directly contained in * utility statements. 
*/ - if (pgss_enabled(nesting_level) && queryDesc->plannedstmt->queryId != UINT64CONST(0)) + if (current_query_sampled && pgss_enabled(nesting_level) && queryDesc->plannedstmt->queryId != UINT64CONST(0)) { /* * Set up to track total elapsed time in ExecutorRun. Make sure the diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml index 501b468e9a..1e2533a802 100644 --- a/doc/src/sgml/pgstatstatements.sgml +++ b/doc/src/sgml/pgstatstatements.sgml @@ -936,6 +936,23 @@ + + + pg_stat_statements.sample_rate (real) + + pg_stat_statements.sample_rate configuration parameter + + + + + + pg_stat_statements.sample_rate causes pg_stat_statements to only + control a fraction of the statements in each session. The default is 1, + meaning control all the queries. Only superusers can change this setting. + + + + pg_stat_statements.save (boolean) -- 2.34.1
Re: Sample rate added to pg_stat_statements
On 19.11.2024 15:11, Andrey M. Borodin wrote:

On 18 Nov 2024, at 23:33, Ilia Evdokimov wrote:

Hi hackers,

Under high-load scenarios with a significant number of transactions per second, pg_stat_statements introduces substantial overhead due to the collection and storage of statistics. Currently, we are sometimes forced to disable pg_stat_statements or adjust the size of the statistics using pg_stat_statements.max, which is not always optimal. One potential solution to this issue could be query sampling in pg_stat_statements. A similar approach has been implemented in extensions like auto_explain and pg_store_plans, and it has proven very useful in high-load systems. However, this approach has its trade-offs, as it sacrifices statistical accuracy for improved performance. This patch introduces a new configuration parameter, pg_stat_statements.sample_rate, for the pg_stat_statements extension. The patch provides the ability to control the sampling of query statistics in pg_stat_statements. This patch serves as a proof of concept (POC), and I would like to hear your thoughts on whether such an approach is viable and applicable.

+1 for the idea. I heard a lot of complaints that pgss is costly. Most of them were using it wrong though. But at least it could give an easy way to rule out the performance impact of pgss.

Thank you for the review.

On 19 Nov 2024, at 15:09, Ilia Evdokimov wrote:

I believe we should also include this check in the pgss_ExecutorEnd() function because sampling in pgss_ExecutorEnd() ensures that a query not initially sampled in pgss_ExecutorStart() can still be logged if it meets the pg_stat_statements.sample_rate criteria. This approach adds flexibility by allowing critical queries to be captured while maintaining efficient sampling.

Is there a reason why pgss_ProcessUtility is excluded?

Best regards, Andrey Borodin.

Ah, you're right! Moreover, this check is needed not only in pgss_ProcessUtility() but in all places where pgss_enabled() is called. Therefore, it's better to move the sampling check directly into pgss_enabled().

However, another issue arises with the initialization of the 'current_query_sampled' variable, which records whether the query is sampled or not. Initializing it in pgss_ExecutorStart() is not sufficient, because pgss_post_parse_analyze() or pgss_ProcessUtility() might be called earlier. This could lead to different values of 'current_query_sampled' being used in these functions, which is undesirable. Run the regression tests for pg_stat_statements with the initialization in pgss_ExecutorStart(), and you'll see this. To avoid this, we need to find a function that is called earlier than all the others. In my opinion, pgss_post_parse_analyze() is a good candidate for this purpose.

If you have objections or better suggestions, feel free to share them. I attached the patch with fixes.

-- Best regards, Ilia Evdokimov, Tantor Labs LLC.

From 725d0a428040d46cfedac0ca707d7d7a27db8721 Mon Sep 17 00:00:00 2001
From: Ilia Evdokimov
Date: Tue, 19 Nov 2024 16:58:28 +0300
Subject: [PATCH] Allow setting sample ratio for pg_stat_statements

New configuration parameter pg_stat_statements.sample_ratio makes it possible to control just a fraction of the queries meeting the configured threshold, to reduce the amount of controlling.
--- .../pg_stat_statements/pg_stat_statements.c | 22 ++- doc/src/sgml/pgstatstatements.sgml| 17 ++ 2 files changed, 38 insertions(+), 1 deletion(-) diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c index 49c657b3e0..01586c32b0 100644 --- a/contrib/pg_stat_statements/pg_stat_statements.c +++ b/contrib/pg_stat_statements/pg_stat_statements.c @@ -49,6 +49,7 @@ #include "access/parallel.h" #include "catalog/pg_authid.h" +#include "common/pg_prng.h" #include "common/int.h" #include "executor/instrument.h" #include "funcapi.h" @@ -294,12 +295,16 @@ static bool pgss_track_utility = true; /* whether to track utility commands */ static bool pgss_track_planning = false; /* whether to track planning * duration */ static bool pgss_save = true; /* whether to save stats across shutdown */ +static double pgss_sample_rate = 1; +/* Is the current top-level query to be sampled? */ +static bool current_query_sampled = false; #define pgss_enabled(level) \ (!IsParallelWorker() && \ (pgss_track == PGSS_TRACK_ALL || \ - (pgss_track == PGSS_TRACK_TOP && (level) == 0))) + (pgss_track == PGSS_TRACK_TOP && (level) == 0)) && \ + current_query_sampled) #define record_gc_qtexts() \ do { \ @@ -414,6 +419,19 @@ _PG_init(void) NULL, NULL); + DefineCustomRealVariable("pg_stat_statements.sample_rate", + "Fraction of queries to process.", +
Re: Showing applied extended statistics in explain Part 2
Hi everyone!

Thank you for your work.

1) While exploring extended statistics, I encountered a bug that occurs when using EXPLAIN (STATS) with queries containing OR conditions:

CREATE TABLE t (a int, b int, c int, d int);
INSERT INTO t SELECT x/10+1, x, x + 10, x * 2 FROM generate_series(1,1) g(x);
CREATE STATISTICS ON a, b FROM t;
CREATE STATISTICS ON c, d FROM t;
ANALYZE;

The following query works as expected:

EXPLAIN (STATS) SELECT * FROM t WHERE a > 0 AND b > 0 AND c > 0 AND d > 0;
                          QUERY PLAN
--------------------------------------------------------------
 Seq Scan on t  (cost=0.00..255.00 rows=1 width=16)
   Filter: ((a > 0) AND (b > 0) AND (c > 0) AND (d > 0))
   Ext Stats: public.t_a_b_stat Clauses: ((a > 0) AND (b > 0))
   Ext Stats: public.t_c_d_stat Clauses: ((c > 0) AND (d > 0))
(4 rows)

However, when using OR conditions, the following query results in an error:

EXPLAIN (ANALYZE, STATS) SELECT * FROM t WHERE a > 0 AND b > 0 OR c > 0 AND d > 0;
ERROR: unrecognized node type: 314

2) It would be great if the STATS flag appeared as an option when pressing Tab during query input in the psql command-line interface.

Best regards, Ilia Evdokimov, Tantor Labs LLC.
Re: Sample rate added to pg_stat_statements
On 20.11.2024 01:07, Michael Paquier wrote:

On Tue, Nov 19, 2024 at 09:39:21AM -0500, Greg Sabino Mullane wrote:

Oh, and a +1 in general to the patch, OP, although it would also be nice to start finding the bottlenecks that cause such performance issues.

FWIW, I'm not eager to integrate this proposal without looking at this exact argument in depth. One piece of it would be to see how much of such "bottlenecks" we would be able to get rid of by integrating pg_stat_statements into the central pgstats with the custom APIs, without pushing the module into core. This means that we would combine the existing hash of pgss to shrink to 8 bytes for objid rather than 13 bytes now as the current code relies on (toplevel, userid, queryid) for the entry lookup (entry removal is sniped with these three values as well, or dshash seq scans). The odds of conflicts still play in our favor even if we have a few million entries, or even ten times that.

This would also get rid of the pgss text file for the queries, which is a source of one of the bottlenecks, as we could just store query strings upper-bounded based on a postmaster GUC to control the size of the entries in the pgstats dshash. More normalization for IN and ANY clauses would also help a lot here; these are a cause of a lot of bloat. This integration is not something I will be able to work on for the PG18 dev cycle as I'm in full review/commit mode for the rest of this release, but I got some plans for it in PG19 except if somebody beats me to it.

-- Michael

I agree. Your proposal can indeed improve performance. Currently, I am working on these changes and will validate them with benchmarks. Once I have concrete results, I will open new threads to facilitate further discussion. However, in my opinion, the suggested improvements are not enough, and sampling is essential.

1. I agree with Greg that pgss is widely used. It's quite odd that sampling exists in 'auto_explain' but not in pgss.

2. If performance issues arise even after these improvements and it turns out that pgss is the cause, the only painless solution without restarting the instance is sampling. The current pgss parameters are not optimal for achieving this.

BTW, I forgot to include a case for nested statements. Either all will be tracked or none. I attached a new version of the patch.

-- Best regards, Ilia Evdokimov, Tantor Labs LLC.

From 68f5451019b261bf03a222f5a05ac57cd0fb9a24 Mon Sep 17 00:00:00 2001
From: Ilia Evdokimov
Date: Thu, 21 Nov 2024 11:24:03 +0300
Subject: [PATCH] Allow setting sample ratio for pg_stat_statements

New configuration parameter pg_stat_statements.sample_ratio makes it possible to track just a fraction of the queries meeting the configured threshold, to reduce the amount of tracking.
--- .../pg_stat_statements/pg_stat_statements.c | 23 ++- doc/src/sgml/pgstatstatements.sgml| 18 +++ 2 files changed, 40 insertions(+), 1 deletion(-) diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c index 49c657b3e0..d06e0d8a44 100644 --- a/contrib/pg_stat_statements/pg_stat_statements.c +++ b/contrib/pg_stat_statements/pg_stat_statements.c @@ -49,6 +49,7 @@ #include "access/parallel.h" #include "catalog/pg_authid.h" +#include "common/pg_prng.h" #include "common/int.h" #include "executor/instrument.h" #include "funcapi.h" @@ -294,12 +295,16 @@ static bool pgss_track_utility = true; /* whether to track utility commands */ static bool pgss_track_planning = false; /* whether to track planning * duration */ static bool pgss_save = true; /* whether to save stats across shutdown */ +static double pgss_sample_rate = 1; +/* Is the current top-level query to be sampled? */ +static bool current_query_sampled = false; #define pgss_enabled(level) \ (!IsParallelWorker() && \ (pgss_track == PGSS_TRACK_ALL || \ - (pgss_track == PGSS_TRACK_TOP && (level) == 0))) + (pgss_track == PGSS_TRACK_TOP && (level) == 0)) && \ + current_query_sampled) #define record_gc_qtexts() \ do { \ @@ -414,6 +419,19 @@ _PG_init(void) NULL, NULL); + DefineCustomRealVariable("pg_stat_statements.sample_rate", + "Fraction of queries to process.", + NULL, + &pgss_sample_rate, + 1.0, + 0.0, + 1.0, + PGC_SUSET, + 0, + NULL, + NULL, + NULL); + DefineCustomEnumVariable("pg_stat_statements.track", "Selects which statements are tracked by pg_stat_statements.", NULL, @@ -835,6 +853,9 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate) if (prev_post_parse_analyze_hook) prev_post_parse_analyze_hook(pstate, query, jstate); + if (nesting_level
Typo in comment of auto_explain.c
Hi everyone,

I found a typo in a comment in auto_explain.c at line 282 (/* Enable per-node instrumentation iff log_analyze is required. */). Not 'iff' but 'if'.

-- Best regards, Ilia Evdokimov, Tantor Labs LLC.
Re: Typo in comment of auto_explain.c
On 27.11.2024 15:54, Heikki Linnakangas wrote:

On 27/11/2024 14:46, Ilia Evdokimov wrote:

Hi everyone, I found a typo in a comment in auto_explain.c at line 282 (/* Enable per-node instrumentation iff log_analyze is required. */). Not 'iff' but 'if'.

It's short for "if and only if". See https://en.wikipedia.org/wiki/If_and_only_if. There was some previous discussion on pgsql-hackers on whether that usage is common enough, although I don't find the thread right now. You're not the first one to think it's a typo :-).

Ah, now everything is clear. I need to look at the history of pgsql-hackers more often. Then there are no questions. Thanks.
Re: Vacuum statistics
In my opinion, the patches are semantically correct. However, not all dead code has been removed - I'm referring to pgstat_update_snapshot(). Also, the tests need to be fixed. -- Best regards, Ilia Evdokimov, Tantor Labs LLC.
Re: Sample rate added to pg_stat_statements
On 26.11.2024 01:15, Ilia Evdokimov wrote: On 22.11.2024 09:08, Alexander Korotkov wrote: On Wed, Nov 20, 2024 at 12:07 AM Michael Paquier wrote: On Tue, Nov 19, 2024 at 09:39:21AM -0500, Greg Sabino Mullane wrote: Oh, and a +1 in general to the patch, OP, although it would also be nice to start finding the bottlenecks that cause such performance issues. FWIW, I'm not eager to integrate this proposal without looking at this exact argument in depth. One piece of it would be to see how much of such "bottlenecks" we would be able to get rid of by integrating pg_stat_statements into the central pgstats with the custom APIs, without pushing the module into core. This means that we would combine the existing hash of pgss to shrink to 8 bytes for objid rather than 13 bytes now as the current code relies on (toplevel, userid, queryid) for the entry lookup (entry removal is sniped with these three values as well, or dshash seq scans). The odds of conflicts still still play in our favor even if we have a few million entries, or even ten times that. If you run "pgbench -S -M prepared" on a pretty large machine with high concurrency, then spin lock in pgss_store() could become pretty much of a bottleneck. And I'm not sure switching all counters to atomics could somehow improve the situation given we already have pretty many counters. I'm generally +1 for the approach taken in this thread. But I would suggest introducing a threshold value for a query execution time, and sample just everything below that threshold. The slower query shouldn't be sampled, because it can't be too frequent, and also it could be more valuable to be counter individually (while very fast queries probably only matter "in average"). -- Regards, Alexander Korotkov Supabase I really liked your idea, and I’d like to propose an enhancement that I believe improves it further. Yes, if a query’s execution time exceeds the threshold, it should always be tracked without sampling. However, for queries with execution times below the threshold, the sampling logic should prioritize shorter queries over those closer to the threshold. In my view, the ideal approach is for shorter queries to have the highest probability of being sampled, while queries closer to the threshold are less likely to be sampled. This behavior can be achieved with the following logic: pg_stat_statements.sample_exectime_threshold * random(0, 1) < total_time Here’s how it works: * As a query’s execution time approaches zero, the probability of it being sampled approaches one. * Conversely, as a query’s execution time approaches the threshold, the probability of it being sampled approaches zero. In other words, the sampling probability decreases linearly from 1 to 0 as the execution time gets closer to the threshold. I believe this approach offers an ideal user experience. I have attached a new patch implementing this logic. Please let me know if you have any feedback regarding the comments in the code, the naming of variables or documentation. I’m always open to discussion. -- Best regards, Ilia Evdokimov, Tantor Labs LLC. I’ve been closely reviewing my last (v5-*.patch) patch on implementing time-based sampling, and I’ve realized that it might not be the best approach. Let me explain the reasons. * We can only perform sampling before the 'pgss_planner()' function. 
However, at that point, we don't yet know the query's execution time, since it only becomes available during 'pgss_ExecutorEnd()' or 'pgss_ProcessUtility()';

* If we wait to sample until the execution completes and we have the actual execution time, this introduces a problem. By that point, we might have already recorded the query's statistics into shared memory from 'pgss_planner()', making it too late to decide whether to sample the query;

* Delaying sampling until the execution finishes would require waiting for the execution time, which could introduce performance overhead. This defeats the purpose of sampling, which aims to reduce the cost of tracking queries.

The hook ordering sketched below summarizes the constraint. I believe we should reconsider the approach and revert to sampling based on v4-*.patch. If I've missed anything or there's an alternative way to implement time threshold-based sampling efficiently, I'd be grateful to hear your insights.

-- Best regards, Ilia Evdokimov, Tantor Labs LLC.
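As a reference for the ordering argument above, here is a rough sketch of the order in which pg_stat_statements' hooks see one top-level optimizable statement (hook names as in pg_stat_statements.c; the annotations are mine, not quotes from the source):

```c
/*
 * Rough sketch of the hook sequence for one top-level optimizable statement:
 *
 *   pgss_post_parse_analyze()   -- earliest hook, jstate is available here
 *   pgss_planner()              -- may already call pgss_store() for planning stats
 *   pgss_ExecutorStart()        -- queryDesc->totaltime instrumentation is set up
 *   pgss_ExecutorRun() / pgss_ExecutorFinish()
 *   pgss_ExecutorEnd()          -- total execution time is known only here
 *
 * So a per-query sampling decision has to be made no later than
 * pgss_planner(), i.e. before any execution time exists.
 */
```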
Re: Showing applied extended statistics in explain Part 2
On 19.11.2024 00:38, Tomas Vondra wrote:

On 11/18/24 22:15, Tomas Vondra wrote:

... So I think the correct solution is to not pass any expressions with RestrictInfo to deparse_expression(). Either by stripping the nodes, or by not adding them at all. The patch tries to do the stripping by maybe_extract_actual_clauses(), but that only looks at the top node, and that is not sufficient here. Maybe it would be possible to walk the whole tree, and remove all the RestrictInfo nodes - including intermediate ones, not just the top. But I'm not quite sure it wouldn't cause issues elsewhere (assuming it modifies the existing nodes). It still feels a bit like fixing a problem we shouldn't really have ...

To make this idea a bit more concrete, here's a patch removing all RestrictInfo nodes in show_stat_qual(). But still, it feels wrong.

regards

Yes, removing all 'RestrictInfos' during deparsing using 'expression_tree_mutator()' is not optimal. However, I don't see an alternative. Perhaps it could be done earlier in extended_stats.c to avoid the need for cleanup later ...

-- Best regards, Ilia Evdokimov, Tantor Labs LLC.
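For reference, a mutator that strips RestrictInfo nodes before deparsing could look roughly like the following. This is only a sketch of the idea being discussed, not the patch Tomas attached, and it assumes expression_tree_mutator() can recurse through the remaining node types in the clause tree:

```c
#include "postgres.h"

#include "nodes/nodeFuncs.h"
#include "nodes/pathnodes.h"

/*
 * Replace every RestrictInfo in the tree with its bare clause, so the
 * result can be handed to deparse_expression() safely.
 */
static Node *
strip_restrictinfo_mutator(Node *node, void *context)
{
	if (node == NULL)
		return NULL;

	if (IsA(node, RestrictInfo))
	{
		RestrictInfo *rinfo = (RestrictInfo *) node;

		/* recurse, in case RestrictInfos appear below this one as well */
		return strip_restrictinfo_mutator((Node *) rinfo->clause, context);
	}

	return expression_tree_mutator(node, strip_restrictinfo_mutator, context);
}
```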
Define STATS_MIN_ROWS for minimum rows of stats in ANALYZE
Hi hackers, The repeated use of the number 300 in the ANALYZE-related code creates redundancy and relies on scattered, sometimes unclear, comments to explain its purpose. This can make the code harder to understand, especially for new contributors who might not immediately understand its significance. To address this, I propose introducing a macro STATS_MIN_ROWS to represent this value and consolidating its explanation in a single place, making the code more consistent and readable. -- Best regards, Ilia Evdokimov, Tantor Labs LLC. From d3cbea3875fdc9866c86c64c0d0e0d838887040c Mon Sep 17 00:00:00 2001 From: Ilia Evdokimov Date: Mon, 9 Dec 2024 15:50:40 +0300 Subject: [PATCH v1] Define STATS_MIN_ROWS for minimum rows of stats in ANALYZE This introduces a macro STATS_MIN_ROWS to represent the default minimum number of rows (300) sampled in ANALYZE. --- src/backend/commands/analyze.c| 26 ++- src/backend/statistics/extended_stats.c | 2 +- src/backend/tsearch/ts_typanalyze.c | 4 +-- src/backend/utils/adt/rangetypes_typanalyze.c | 7 +++-- src/include/statistics/statistics.h | 23 5 files changed, 31 insertions(+), 31 deletions(-) diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c index 9a56de2282..ff7b6ffe8f 100644 --- a/src/backend/commands/analyze.c +++ b/src/backend/commands/analyze.c @@ -1897,42 +1897,20 @@ std_typanalyze(VacAttrStats *stats) { /* Seems to be a scalar datatype */ stats->compute_stats = compute_scalar_stats; - /* - * The following choice of minrows is based on the paper - * "Random sampling for histogram construction: how much is enough?" - * by Surajit Chaudhuri, Rajeev Motwani and Vivek Narasayya, in - * Proceedings of ACM SIGMOD International Conference on Management - * of Data, 1998, Pages 436-447. Their Corollary 1 to Theorem 5 - * says that for table size n, histogram size k, maximum relative - * error in bin size f, and error probability gamma, the minimum - * random sample size is - * r = 4 * k * ln(2*n/gamma) / f^2 - * Taking f = 0.5, gamma = 0.01, n = 10^6 rows, we obtain - * r = 305.82 * k - * Note that because of the log function, the dependence on n is - * quite weak; even at n = 10^12, a 300*k sample gives <= 0.66 - * bin size error with probability 0.99. So there's no real need to - * scale for n, which is a good thing because we don't necessarily - * know it at this point. 
- * - */ - stats->minrows = 300 * stats->attstattarget; } else if (OidIsValid(eqopr)) { /* We can still recognize distinct values */ stats->compute_stats = compute_distinct_stats; - /* Might as well use the same minrows as above */ - stats->minrows = 300 * stats->attstattarget; } else { /* Can't do much but the trivial stuff */ stats->compute_stats = compute_trivial_stats; - /* Might as well use the same minrows as above */ - stats->minrows = 300 * stats->attstattarget; } + stats->minrows = (STATS_MIN_ROWS * stats->attstattarget); + return true; } diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c index 99fdf208db..451e2d1e9c 100644 --- a/src/backend/statistics/extended_stats.c +++ b/src/backend/statistics/extended_stats.c @@ -320,7 +320,7 @@ ComputeExtStatisticsRows(Relation onerel, MemoryContextDelete(cxt); /* compute sample size based on the statistics target */ - return (300 * result); + return (STATS_MIN_ROWS * result); } /* diff --git a/src/backend/tsearch/ts_typanalyze.c b/src/backend/tsearch/ts_typanalyze.c index ccafe42729..befd90c9d5 100644 --- a/src/backend/tsearch/ts_typanalyze.c +++ b/src/backend/tsearch/ts_typanalyze.c @@ -17,6 +17,7 @@ #include "catalog/pg_operator.h" #include "commands/vacuum.h" #include "common/hashfn.h" +#include "statistics/statistics.h" #include "tsearch/ts_type.h" #include "utils/builtins.h" #include "varatt.h" @@ -64,8 +65,7 @@ ts_typanalyze(PG_FUNCTION_ARGS) stats->attstattarget = default_statistics_target; stats->compute_stats = compute_tsvector_stats; - /* see comment about the choice of minrows in commands/analyze.c */ - stats->minrows = 300 * stats->attstattarget; + stats->minrows = (STATS_MIN_ROWS * stats->attstattarget); PG_RETURN_BOOL(true); } diff --git a/src/backend/utils/adt/rangetypes_typanalyze.c b/src/backend/utils/adt/rangetypes_typanalyze.c index 3773f98115..7fcac5621c 100644 --- a/src/backend/utils/adt/rangetypes_typanalyze.c +++ b/src/backend/utils/adt/rangetypes_typanalyze.c @@ -26,6 +26,7 @@ #include "catalog/pg_operator.h" #include "commands/vacuum.h" +#include "statistics/statistics.h" #include "utils/float.h" #include "util
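As a cross-check of the constant being replaced by STATS_MIN_ROWS, the sample-size formula quoted in the removed comment does indeed come out to roughly 300 rows per statistics-target unit (a worked verification of the numbers already stated there, with f = 0.5, gamma = 0.01, n = 10^6):

```latex
r = \frac{4 k \ln(2n/\gamma)}{f^{2}}
  = \frac{4 k \ln(2 \cdot 10^{6} / 0.01)}{0.5^{2}}
  = 16 k \ln(2 \times 10^{8})
  \approx 16 k \times 19.11
  \approx 305.8\, k
```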
Re: Sample rate added to pg_stat_statements
I completely forgot about ordering pg_stat_statements in the test, which is why the test failed in [0]. I've added ORDER BY query COLLATE "C" to avoid any non-deterministic ordering in the table. [0]: https://cirrus-ci.com/task/5724458477944832 -- Best regards, Ilia Evdokimov, Tantor Labs LLC. From be8a69ae36d0a20c945ff12bb385d634ce50a84e Mon Sep 17 00:00:00 2001 From: Ilia Evdokimov Date: Tue, 7 Jan 2025 16:43:34 +0300 Subject: [PATCH] Allow setting sample ratio for pg_stat_statements New configuration parameter pg_stat_statements.sample_ratio makes it possible to track just a fraction of the queries meeting the configured threshold, to reduce the amount of tracking. --- .../pg_stat_statements/expected/select.out| 37 ++ .../pg_stat_statements/pg_stat_statements.c | 38 ++- contrib/pg_stat_statements/sql/select.sql | 11 ++ doc/src/sgml/pgstatstatements.sgml| 18 + 4 files changed, 102 insertions(+), 2 deletions(-) diff --git a/contrib/pg_stat_statements/expected/select.out b/contrib/pg_stat_statements/expected/select.out index 37a30af034..8485a7c451 100644 --- a/contrib/pg_stat_statements/expected/select.out +++ b/contrib/pg_stat_statements/expected/select.out @@ -153,6 +153,43 @@ SELECT pg_stat_statements_reset() IS NOT NULL AS t; t (1 row) +-- +-- sample statements +-- +SET pg_stat_statements.sample_rate = 0.0; +SELECT 1 AS "int"; + int +- + 1 +(1 row) + +SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C"; + query| calls ++--- + SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1 +(1 row) + +SET pg_stat_statements.sample_rate = 1.0; +SELECT 1 AS "int"; + int +- + 1 +(1 row) + +SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C"; + query | calls ++--- + SELECT $1 AS "int" | 1 + SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1 + SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C" | 0 +(3 rows) + +SELECT pg_stat_statements_reset() IS NOT NULL AS t; + t +--- + t +(1 row) + -- -- queries with locking clauses -- diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c index bebf8134eb..4bab2a74ea 100644 --- a/contrib/pg_stat_statements/pg_stat_statements.c +++ b/contrib/pg_stat_statements/pg_stat_statements.c @@ -50,6 +50,7 @@ #include "access/parallel.h" #include "catalog/pg_authid.h" #include "common/int.h" +#include "common/pg_prng.h" #include "executor/instrument.h" #include "funcapi.h" #include "jit/jit.h" @@ -255,6 +256,7 @@ typedef struct pgssSharedState /* Current nesting depth of planner/ExecutorRun/ProcessUtility calls */ static int nesting_level = 0; +static double random_sample = 0.0; /* Saved hook values */ static shmem_request_hook_type prev_shmem_request_hook = NULL; @@ -294,12 +296,17 @@ static bool pgss_track_utility = true; /* whether to track utility commands */ static bool pgss_track_planning = false; /* whether to track planning * duration */ static bool pgss_save = true; /* whether to save stats across shutdown */ +static double pgss_sample_rate = 1; + +/* Is the current top-level query to be sampled? 
*/ +static bool current_query_sampled = false; #define pgss_enabled(level) \ (!IsParallelWorker() && \ (pgss_track == PGSS_TRACK_ALL || \ - (pgss_track == PGSS_TRACK_TOP && (level) == 0))) + (pgss_track == PGSS_TRACK_TOP && (level) == 0)) && \ + current_query_sampled) #define record_gc_qtexts() \ do { \ @@ -414,6 +421,19 @@ _PG_init(void) NULL, NULL); + DefineCustomRealVariable("pg_stat_statements.sample_rate", + "Fraction of queries to track.", + NULL, + &pgss_sample_rate, + 1.0, + 0.0, + 1.0, + PGC_SUSET, + 0, + NULL, + NULL, + NULL); + DefineCustomEnumVariable("pg_stat_statements.track", "Selects which statements are tracked by pg_stat_statements.", NULL, @@ -835,6 +855,11 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate) if (prev_post_parse_analyze_hook) prev_post_parse_analyze_hook(pstate, query, jstate); + random_sample = pg_prng_double(&pg_global_prng_state); + + if (nesting_level == 0) + current_query_sampled = random_sample < pgss_sample_rate; + /* Safety check... */ if (!pgss || !pgss_hash || !pgss_enabled(nesti
Re: Sample rate added to pg_stat_statements
On 07.01.2025 22:29, Sami Imseih wrote: You are right. This is absolutely unexpected for users. Thank you for the review. To fix this, I suggest storing a random number in the [0, 1) range in a separate variable, which will be used for comparisons in any place. We will compare 'sample_rate' and random value not only in pgss_post_parse_analyze(), but also in pgss_ProcessUtility() and pgss_planner(). I attached patch with test and fixes. I still think this is not the correct approach. It seems that post_parse_analyze should not have to deal with checking for a sample rate. This is because post_parse_analyze, which is the only hook with access to jstate, is only responsible for storing a normalized query text on disk and creating a not-yet user visible entry in the hash. i.e. pgss_store will never increment counters when called from pgss_post_parse_analyze. /* Increment the counts, except when jstate is not NULL */ if (!jstate) { What I think may be a better idea is to add something similar to auto_explain.c, but it should only be added to the top of pgss_planner, pgss_ExecutorStart and pgss_ProcessUtility. if (nesting_level == 0) { if (!IsParallelWorker()) current_query_sampled = pg_prng_double(&pg_global_prng_state) < pgss_sample_rate; else current_query_sampled = false; } This is needed for ExecutorStart and not ExecutorEnd because ExecutorStart is where the instrumentation is initialized with queryDesc->totaltime. The above code block could be turned into a macro and reused in the routines mentioned. However, it seems with this change, we can see a much higher likelihood of non-normalized query texts being stored. This is because jstate is only available during post_parse_analyze. Therefore, if the first time you are sampling the statement is in ExecutorEnd, then you will end up storing a non-normalized version of the query text, see below example with the attached v8. postgres=# set pg_stat_statements.sample_rate = 0; SET postgres=# select pg_stat_statements_reset(); pg_stat_statements_reset --- 2025-01-07 13:02:11.807952-06 (1 row) postgres=# SELECT 1 \parse stmt postgres=# \bind_named stmt \g ?column? -- 1 (1 row) postgres=# \bind_named stmt \g ?column? -- 1 (1 row) postgres=# SELECT query, calls FROM pg_stat_statements; query | calls ---+--- (0 rows) postgres=# set pg_stat_statements.sample_rate = 1; SET postgres=# \bind_named stmt \g ?column? -- 1 (1 row) postgres=# \bind_named stmt \g ?column? -- 1 (1 row) postgres=# SELECT query, calls FROM pg_stat_statements; query | calls ---+--- (0 rows) postgres=# \bind_named stmt \g ?column? -- 1 (1 row) postgres=# SELECT query, calls FROM pg_stat_statements; query| calls -+--- SELECT 1| 1 SELECT query, calls FROM pg_stat_statements | 1 (2 rows) postgres=# \bind_named stmt \g ?column? -- 1 (1 row) postgres=# SELECT query, calls FROM pg_stat_statements; query| calls -+--- SELECT 1| 2 SELECT query, calls FROM pg_stat_statements | 2 (2 rows) One idea is to make jstate available to all hooks, and completely remove reliance on post_parse_analyze in pg_stat_statements. I can't find the thread now, but I know this has come up in past discussions when troubleshooting gaps in query normalization. My concern is this feature will greatly increase the likelihood of non-normalized query texts. Also, with regards to the benchmarks, it seems sampling will be beneficial for workloads that are touching a small number of entries with high concurrency. This is why we see benefit for a standard pgbench workload. 
Sampling becomes less beneficial when there is a large set of queries being updated. I still think this is a good approach for workloads that touch a small set of entries.

Regards, Sami

Wow, thank you for pointing this out. Your solution looks interesting, but I'd like to explore other ways to solve the issue besides making jstate available to all hooks. If I don't find anything better, I'll go with yours.

-- Best regards, Ilia Evdokimov, Tantor Labs LLC.
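A minimal sketch of how the per-hook check quoted above could be factored into a macro, as Sami suggests (the macro name is made up for illustration; it relies on the pg_prng.h and access/parallel.h includes already used by pg_stat_statements.c):

```c
/*
 * Decide once per top-level statement whether it is sampled; nested
 * statements reuse the decision made for their top-level statement.
 */
#define pgss_update_sampling_decision() \
	do { \
		if (nesting_level == 0) \
		{ \
			if (!IsParallelWorker()) \
				current_query_sampled = \
					pg_prng_double(&pg_global_prng_state) < pgss_sample_rate; \
			else \
				current_query_sampled = false; \
		} \
	} while (0)
```

The same macro would then be invoked at the top of pgss_planner(), pgss_ExecutorStart(), and pgss_ProcessUtility(), as described above.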
Re: Sample rate added to pg_stat_statements
On 06.01.2025 18:57, Andrey M. Borodin wrote: On 6 Jan 2025, at 15:50, Ilia Evdokimov wrote: Any suggestions for improvements? The patch looks good to me, just a few nits. 0. Perhaps, we could have a test for edge values of 0 and 1. I do not insist, just an idea to think about. I would add tests, because they are never useless. I've added a simple test which verifies hash table with queries after setting sample_rate = 0.0 and sample_rate = 1.0. 1. This code seems a little different from your patch. It is trying to avoid engaging PRNG. I'm not sure it's a good idea, but still. Also, it uses "<=", not "<". xact_is_sampled = log_xact_sample_rate != 0 && (log_xact_sample_rate == 1 || pg_prng_double(&pg_global_prng_state) <= log_xact_sample_rate); Thanks! Best regards, Andrey Borodin. Are we sure we're discussing the same patch? Because these remarks refer to the 5 version of the patch, which I abandoned due to your remarks. On 06.01.2025 20:51, Sami Imseih wrote: Hi, I was looking at this patch, and I was specifically curious about how this works with prepared statements. The way the patch works now is that it determines if the statement is to be sampled at post_parse_analyze time which could lead to unexpected behavior. Let's take an example below in which the pg_stat_statements.sample_rate is set to 0 ( to mimic some sampling rate < 1 in which this query does not get sampled ). At that point, all subsequent executions of the statement will not get tracked at all. Is this what is expected for prepared statements? My concern is we will even lose more stats than what a user may expect. This of course will not be an issue for simple query. postgres=# set pg_stat_statements.sample_rate = 0; SET postgres=# select pg_stat_statements_reset(); pg_stat_statements_reset --- 2025-01-06 11:45:23.484793-06 (1 row) postgres=# SELECT $1 \parse stmt postgres=# postgres=# \bind_named stmt 1 \g ?column? -- 1 (1 row) postgres=# \bind_named stmt 1 \g ?column? -- 1 (1 row) postgres=# set pg_stat_statements.sample_rate = 1; SET postgres=# \bind_named stmt 1 \g ?column? -- 1 (1 row) postgres=# \bind_named stmt 1 \g ?column? -- 1 (1 row) postgres=# SELECT query, calls FROM pg_stat_statements; query | calls ---+--- (0 rows) Regards, Sami Imseih Amazon Web Services (AWS) You are right. This is absolutely unexpected for users. Thank you for the review. To fix this, I suggest storing a random number in the [0, 1) range in a separate variable, which will be used for comparisons in any place. We will compare 'sample_rate' and random value not only in pgss_post_parse_analyze(), but also in pgss_ProcessUtility() and pgss_planner(). I attached patch with test and fixes. If you have any objections or suggestions on how to improve it, I'm always open to feedback. -- Best regards, Ilia Evdokimov, Tantor Labs LLC. From 388c12632610cc6abddd6fc750a1b15712d62e63 Mon Sep 17 00:00:00 2001 From: Ilia Evdokimov Date: Tue, 7 Jan 2025 14:02:32 +0300 Subject: [PATCH] Allow setting sample ratio for pg_stat_statements New configuration parameter pg_stat_statements.sample_ratio makes it possible to track just a fraction of the queries meeting the configured threshold, to reduce the amount of tracking. 
--- .../pg_stat_statements/expected/select.out| 37 ++ .../pg_stat_statements/pg_stat_statements.c | 38 ++- contrib/pg_stat_statements/sql/select.sql | 11 ++ doc/src/sgml/pgstatstatements.sgml| 18 + 4 files changed, 102 insertions(+), 2 deletions(-) diff --git a/contrib/pg_stat_statements/expected/select.out b/contrib/pg_stat_statements/expected/select.out index 37a30af034..724ec9c8ac 100644 --- a/contrib/pg_stat_statements/expected/select.out +++ b/contrib/pg_stat_statements/expected/select.out @@ -153,6 +153,43 @@ SELECT pg_stat_statements_reset() IS NOT NULL AS t; t (1 row) +-- +-- sample statements +-- +SET pg_stat_statements.sample_rate = 0.0; +SELECT 1 AS "int"; + int +- + 1 +(1 row) + +SELECT query, calls FROM pg_stat_statements; + query| calls ++--- + SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1 +(1 row) + +SET pg_stat_statements.sample_rate = 1.0; +SELECT 1 AS "int"; + int +- + 1 +(1 row) + +SELECT query, calls FROM pg_stat_statements; + query| calls ++--- + SELECT $1 AS "int" | 1 + SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1 + SELECT query, calls FROM pg_stat_statements| 0 +(3 rows) + +SELECT pg_stat_statement
Re: Sample rate added to pg_stat_statements
On 10.12.2024 17:35, Ilia Evdokimov wrote: Hi everyone, I attached previous sampling patch for pg_stat_statements (v4). Suggestions like sampling based on execution time remain unfeasible, as pg_stat_statements can track query during query planning, not execution. To evaluate the implementation, I ran a benchmark creating 1,000 random tables and executing randomized JOIN queries on a small machine. When pg_stat_statements enabled, performance decreases, but reducing the sampling rate helps mitigate the impact and improves performance. I’d be interested in hearing your new thoughts. Are there areas where this patch could be improved, or other ideas worth exploring? -- Best regards. Ilia Evdokimov, Tantor Labs LLC. I've fixed some small typos in the documentation and in the GUC description in the attached patch. Any suggestions for improvements? -- Best regards, Ilia Evdokimov, Tantor Labs LLC. From e0955b12e6a71efd6fc5d406e3d93bda4ec57bbf Mon Sep 17 00:00:00 2001 From: Ilia Evdokimov Date: Mon, 6 Jan 2025 13:38:22 +0300 Subject: [PATCH] Allow setting sample ratio for pg_stat_statements New configuration parameter pg_stat_statements.sample_ratio makes it possible to track just a fraction of the queries meeting the configured threshold, to reduce the amount of tracking. --- .../pg_stat_statements/pg_stat_statements.c | 23 ++- doc/src/sgml/pgstatstatements.sgml| 18 +++ 2 files changed, 40 insertions(+), 1 deletion(-) diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c index bebf8134eb..a8e91f9469 100644 --- a/contrib/pg_stat_statements/pg_stat_statements.c +++ b/contrib/pg_stat_statements/pg_stat_statements.c @@ -50,6 +50,7 @@ #include "access/parallel.h" #include "catalog/pg_authid.h" #include "common/int.h" +#include "common/pg_prng.h" #include "executor/instrument.h" #include "funcapi.h" #include "jit/jit.h" @@ -294,12 +295,16 @@ static bool pgss_track_utility = true; /* whether to track utility commands */ static bool pgss_track_planning = false; /* whether to track planning * duration */ static bool pgss_save = true; /* whether to save stats across shutdown */ +static double pgss_sample_rate = 1; +/* Is the current top-level query to be sampled? */ +static bool current_query_sampled = false; #define pgss_enabled(level) \ (!IsParallelWorker() && \ (pgss_track == PGSS_TRACK_ALL || \ - (pgss_track == PGSS_TRACK_TOP && (level) == 0))) + (pgss_track == PGSS_TRACK_TOP && (level) == 0)) && \ + current_query_sampled) #define record_gc_qtexts() \ do { \ @@ -414,6 +419,19 @@ _PG_init(void) NULL, NULL); + DefineCustomRealVariable("pg_stat_statements.sample_rate", + "Fraction of queries to track.", + NULL, + &pgss_sample_rate, + 1.0, + 0.0, + 1.0, + PGC_SUSET, + 0, + NULL, + NULL, + NULL); + DefineCustomEnumVariable("pg_stat_statements.track", "Selects which statements are tracked by pg_stat_statements.", NULL, @@ -835,6 +853,9 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate) if (prev_post_parse_analyze_hook) prev_post_parse_analyze_hook(pstate, query, jstate); + if (nesting_level == 0) + current_query_sampled = (pg_prng_double(&pg_global_prng_state) < pgss_sample_rate); + /* Safety check... 
*/ if (!pgss || !pgss_hash || !pgss_enabled(nesting_level)) return; diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml index 501b468e9a..37f6788798 100644 --- a/doc/src/sgml/pgstatstatements.sgml +++ b/doc/src/sgml/pgstatstatements.sgml @@ -936,6 +936,24 @@ + + + pg_stat_statements.sample_rate (real) + + pg_stat_statements.sample_rate configuration parameter + + + + + + pg_stat_statements.sample_rate causes pg_stat_statements to only + track a fraction of the statements in each session. The default is 1, + meaning track all the queries. In case of nested statements, + either all will be tracked or none. Only superusers can change this setting. + + + + pg_stat_statements.save (boolean) -- 2.34.1
Re: Remove unused rel parameter in lookup_var_attr_stats
On 03.01.2025 18:42, Fabrízio de Royes Mello wrote: On Fri, Jan 3, 2025 at 11:09 AM Ilia Evdokimov wrote: Hi hackers, I've attached a small patch that removes the unused parameter 'rel' from the static function lookup_var_attr_stats() in extended_stats.c. While exploring src/backend/statistics, I noticed that there are several other parameters which appear to be unused. However, since the discussion around import/export statistics [1] and attribute_stats.c is still ongoing and many functions in other files are non-static, those parameters may yet prove useful. Therefore, I'm only removing the parameter from this static function. LGTM, looks like a leftover from a4d75c86. Regards, -- Fabrízio de Royes Mello True. Moreover, I checked the commit and found nothing else. -- Best regards, Ilia Evdokimov, Tantor Labs LLC.
Removing unused parameter in compute_expr_stats
Hi hackers, I'm sharing a small patch that removes an unused parameter (totalrows) from compute_expr_stats function in extended_stats.c . This function originally appeared in commit a4d75c86bf15220df22de0a92c819ecef9db3849, which introduced extended statistics on expressions. From what I can see, total rows is never used in the current implementation, so this patch removes it If there are reasons to keep this parameter, or if there's anything I might have missed, I'd be happy to discuss further. -- Best regards, Ilia Evdokimov, Tantor Labs LLC. From 0de17516bec761474761bf2862724b23ccb11beb Mon Sep 17 00:00:00 2001 From: Ilia Evdokimov Date: Thu, 26 Dec 2024 12:09:14 +0300 Subject: [PATCH v1] Remove unused totalrows parameter in compute_expr_stats The totalrows parameter in compute_expr_stats is never referenced anyway in the function or by its callers. It appears to have been introduced in commit a4d75c86bf15220df22de0a92c819ecef9db3849, but it remained unused --- src/backend/statistics/extended_stats.c | 13 + 1 file changed, 5 insertions(+), 8 deletions(-) diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c index 99fdf208db..85da3edb10 100644 --- a/src/backend/statistics/extended_stats.c +++ b/src/backend/statistics/extended_stats.c @@ -89,9 +89,8 @@ typedef struct AnlExprData VacAttrStats *vacattrstat; /* statistics attrs to analyze */ } AnlExprData; -static void compute_expr_stats(Relation onerel, double totalrows, - AnlExprData *exprdata, int nexprs, - HeapTuple *rows, int numrows); +static void compute_expr_stats(Relation onerel, AnlExprData *exprdata, + int nexprs, HeapTuple *rows, int numrows); static Datum serialize_expr_stats(AnlExprData *exprdata, int nexprs); static Datum expr_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull); static AnlExprData *build_expr_data(List *exprs, int stattarget); @@ -220,8 +219,7 @@ BuildRelationExtStatistics(Relation onerel, bool inh, double totalrows, exprdata = build_expr_data(stat->exprs, stattarget); nexprs = list_length(stat->exprs); -compute_expr_stats(onerel, totalrows, - exprdata, nexprs, +compute_expr_stats(onerel, exprdata, nexprs, rows, numrows); exprstats = serialize_expr_stats(exprdata, nexprs); @@ -2107,9 +2105,8 @@ examine_opclause_args(List *args, Node **exprp, Const **cstp, * Compute statistics about expressions of a relation. */ static void -compute_expr_stats(Relation onerel, double totalrows, - AnlExprData *exprdata, int nexprs, - HeapTuple *rows, int numrows) +compute_expr_stats(Relation onerel, AnlExprData *exprdata, + int nexprs, HeapTuple *rows, int numrows) { MemoryContext expr_context, old_context; -- 2.34.1
Re: Exists pull-up application with JoinExpr
Hi Alena, Thank you for your work on subqueries with JOIN. Have you considered the scenario where in subquery includes a qual like (tc.aid = 1)? When I tried executing those queries I receive different results. In my opinion, to prevent this, we should add filters for such quals within the loop 'foreach (lc, all_clauses)' EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) SELECT * FROM ta WHERE EXISTS (SELECT * FROM tb, tc WHERE ta.id = tb.id AND tc.aid = 1); QUERY PLAN -- Hash Join (actual rows=1 loops=1) Hash Cond: (ta.id = tb.id) Buffers: local hit=3 -> Seq Scan on ta (actual rows=3 loops=1) Buffers: local hit=1 -> Hash (actual rows=3 loops=1) Buckets: 4096 Batches: 1 Memory Usage: 33kB Buffers: local hit=2 -> HashAggregate (actual rows=3 loops=1) Group Key: tb.id Batches: 1 Memory Usage: 121kB Buffers: local hit=2 -> Nested Loop (actual rows=3 loops=1) Buffers: local hit=2 -> Seq Scan on tb (actual rows=3 loops=1) Buffers: local hit=1 -> Materialize (actual rows=1 loops=3) Storage: Memory Maximum Storage: 17kB Buffers: local hit=1 -> Seq Scan on tc (actual rows=1 loops=1) Filter: (aid = 1) Rows Removed by Filter: 1 Buffers: local hit=1 (23 rows) EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) SELECT * FROM ta WHERE EXISTS (SELECT * FROM tb JOIN tc ON ta.id = tb.id WHERE tc.aid = 1); QUERY PLAN --- Seq Scan on ta (actual rows=1 loops=1) Filter: EXISTS(SubPlan 1) Rows Removed by Filter: 2 Buffers: local hit=6 SubPlan 1 -> Nested Loop (actual rows=0 loops=3) Buffers: local hit=5 -> Index Only Scan using tb_pkey on tb (actual rows=0 loops=3) Index Cond: (id = ta.id) Heap Fetches: 1 Buffers: local hit=4 -> Seq Scan on tc (actual rows=1 loops=1) Filter: (aid = 1) Buffers: local hit=1 (14 rows) -- Best regards, Ilia Evdokimov, Tantor Labs LLC.
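To make the report above easier to reproduce, here is a minimal schema sketch. The thread does not include the original DDL, so the table definitions and contents below are assumptions inferred from the row counts and "local hit" buffer lines in the two plans (temporary tables, tb with a primary key); the exact plan shapes may differ on other setups.

```sql
-- Hypothetical schema matching the plans above (assumed, not from the thread).
CREATE TEMP TABLE ta (id int);
CREATE TEMP TABLE tb (id int PRIMARY KEY);
CREATE TEMP TABLE tc (aid int);

INSERT INTO ta VALUES (1), (4), (5);   -- only one id overlaps with tb
INSERT INTO tb VALUES (1), (2), (3);
INSERT INTO tc VALUES (1), (2);        -- one row passes the aid = 1 filter
ANALYZE ta, tb, tc;

-- The two query forms compared above; with a correct pull-up both must
-- return the same result set.
SELECT * FROM ta WHERE EXISTS (SELECT * FROM tb, tc WHERE ta.id = tb.id AND tc.aid = 1);
SELECT * FROM ta WHERE EXISTS (SELECT * FROM tb JOIN tc ON ta.id = tb.id WHERE tc.aid = 1);
```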
Re: Sample rate added to pg_stat_statements
On 04.02.2025 20:59, Sami Imseih wrote: To summarize the results of all benchmarks, I compiled them into a table: Thanks for compiling the benchmark data above. The main benefit of this patch will be to give the user a toggle if they are observing high spinlock contention due to pg_stat_statements which will likely occur on larger machines. I also did not see this thread [1] mentioned in the thread above, but one of the reasons pg_stat_statements.track_planning was turned off by default was due to the additional spinlock acquire to track the planning stats. Bringing this up as sample_rate might also be beneficial as well if a user decides to track planning. Regards, Sami [1] https://www.postgresql.org/message-id/flat/2895b53b033c47ccb22972b589050...@ex13d05uwc001.ant.amazon.com Thanks for the thread. As we can see, simply enabling or disabling track_planning, or other pg_stat_statements parameters, is too radical a measure for managing performance. With the introduction of sample_rate, users have more flexibility in controlling spinlock contention: they can balance overhead against statistics collection rather than completely disabling the feature. -- Best regards, Ilia Evdokimov, Tantor Labs LLC.
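For readers who want to try the trade-off described above, a minimal usage sketch. The GUC name, range, and default are taken from the patch in this thread; since it is defined as PGC_SUSET, only a superuser can change it.

```sql
-- Assumes the patched pg_stat_statements is in shared_preload_libraries.
-- Track roughly 10% of top-level statements instead of all of them.
ALTER SYSTEM SET pg_stat_statements.sample_rate = 0.1;
SELECT pg_reload_conf();

-- Or per session, as a superuser:
SET pg_stat_statements.sample_rate = 0.1;

-- Revert to tracking everything (the default):
SET pg_stat_statements.sample_rate = 1.0;
```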
Re: Avoid possible deference NULL pointer (src/backend/optimizer/path/allpaths.c)
But what should we do if cheapest == NULL further on? Should get_cheapest_parameterized_child_path() then return NULL? If so, we would have to write it like this: if (cheapest == NULL || bms_equal(PATH_REQ_OUTER(cheapest), required_outer)) return cheapest; I'll look into this issue further. -- Best regards, Ilia Evdokimov, Tantor Labs LLC.
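For clarity, here is a sketch of what that call site in get_cheapest_parameterized_child_path() would look like under the variant being discussed. This is only an illustration of the idea, not a tested change; the surrounding code is paraphrased from allpaths.c.

```c
/*
 * Sketch of the discussed variant: drop the Assert and fold the NULL case
 * into the existing test (illustration only, not a tested patch).
 */
cheapest = get_cheapest_path_for_pathkeys(childrel->pathlist,
                                          NIL, required_outer,
                                          TOTAL_COST, false);

if (cheapest == NULL ||
    bms_equal(PATH_REQ_OUTER(cheapest), required_outer))
    return cheapest;    /* possibly NULL -- the caller would have to cope */
```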
Re: explain analyze rows=%.0f
On 08.02.2025 00:01, Matheus Alcantara wrote: Em sex., 7 de fev. de 2025 às 17:41, Ilia Evdokimov escreveu: Strange, I don't have any problems to apply it on master. postgres$ git branch * master postgres$ git pull Already up to date. postgres$ git apply v6-0001-Clarify-display-of-rows-and-loops-as-decimal-fraction.patch postgres$ Just for reference I'm trying to apply based on commit fb056564ec5. https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=fb056564ec5bc1c18dd670c963c893cdb6de927e You are right, because two commits were appeared after creating v6-patch on partition_prune.out and patch v6 must not have applied on master. Then I created v7 patch rebased on fb056564ec5 . Thank for your remark! -- Best regards, Ilia Evdokimov, Tantor Labs LLC. From d94d30618b6f363a54afe5457ef8fe89476e096e Mon Sep 17 00:00:00 2001 From: Ilia Evdokimov Date: Sat, 8 Feb 2025 00:15:54 +0300 Subject: [PATCH] Clarify display of rows and loops as decimal fractions When the average number of rows is small compared to the number of loops, both rows and loops are displayed as decimal fractions with two digits after the decimal point in EXPLAIN ANALYZE. --- doc/src/sgml/perform.sgml | 6 ++- src/backend/commands/explain.c| 49 +-- src/test/regress/expected/partition_prune.out | 10 ++-- src/test/regress/sql/partition_prune.sql | 2 +- 4 files changed, 44 insertions(+), 23 deletions(-) diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml index a502a2aaba..3f13d17fe9 100644 --- a/doc/src/sgml/perform.sgml +++ b/doc/src/sgml/perform.sgml @@ -757,7 +757,11 @@ WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2; comparable with the way that the cost estimates are shown. Multiply by the loops value to get the total time actually spent in the node. In the above example, we spent a total of 0.030 milliseconds -executing the index scans on tenk2. +executing the index scans on tenk2. If a subplan node +is executed multiple times and the average number of rows is less than one, +the rows and loops values are shown as a decimal fraction +(with two digits after the decimal point) to indicate that some rows +were actually processed rather than simply rounding down to zero. 
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index c24e66f82e..200294b756 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -1981,14 +1981,14 @@ ExplainNode(PlanState *planstate, List *ancestors, if (es->format == EXPLAIN_FORMAT_TEXT) { + appendStringInfo(es->str, " (actual"); if (es->timing) -appendStringInfo(es->str, - " (actual time=%.3f..%.3f rows=%.0f loops=%.0f)", - startup_ms, total_ms, rows, nloops); +appendStringInfo(es->str, " time=%.3f..%.3f", startup_ms, total_ms); + + if (nloops > 1 && planstate->instrument->ntuples < nloops) +appendStringInfo(es->str," rows=%.2f loops=%.2f)", rows, nloops); else -appendStringInfo(es->str, - " (actual rows=%.0f loops=%.0f)", - rows, nloops); +appendStringInfo(es->str," rows=%.0f loops=%.0f)", rows, nloops); } else { @@ -1999,8 +1999,16 @@ ExplainNode(PlanState *planstate, List *ancestors, ExplainPropertyFloat("Actual Total Time", "ms", total_ms, 3, es); } - ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es); - ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es); + if (nloops > 1 && planstate->instrument->ntuples < nloops) + { +ExplainPropertyFloat("Actual Rows", NULL, rows, 2, es); +ExplainPropertyFloat("Actual Loops", NULL, nloops, 2, es); + } + else + { +ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es); +ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es); + } } } else if (es->analyze) @@ -2052,14 +2060,14 @@ ExplainNode(PlanState *planstate, List *ancestors, if (es->format == EXPLAIN_FORMAT_TEXT) { ExplainIndentText(es); +appendStringInfo(es->str, "actual"); if (es->timing) - appendStringInfo(es->str, - "actual time=%.3f..%.3f rows=%.0f loops=%.0f\n", - startup_ms, total_ms, rows, nloops); + appendStringInfo(es->str, " time=%.3f..%.3f", startup_ms, total_ms); + +if (nloops > 1 && planstate->instrument->ntuples < nloops) + appendStringInfo(es->str," rows=%.2f loops=%.2f)", rows, nloops); else - appendStringInfo(es->str, - "actual rows=%.0f loops=%.0f\n", - rows, nloops); + appendStringInfo(es->str," rows=%.0f loops=%.0f)", rows, nloop
Re: Sample rate added to pg_stat_statements
Hi hackers, Since the current patch is in the commitfest with the status 'Ready for committer', I'd like to summarize what it does, the problems it addresses, and answer the key questions raised in the discussion thread. Enabling pg_stat_statements can cause a performance drop for two main reasons:

* Contention on the exclusive lock, which blocks the hash table when allocating or deallocating an entry.
* Contention on the spinlock, which blocks access to an existing entry.

The exclusive lock issue can be mitigated by normalizing similar queries to the same queryid. There is an ongoing thread discussing this approach for IN and ARRAY queries: [0]. Another solution is integrating pg_stat_statements into central pgstats with the custom APIs, without pushing the module into core [1]. However, even if we do all that, the spinlock contention problem will persist for very frequent queries. This is exactly what I aim to solve with sampling. Other approaches, such as replacing the spinlock with atomic variables or an lwlock, have not yielded good results [2]. Even track_planning was disabled by default to reduce contention, but on high-core systems the issue still remains. So far, I see no alternative solution to this problem. The benchmark results for different numbers of CPUs demonstrating the impact of this patch can be found here: pgbench -c{CPUs} -j20 -S -Mprepared -T120

CPUs | sample_rate | tps     | CPU waits | ClientRead wait | SpinDelay wait
192  | 1.0         |  484338 |      9568 |             929 |          11107
192  | 0.75        |  909547 |     12079 |            2100 |           4781
192  | 0.5         | 1028594 |     13253 |            3378 |            174
192  | 0.25        | 1019507 |     13397 |            3423 |              -
192  | 0.0         | 1015425 |     13106 |            3502 |              -
48   | 1.0         |  643251 |       911 |             932 |             44
48   | 0.75        |  653946 |       939 |             939 |              3
48   | 0.5         |  651654 |       841 |             932 |              -
48   | 0.25        |  652668 |       860 |             910 |              -
48   | 0.0         |  659111 |       849 |             882 |              -
32   | 1.0         |  620667 |      1782 |             560 |              -
32   | 0.75        |  620663 |      1736 |             554 |              -
32   | 0.5         |  624094 |      1741 |             648 |              -
32   | 0.25        |  628638 |      1702 |             576 |              -
32   | 0.0         |  630483 |      1638 |             574 |              -

Some suggested sampling based on execution time instead of a random generator. While this is a good idea, it has a critical limitation. Execution time can only be checked in pgss_ExecutorEnd. However, pgss_planner already interacts with the spinlock, and we cannot sample based on execution time at that stage. Finding a criterion that works for both pgss_planner and pgss_ExecutorEnd is difficult. This is why random sampling remains the most viable option. BTW, instead of using a bool flag to check if we are inside pgss_post_parse_analyze, we now pass a pgssStoreKind to pgss_enabled. This makes the logic clearer and more readable, explicitly indicating where we need to check whether a query should be sampled. I made this change in the new v15 patch. Are there any remaining concerns or alternative suggestions regarding this approach? I'm happy to discuss further. [0]: https://commitfest.postgresql.org/51/2837/ [1]: https://www.postgresql.org/message-id/Zz0MFPq1s4WFxxpL%40paquier.xyz [2]: https://commitfest.postgresql.org/29/2634/ -- Best regards, Ilia Evdokimov, Tantor Labs LLC. From c9540fac9f6d6ad6d15acdaad844ca6c4ecba455 Mon Sep 17 00:00:00 2001 From: Ilia Evdokimov Date: Mon, 10 Feb 2025 11:00:46 +0300 Subject: [PATCH] Allow setting sample rate for pg_stat_statements New configuration parameter pg_stat_statements.sample_rate makes it possible to track just a fraction of the queries meeting the configured threshold, to reduce the amount of tracking.
--- contrib/pg_stat_statements/Makefile | 2 +- .../pg_stat_statements/expected/sampling.out | 174 ++ contrib/pg_stat_statements/meson.build| 1 + .../pg_stat_statements/pg_stat_statements.c | 53 +- contrib/pg_stat_statements/sql/sampling.sql | 50 + doc/src/sgml/pgstatstatements.sgml| 19 ++ 6 files changed, 291 insertions(+), 8 deletions(-) create mode 100644 contrib/pg_stat_statements/expected/sampling.out create mode 100644 contrib/pg_stat_statements/sql/sampling.sql diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile index 241c02587b..b70bdfaf2d 100644 --- a/contrib/pg_stat_statements/Makefile +++ b/contrib/pg_stat_statements/Makefile @@ -20,7 +20,7 @@ LDFLAGS_SL += $(filter -lm, $(LIBS)) REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/pg_stat_statements/pg_stat_statements.conf REGRESS = select dml cursors utility level_tracking planning \ user_activity wal entry_timestamp privileges extended \ - parallel cleanup
Re: explain analyze rows=%.0f
On 07.02.2025 22:59, Matheus Alcantara wrote: Hi v6 is not applying on master, could you please rebase? git apply v6-0001-Clarify-display-of-rows-and-loops-as-decimal-fraction.patch error: patch failed: src/test/regress/expected/partition_prune.out:3041 error: src/test/regress/expected/partition_prune.out: patch does not apply Hi Strange, I don't have any problems applying it on master. postgres$ git branch * master postgres$ git pull Already up to date. postgres$ git apply v6-0001-Clarify-display-of-rows-and-loops-as-decimal-fraction.patch postgres$ -- Best regards, Ilia Evdokimov, Tantor Labs LLC.
Re: explain analyze rows=%.0f
On 10.02.2025 18:32, Matheus Alcantara wrote: Thanks for the new patch version. -- v7-0001-Clarify-display-of-rows-and-loops-as-decimal-fraction.patch +if (nloops > 1 && planstate->instrument->ntuples < nloops) +appendStringInfo(es->str," rows=%.2f loops=%.2f)", rows, nloops); Sorry, but why do the 'loops' need to be changed? IIUC the issue is only with the rows field? When testing this patch I got some results similar with this: Index Scan using t_a_idx on t (cost=0.14..0.28 rows=1 width=8) (actual time=0.049..0.049 rows=0.50 loops=2.00) The only reason I added this was to make it clear to the user that loops > 1, along with the 'rows' value. If no one finds it useful and it only raises more questions, I can remove it. When the total number of returned tuples is less than the number of loops currently shows 'rows = 0'. This can mislead users into thinking that no rows were returned at all, even though some might have appeared occasionally. I think that this can happen when the returned rows and the loops are small enough to result in a 'row' value like 0.00045? I'm not sure if we have "bigger" values (e.g 1074(ntuples) / 117(nloops) which would result in 9.17 rows) this would also be true, what do you think? If you could provide an example of this case would be great! Based on what was discussed earlier in the thread, there are cases with large loops [0]. However, I believe it's better not to display average rows with excessively long digits or in scientific notation. And, of course, I agree with you regarding small values. I think we should also add a check to ensure that the total rows is actually greater than zero. When the total rows is zero, we could simply display it as an integer without decimals. It could help users average rows is very small but not zero. What do you think about this approach? -executing the index scans on tenk2. +executing the index scans on tenk2. If a subplan node +is executed multiple times and the average number of rows is less than one, +the rows and loops values are shown as a decimal fraction +(with two digits after the decimal point) to indicate that some rows +were actually processed rather than simply rounding down to zero. * I think that it would be good to mention what a 'row' value in decimal means. For example, if its says "0.1 rows" the user should assume that typically 0 rows will be returned but sometimes it can return 1 or more. * There are more spaces than necessary before "If a subplan node ..." * Maybe wrap 'rows' with ? I agree with the last two points. As for the first one—maybe we could simply state that the average rows value can be decimal, especially for very small values? [0]: https://www.postgresql.org/message-id/a9393107-6bb9-c835-50b7-c0f453a514b8%40postgrespro.ru -- Best regards, Ilia Evdokimov, Tantor Labs LLC.
Re: Avoid possible deference NULL pointer (src/backend/optimizer/path/allpaths.c)
On 05.02.2025 21:56, Tom Lane wrote: It's not a bug. Since the call specifies NIL pathkeys (meaning it doesn't care about sort order) and does not insist on a parallel-safe path, there should always be a path that satisfies it. The only way it could fail to find a path is if the rel's pathlist is entirely empty, a case already rejected by the sole caller. Moreover, the argument that we might not have gotten a report is not credible. In a production build without an Assert, the next line would still dump core just as effectively if the result were NULL. While the proposed patch doesn't break anything, it's removing a logic cross-check that was put there intentionally. So I don't find it to be an improvement. regards, tom lane Ah, if this Assert was intentionally added to ensure that a path must always be found under the given conditions, and that any issues with this can be detected in the right place, then I agree. The patch likely makes things worse. -- Best regards, Ilia Evdokimov, Tantor Labs LLC.
Re: Showing applied extended statistics in explain Part 2
On 05.02.2025 09:28, Tatsuro Yamada wrote: Hi All, Thank you everyone for your cooperation with comments on the patch and solution ideas. I am sorting through your review comments now. And after rebasing the patch, I plan to send a patch that addresses the comments as much as possible to -hackers by Feb 21 at the latest. Therefore, the status of the patch for this feature from last month's commit fest will be "moving to the next commit fest". P.S. Thanks for letting me know, Ilia. Thanks, Tatsuro Yamada Thanks for your work! Regarding dependency statistics, I noticed that we can't apply them individually, which makes it difficult to map specific clauses to individual stats. However, why not display the dependency statistics above the execution plan as a list? I believe it's crucial to highlight when dependency statistics are applied. Analyzing dependencies stats often takes the longest compared to other statistics, and if users could see the extended statistics for dependencies upfront, they might be able to remove them if they're not useful for the plan. -- Best regards, Ilia Evdokimov, Tantor Labs LLC.
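To make the discussion concrete, here is a small, hypothetical example of the kind of dependencies statistics whose display in EXPLAIN is being discussed; the table, column names, and query are invented, and what exactly the plan would show is the open question of this thread.

```sql
-- Hypothetical example: city mostly determines country, so a functional
-- dependency is worth recording for correlated WHERE clauses.
CREATE TABLE orders (country text, city text, amount numeric);
CREATE STATISTICS orders_dep (dependencies) ON country, city FROM orders;
ANALYZE orders;

-- The proposal is to show, somewhere around the plan output, that orders_dep
-- was applied while estimating this clause list.
EXPLAIN SELECT * FROM orders WHERE country = 'US' AND city = 'Portland';
```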
Re: Avoid possible deference NULL pointer (src/backend/optimizer/path/allpaths.c)
On 05.01.2025 02:29, Ranier Vilela wrote: Hi. Per Coverity. All call sites of function *get_cheapest_path_for_pathkeys* checks for NULL returns. So, it is highly likely that the function will return NULL. IMO, the Assert in this particular call, is not fully effective. Fix removing the Assert and always check if the return is NULL. best regards, Ranier Vilela Hi! Thanks for noticing this. If this happens in the planner, it poses a serious risk of a segmentation fault that could crash the instance if a NULL pointer is dereferenced. Since checking for NULL is very cheap, I support this patch. -- Best regards, Ilia Evdokimov, Tantor Labs LLC.
Tests for extended MCV stats compatibility with multi-filter conditions on a single column
Hi hackers, When a query uses multiple filters on a single column in the WHERE clause, and extended mcv statistics are applied on the column, the optimizer should take each clause into account when estimating selectivity. For example, this could be (A = 0 OR A = 1), (A BETWEEN 0 AND 5), or (A > 0 AND A <= 5). Such range conditions are quite common in practice. However, I did not find any tests covering these scenarios in the stats_ext.sql regression test. I propose adding tests for these cases. This will make the behavior of MCV extended statistics more clear to users and developers. Any feedback, suggestions, or additions to the tests are welcome and appreciated. -- Best regards, Ilia Evdokimov, Tantor Labs LLC. From e2b2e591e0df5a7e2ed589182b204781229d5754 Mon Sep 17 00:00:00 2001 From: Ilia Evdokimov Date: Fri, 7 Feb 2025 00:29:53 +0300 Subject: [PATCH v1] Add tests for extended MCV statistics with OR, BETWEEN Since extended MCV statistics statistics are compatible with conditions like OR (e.g., A = 0 OR A = 1); AND, BETWEEN (e.g., A >= 0 AND A < 3), this patch adds tests to verify their correct handling. --- src/test/regress/expected/stats_ext.out | 24 src/test/regress/sql/stats_ext.sql | 8 2 files changed, 32 insertions(+) diff --git a/src/test/regress/expected/stats_ext.out b/src/test/regress/expected/stats_ext.out index a4c7be487e..f26d07261e 100644 --- a/src/test/regress/expected/stats_ext.out +++ b/src/test/regress/expected/stats_ext.out @@ -2803,6 +2803,18 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE (a = 0 6 |102 (1 row) +SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE (a = 0 OR a = 10) AND (b = 0 OR b = 10) AND (c = 0 OR c = 10)'); + estimated | actual +---+ + 1 |104 +(1 row) + +SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE (a BETWEEN 7 AND 10) AND (b BETWEEN 7 AND 10) AND (c BETWEEN 7 AND 10)'); + estimated | actual +---+ + 1 |308 +(1 row) + CREATE STATISTICS mcv_lists_partial_stats (mcv) ON a, b, c FROM mcv_lists_partial; ANALYZE mcv_lists_partial; @@ -2854,6 +2866,18 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE (a = 0 108 |102 (1 row) +SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE (a = 0 OR a = 10) AND (b = 0 OR b = 10) AND (c = 0 OR c = 10)'); + estimated | actual +---+ + 104 |104 +(1 row) + +SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE (a BETWEEN 7 AND 10) AND (b BETWEEN 7 AND 10) AND (c BETWEEN 7 AND 10)'); + estimated | actual +---+ + 309 |308 +(1 row) + DROP TABLE mcv_lists_partial; -- check the ability to use multiple MCV lists CREATE TABLE mcv_lists_multi ( diff --git a/src/test/regress/sql/stats_ext.sql b/src/test/regress/sql/stats_ext.sql index 5c786b16c6..b6fe084675 100644 --- a/src/test/regress/sql/stats_ext.sql +++ b/src/test/regress/sql/stats_ext.sql @@ -1410,6 +1410,10 @@ SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE (a = 0 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE (a = 0 AND b = 0) OR (a = 0 AND c = 0) OR (b = 0 AND c = 0)'); +SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE (a = 0 OR a = 10) AND (b = 0 OR b = 10) AND (c = 0 OR c = 10)'); + +SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE (a BETWEEN 7 AND 10) AND (b BETWEEN 7 AND 10) AND (c BETWEEN 7 AND 10)'); + CREATE STATISTICS mcv_lists_partial_stats (mcv) ON a, b, c FROM mcv_lists_partial; @@ -1431,6 +1435,10 @@ SELECT * 
FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE (a = 0 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE (a = 0 AND b = 0) OR (a = 0 AND c = 0) OR (b = 0 AND c = 0)'); +SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE (a = 0 OR a = 10) AND (b = 0 OR b = 10) AND (c = 0 OR c = 10)'); + +SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_partial WHERE (a BETWEEN 7 AND 10) AND (b BETWEEN 7 AND 10) AND (c BETWEEN 7 AND 10)'); + DROP TABLE mcv_lists_partial; -- check the ability to use multiple MCV lists -- 2.34.1
Re: explain analyze rows=%.0f
On 10.02.2025 23:43, Matheus Alcantara wrote: When the total number of returned tuples is less than the number of loops currently shows 'rows = 0'. This can mislead users into thinking that no rows were returned at all, even though some might have appeared occasionally. I think that this can happen when the returned rows and the loops are small enough to result in a 'row' value like 0.00045? I'm not sure if we have "bigger" values (e.g 1074(ntuples) / 117(nloops) which would result in 9.17 rows) this would also be true, what do you think? If you could provide an example of this case would be great! Based on what was discussed earlier in the thread, there are cases with large loops [0]. However, I believe it's better not to display average rows with excessively long digits or in scientific notation. And, of course, I agree with you regarding small values. I think we should also add a check to ensure that the total rows is actually greater than zero. When the total rows is zero, we could simply display it as an integer without decimals. It could help users average rows is very small but not zero. What do you think about this approach? Yeah, I agree with you about the long digits. My question is more about why do we need the planstate->instrument->ntuples < nloops check? I tried to remove this check and I got a lot of EXPLAIN output that shows 'rows' values with .00, so I'm just trying to understand the reason. From what I've understood about this thread is that just avoiding .00 decimals of 'rows' values that could be just integers would be enough, is that right or I'm missing something here? I'm just worried if we could have a scenario where nloops > 1 && planstate->instrument->ntuples < nloops which would make the 'rows' not be formatted correctly. Sorry for missing your question earlier. If you notice in the code above, the variable(average) 'rows' is defined as: double rows = planstate->instrument->ntuples / nloops; This represents the total rows divided by the number of loops. The condition means that variable 'rows' will always between zero and one. Therefore, the average rows under such conditions cannot be greater than or even equal to one. I wrote this condition specifically to avoid the verbose expression 'rows > 0 && rows < 1'. However, since this might not be obvious to everyone, perhaps it'd be better to write is using 'rows' directly or add a comment explaining this logic. -executing the index scans on tenk2. +executing the index scans on tenk2. If a subplan node +is executed multiple times and the average number of rows is less than one, +the rows and loops values are shown as a decimal fraction +(with two digits after the decimal point) to indicate that some rows +were actually processed rather than simply rounding down to zero. * I think that it would be good to mention what a 'row' value in decimal means. For example, if its says "0.1 rows" the user should assume that typically 0 rows will be returned but sometimes it can return 1 or more. * There are more spaces than necessary before "If a subplan node ..." * Maybe wrap 'rows' with ? I agree with the last two points. As for the first one—maybe we could simply state that the average rows value can be decimal, especially for very small values? I'm just not sure about the "small values"; the 'rows' in decimal will only happen with small values? What would be a "small value" in this context? 
My main point here is more that I think that it would be good to mention *why* the 'rows' can be decimal, not just describe that it could be decimal. As for 'small values', it means that the average rows is between zero and one, to avoid rounding errors and misunderstanding. I think this would be ideal. -- Best regards, Ilia Evdokimov, Tantor Labs LLC.
Re: pull-up subquery if JOIN-ON contains refs to upper-query
On 09.02.2025 18:14, Alena Rybakina wrote: Hi! I found another example where the transformation worked incorrectly and reconsidered the idea. As for conversion of exists_sublink_to_ANY, we need to get the flattened implicit-AND list of clauses and pull out the chunks of the WHERE clause that belong to the parent query, since we are called halfway through the parent's preprocess_expression() and earlier steps of preprocess_expression() wouldn't get applied to the pulled-up stuff unless we do them here. We also do some processing for vars depending on which side the var is on - if it's in a subquery, we only need to lower its level (varlevel) because subquery will be flatted, while for other vars that belong to the parent query, we need to do preparation to pull up the sub-select into top range table. For those expressions that we couldn't assign to either list, we define newWhere and apply both cases. When I run 'make -C contrib/ check', tests of postgres_fdw extension failed. I might be wrong, but you should be careful with LIMIT. -- Best regards, Ilia Evdokimov, Tantor Labs LLC. diff -U3 /home/ilia/postgres/contrib/postgres_fdw/expected/postgres_fdw.out /home/ilia/postgres/contrib/postgres_fdw/results/postgres_fdw.out --- /home/ilia/postgres/contrib/postgres_fdw/expected/postgres_fdw.out 2025-02-10 23:31:19.353059650 +0300 +++ /home/ilia/postgres/contrib/postgres_fdw/results/postgres_fdw.out 2025-02-10 23:46:43.249745683 +0300 @@ -4835,13 +4835,15 @@ EXISTS (SELECT 1 FROM ft4 WHERE ft4.c2 = ft2.c2)) AND ft2.c1 > 900 ORDER BY ft2.c1 LIMIT 10; - QUERY PLAN - Foreign Scan + QUERY PLAN + + Limit Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8 - Relations: (public.ft2) SEMI JOIN ((public.ft2 ft2_1) SEMI JOIN (public.ft4)) - Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8 FROM "S 1"."T 1" r1 WHERE ((r1."C 1" > 900)) AND EXISTS (SELECT NULL FROM "S 1"."T 1" r3 WHERE ((r1."C 1" = r3."C 1")) AND EXISTS (SELECT NULL FROM "S 1"."T 3" r4 WHERE ((r3.c2 = r4.c2 ORDER BY r1."C 1" ASC NULLS LAST LIMIT 10::bigint -(4 rows) + -> Foreign Scan + Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8 + Relations: (public.ft2) SEMI JOIN ((public.ft2 ft2_1) SEMI JOIN (public.ft4)) + Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8 FROM "S 1"."T 1" r1 WHERE ((r1."C 1" > 900)) AND EXISTS (SELECT NULL FROM "S 1"."T 1" r3 WHERE ((r1."C 1" = r3."C 1")) AND EXISTS (SELECT NULL FROM "S 1"."T 3" r4 WHERE ((r3.c2 = r4.c2 ORDER BY r1."C 1" ASC NULLS LAST +(6 rows) SELECT ft2.* FROM ft2 WHERE c1 = ANY ( @@ -4871,13 +4873,20 @@ EXISTS (SELECT 1 FROM ft4 WHERE ft4.c2 = ft2.c2) AND c1 = ftupper.c1 ) AND ftupper.c1 > 900 ORDER BY ftupper.c1 LIMIT 10; - QUERY PLAN -
Re: Move wal_buffers_full to WalUsage (and report it in pgss/explain)
Hi, Thank you for your work!

1. Perhaps in EXPLAIN you forgot to check that usage->wal_buffers_full > 0: if ((usage->wal_records > 0) || (usage->wal_fpi > 0) || (usage->wal_bytes > 0))

2. I have a small suggestion for pg_stat_statements: would it make sense to move wal_buffers_full next to wal_records, wal_fpi and wal_bytes? This way, all WAL-related information would be grouped together.

-- Best regards, Ilia Evdokimov, Tantor Labs LLC.
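A sketch of the check being suggested in point 1; the surrounding EXPLAIN code is paraphrased, since the patch itself is not quoted here.

```c
/*
 * Sketch: also consider wal_buffers_full when deciding whether to print
 * the WAL usage line in EXPLAIN, so a non-zero counter is not skipped
 * when the other three counters happen to be zero.
 */
if ((usage->wal_records > 0) || (usage->wal_fpi > 0) ||
    (usage->wal_bytes > 0) || (usage->wal_buffers_full > 0))
{
    /* ... emit the "WAL:" line ... */
}
```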
Re: 2025-02-13 release announcement draft
On 11.02.2025 03:09, Jonathan S. Katz wrote: Hi, Attached is a draft of the 2025-02-13 update release announcement. Please provide feedback no later than 2025-02-13 0:00 UTC. Thanks, Jonathan

"you may simply shutdown PostgreSQL"

should read

"you may simply shut down PostgreSQL"

-- Best regards, Ilia Evdokimov, Tantor Labs LLC.
Re: explain analyze rows=%.0f
On 12.02.2025 11:54, Andrei Lepikhov wrote: May we agree on a more general formula to print at least two meaningful digits if we have a fractional part? Examples:
- actual rows = 2, nloops = 2 -> rows = 1
- actual rows = 9, nloops = 5 -> rows = 1.8
- actual rows = 101, nloops = 100 -> rows = 1.0
- actual rows = 99, nloops = 100 -> rows = 0.99
It may guarantee that an EXPLAIN exposes most of the data passed the node, enough to tangle it with actual timing and tuples at the upper levels of the query. I think the idea of keeping two significant digits after the decimal point is quite reasonable. The thing is, rows=0.01 or something similar can only occur when loops is quite large. If we show the order of magnitude in rows, it will be easier for the user to estimate the order of total rows. For example, if we see this: rows=0.56 loops=4718040 the user can more quickly approximate the order of total rows for analyzing the upper levels of the query. However, keep in mind that I am against using the E notation, as many users have mentioned that they are not mathematicians and are not familiar with the concept of "E". I suggest that when thinking about what to change here, you start by considering how you'd adjust the docs at https://www.postgresql.org/docs/devel/using-explain.html to explain the new behavior. If you can't explain it clearly for users, then maybe it's not such a great idea. Agree. So do I. First, I'll think about how to explain it. -- Best regards, Ilia Evdokimov, Tantor Labs LLC.
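To illustrate Andrei's examples above, here is a rough sketch of how the number of decimal digits could be chosen. This is a hypothetical helper, not part of any posted patch, and under this simplistic rule values below 0.01 would still come out as 0.00.

```c
#include <math.h>

/*
 * Hypothetical helper: choose how many decimal digits to print so that
 * roughly two significant digits survive, matching the examples above:
 *   2/2     -> "1"     (exact integer, keep the old integer output)
 *   9/5     -> "1.8"
 *   101/100 -> "1.0"
 *   99/100  -> "0.99"
 */
static int
rows_decimal_digits(double rows)
{
    if (rows == floor(rows))
        return 0;           /* no fractional part */
    if (rows < 1.0)
        return 2;           /* e.g. 0.99, 0.01 */
    return 1;               /* e.g. 1.8, and 1.01 prints as 1.0 */
}
```

A call site could then use it as appendStringInfo(es->str, " rows=%.*f", rows_decimal_digits(rows), rows).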
[PATCH] Fix incorrect range in pg_regress comment
Hi hackers, I noticed that a comment in pg_regress incorrectly states that alternative output files can be named filename{_i}.out with 0 < i <= 9. However, the actual valid range is 0 <= i <= 9. This patch corrects the comment. The fix is trivial but ensures that the documentation in the code accurately reflects the behavior of pg_regress. Attached is a small patch correcting this. -- Best regards, Ilia Evdokimov, Tantor Labs LLC. From d2ef1b2ff2f42bc38eb2d2d87201a2822d53d8b5 Mon Sep 17 00:00:00 2001 From: Ilia Evdokimov Date: Sun, 2 Feb 2025 21:35:10 +0300 Subject: [PATCH v1] Fix incorrect range in pg_regress comment The comment in pg_regress incorrectly stated that alternative output files could be named test{_i}.out with 0 < i <= 9. However, the valid range is actually 0 <= i <= 9. --- src/test/regress/pg_regress.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c index 61a234ae21..5d85dcc62f 100644 --- a/src/test/regress/pg_regress.c +++ b/src/test/regress/pg_regress.c @@ -1335,7 +1335,7 @@ make_directory(const char *dir) } /* - * In: filename.ext, Return: filename_i.ext, where 0 < i <= 9 + * In: filename.ext, Return: filename_i.ext, where 0 <= i <= 9 */ static char * get_alternative_expectfile(const char *expectfile, int i) -- 2.34.1
Re: Sample rate added to pg_stat_statements
Hi everyone, I attached previous sampling patch for pg_stat_statements (v4). Suggestions like sampling based on execution time remain unfeasible, as pg_stat_statements can track query during query planning, not execution. To evaluate the implementation, I ran a benchmark creating 1,000 random tables and executing randomized JOIN queries on a small machine. When pg_stat_statements enabled, performance decreases, but reducing the sampling rate helps mitigate the impact and improves performance. I’d be interested in hearing your new thoughts. Are there areas where this patch could be improved, or other ideas worth exploring? -- Best regards. Ilia Evdokimov, Tantor Labs LLC. From 68f5451019b261bf03a222f5a05ac57cd0fb9a24 Mon Sep 17 00:00:00 2001 From: Ilia Evdokimov Date: Thu, 21 Nov 2024 11:24:03 +0300 Subject: [PATCH] Allow setting sample ratio for pg_stat_statements New configuration parameter pg_stat_statements.sample_ratio makes it possible to track just a fraction of the queries meeting the configured threshold, to reduce the amount of tracking. --- .../pg_stat_statements/pg_stat_statements.c | 23 ++- doc/src/sgml/pgstatstatements.sgml| 18 +++ 2 files changed, 40 insertions(+), 1 deletion(-) diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c index 49c657b3e0..d06e0d8a44 100644 --- a/contrib/pg_stat_statements/pg_stat_statements.c +++ b/contrib/pg_stat_statements/pg_stat_statements.c @@ -49,6 +49,7 @@ #include "access/parallel.h" #include "catalog/pg_authid.h" +#include "common/pg_prng.h" #include "common/int.h" #include "executor/instrument.h" #include "funcapi.h" @@ -294,12 +295,16 @@ static bool pgss_track_utility = true; /* whether to track utility commands */ static bool pgss_track_planning = false; /* whether to track planning * duration */ static bool pgss_save = true; /* whether to save stats across shutdown */ +static double pgss_sample_rate = 1; +/* Is the current top-level query to be sampled? */ +static bool current_query_sampled = false; #define pgss_enabled(level) \ (!IsParallelWorker() && \ (pgss_track == PGSS_TRACK_ALL || \ - (pgss_track == PGSS_TRACK_TOP && (level) == 0))) + (pgss_track == PGSS_TRACK_TOP && (level) == 0)) && \ + current_query_sampled) #define record_gc_qtexts() \ do { \ @@ -414,6 +419,19 @@ _PG_init(void) NULL, NULL); + DefineCustomRealVariable("pg_stat_statements.sample_rate", + "Fraction of queries to process.", + NULL, + &pgss_sample_rate, + 1.0, + 0.0, + 1.0, + PGC_SUSET, + 0, + NULL, + NULL, + NULL); + DefineCustomEnumVariable("pg_stat_statements.track", "Selects which statements are tracked by pg_stat_statements.", NULL, @@ -835,6 +853,9 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate) if (prev_post_parse_analyze_hook) prev_post_parse_analyze_hook(pstate, query, jstate); + if (nesting_level == 0) + current_query_sampled = (pg_prng_double(&pg_global_prng_state) < pgss_sample_rate); + /* Safety check... */ if (!pgss || !pgss_hash || !pgss_enabled(nesting_level)) return; diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml index 501b468e9a..d06349d097 100644 --- a/doc/src/sgml/pgstatstatements.sgml +++ b/doc/src/sgml/pgstatstatements.sgml @@ -936,6 +936,24 @@ + + + pg_stat_statements.sample_rate (real) + + pg_stat_statements.sample_rate configuration parameter + + + + + + pg_stat_statements.sample_rate causes pg_stat_statements to only + track a fraction of the statements in each session. 
The default is 1, + meaning track all the queries. In case of nested statements, + either all will be tracked or none. Only superusers can change this setting. + + + + pg_stat_statements.save (boolean) -- 2.34.1
Re: Define STATS_MIN_ROWS for minimum rows of stats in ANALYZE
On 09.12.2024 16:10, Ilia Evdokimov wrote: Hi hackers, The repeated use of the number 300 in the ANALYZE-related code creates redundancy and relies on scattered, sometimes unclear, comments to explain its purpose. This can make the code harder to understand, especially for new contributors who might not immediately understand its significance. To address this, I propose introducing a macro STATS_MIN_ROWS to represent this value and consolidating its explanation in a single place, making the code more consistent and readable. -- Best regards, Ilia Evdokimov, Tantor Labs LLC. Hi everyone, Currently, the value 300 is used as the basis for determining the number of rows sampled during ANALYZE, both for single-column and extended statistics. While this value has a well-established rationale for single-column statistics, its suitability for extended statistics remains uncertain, as no specific research has confirmed that this is an optimal choice for them. To better reflect this distinction, I propose introducing two macros: STATS_MIN_ROWS for single-column statistics and EXT_STATS_MIN_ROWS for extended statistics. This change separates the concerns of single-column and extended statistics sampling, making the code more explicit and easier to adapt if future research suggests a different approach for extended statistics. The values remain the same for now, but the introduction of distinct macros improves clarity and prepares the codebase for potential refinements. Does this seem like a reasonable approach to handling these differences? -- Best regards, Ilia Evdokimov, Tantor Labs LLC. From 164535616040447207c36ea47c52b206f4d77dcf Mon Sep 17 00:00:00 2001 From: Ilia Evdokimov Date: Tue, 10 Dec 2024 16:25:29 +0300 Subject: [PATCH v2] Define macros for minimum rows of stats in ANALYZE This introduces two macros, STATS_MIN_ROWS and EXT_STATS_MIN_ROWS, to represent the default minimum number of rows sampled in ANALYZE. STATS_MIN_ROWS is used for single-column statistics, while EXT_STATS_MIN_ROWS is intended for extended statistics. Both macros replace the hardcoded value of 300, improving clarity. --- src/backend/commands/analyze.c| 31 +-- src/backend/statistics/extended_stats.c | 2 +- src/backend/tsearch/ts_typanalyze.c | 4 +-- src/backend/utils/adt/rangetypes_typanalyze.c | 7 ++--- .../statistics/extended_stats_internal.h | 11 +++ src/include/statistics/statistics.h | 23 ++ 6 files changed, 47 insertions(+), 31 deletions(-) diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c index 9a56de2282..e358b0d828 100644 --- a/src/backend/commands/analyze.c +++ b/src/backend/commands/analyze.c @@ -1897,42 +1897,25 @@ std_typanalyze(VacAttrStats *stats) { /* Seems to be a scalar datatype */ stats->compute_stats = compute_scalar_stats; - /* - * The following choice of minrows is based on the paper - * "Random sampling for histogram construction: how much is enough?" - * by Surajit Chaudhuri, Rajeev Motwani and Vivek Narasayya, in - * Proceedings of ACM SIGMOD International Conference on Management - * of Data, 1998, Pages 436-447. 
Their Corollary 1 to Theorem 5 - * says that for table size n, histogram size k, maximum relative - * error in bin size f, and error probability gamma, the minimum - * random sample size is - * r = 4 * k * ln(2*n/gamma) / f^2 - * Taking f = 0.5, gamma = 0.01, n = 10^6 rows, we obtain - * r = 305.82 * k - * Note that because of the log function, the dependence on n is - * quite weak; even at n = 10^12, a 300*k sample gives <= 0.66 - * bin size error with probability 0.99. So there's no real need to - * scale for n, which is a good thing because we don't necessarily - * know it at this point. - * - */ - stats->minrows = 300 * stats->attstattarget; } else if (OidIsValid(eqopr)) { /* We can still recognize distinct values */ stats->compute_stats = compute_distinct_stats; - /* Might as well use the same minrows as above */ - stats->minrows = 300 * stats->attstattarget; } else { /* Can't do much but the trivial stuff */ stats->compute_stats = compute_trivial_stats; - /* Might as well use the same minrows as above */ - stats->minrows = 300 * stats->attstattarget; } + /* + * For the scalar types, STATS_MIN_ROWS is derived from research on + * optimal sample sizes for histograms. For other cases, + * we assume the same value. + */ + stats->minrows = (STATS_MIN_ROWS * stats->attstattarget); + return true; } diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c index 99fdf208db..8cd024b5ce 100644 --- a/src/backend/statistics/extended_stats.c +++ b/src/backend/statistics/extended_stats.c @@ -320,7 +320,7 @@ ComputeExtStatisticsRow
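The patch excerpt above is truncated before the new header definitions, so as a rough sketch (the exact wording and placement are assumptions; the diffstat shows they go into src/include/statistics/statistics.h and extended_stats_internal.h), the proposal boils down to something like:

```c
/*
 * Sketch of the proposed macros (hypothetical wording, values from the
 * discussion above).
 *
 * 300 comes from the sample-size bound r = 4 * k * ln(2n/gamma) / f^2 with
 * f = 0.5, gamma = 0.01, n = 10^6, which gives r ≈ 305.82 * k.
 */
#define STATS_MIN_ROWS      300     /* single-column statistics */
#define EXT_STATS_MIN_ROWS  300     /* extended statistics; same value for now */
```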
Re: explain analyze rows=%.0f
On 01.08.2023 23:29, Daniel Gustafsson wrote: On 3 Jul 2023, at 18:34, Daniel Gustafsson wrote: On 8 Jun 2023, at 19:49, Ibrar Ahmed wrote: On Mon, Mar 20, 2023 at 7:56 PM Gregory Stark (as CFM) <stark@gmail.com> wrote: This patch was marked Returned with Feedback and then later Waiting on Author. And it hasn't had any updates since January. What is the state on this feedback? If it's already done we can set the patch to Ready for Commit and if not do you disagree with the proposed changes? If there is a consensus to modify the test cases' output, I am willing to make the necessary changes and adjust the patch accordingly. However, if there is a preference to keep the output of certain test cases unchanged, I can rebase and modify the patch accordingly to accommodate those preferences. As there hasn't been any other comments I suggest updating your patch to address Tom's comments to see if we can make progress here. Since there hasn't been any updates here, and the thread has been stalled, I'm marking this returned with feedback. Please feel free to resubmit a version of the patch addressing comments to a future CF. -- Daniel Gustafsson Hi everybody, When the total number of returned tuples is less than the number of loops, EXPLAIN ANALYZE currently shows 'rows = 0'. This can mislead users into thinking that no rows were returned at all, even though some might have appeared occasionally. For example, if there is 1 tuple over 100 loops, the average is 0.01 rows per loop, but as displayed it simply looks like zero. To clarify this situation, it has been suggested to display rows with two decimal places in scenarios where 'loops > 1 && ntuples < loops'. In other words, show something like 'rows = 0.01' instead of 'rows=0'. This minor change would make it evident that rows did occur, just very infrequently. For all other cases, the current formatting would remain the same. This approach aims to provide a more nuanced understanding of query execution behavior without introducing unnecessary complexity or false precision. I would appreciate any thoughts or feedback on this proposal. -- Best regards, Ilia Evdokimov, Tantor Labs LLC.
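To make the motivating case concrete, here is a small, hypothetical way to produce such a plan. The table names and the forced nested loop are assumptions; the exact plan shape will depend on the planner and the installation.

```sql
-- Hypothetical reproduction: the inner index scan returns 1 tuple in total
-- across 100 loops, so today EXPLAIN ANALYZE prints "rows=0 loops=100" for
-- it, while the proposal would print something like "rows=0.01".
CREATE TABLE outer_t (id int);
INSERT INTO outer_t SELECT 20000 + g FROM generate_series(1, 99) g;
INSERT INTO outer_t VALUES (42);                 -- the single matching value
CREATE TABLE inner_t (id int PRIMARY KEY);
INSERT INTO inner_t SELECT generate_series(1, 10000);
ANALYZE outer_t, inner_t;

SET enable_hashjoin = off;
SET enable_mergejoin = off;

EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF)
SELECT * FROM outer_t o JOIN inner_t i ON i.id = o.id;
```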
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index a201ed3082..b3f7fa8743 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -1981,14 +1981,14 @@ ExplainNode(PlanState *planstate, List *ancestors, if (es->format == EXPLAIN_FORMAT_TEXT) { + appendStringInfo(es->str, " (actual"); if (es->timing) -appendStringInfo(es->str, - " (actual time=%.3f..%.3f rows=%.0f loops=%.0f)", - startup_ms, total_ms, rows, nloops); +appendStringInfo(es->str, " time=%.3f..%.3f", startup_ms, total_ms); + + if (nloops > 1 && planstate->instrument->ntuples < nloops) +appendStringInfo(es->str," rows=%.2f loops=%.2f)", rows, nloops); else -appendStringInfo(es->str, - " (actual rows=%.0f loops=%.0f)", - rows, nloops); +appendStringInfo(es->str," rows=%.0f loops=%.0f)", rows, nloops); } else { @@ -1999,8 +1999,16 @@ ExplainNode(PlanState *planstate, List *ancestors, ExplainPropertyFloat("Actual Total Time", "ms", total_ms, 3, es); } - ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es); - ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es); + if (nloops > 1 && planstate->instrument->ntuples < nloops) + { +ExplainPropertyFloat("Actual Rows", NULL, rows, 2, es); +ExplainPropertyFloat("Actual Loops", NULL, nloops, 2, es); + } + else + { +ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es); +ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es); + } } } else if (es->analyze) @@ -2052,14 +2060,14 @@ ExplainNode(PlanState *planstate, List *ancestors, if (es->format == EXPLAIN_FORMAT_TEXT) { ExplainIndentText(es); +appendStringInfo(es->str, "actual"); if (es->timing) - appendStringInfo(es->str, - "actual time=%.3f..%.3f rows=%.0f loops=%.0f\n", - startup_ms, total_ms, rows, nloops); + appendStringInfo(es->str, " time=%.3f..%.3f", startup_ms, total_ms); + +if (nloops > 1 && planstate->instrument->ntuples < nloops) + appendStringInfo(es->str," rows=%.2f loops=%.2f)", rows, nloops); else - appendStringInfo(es->str, - "actual rows=%.0f loops=%.0f\n", - rows, nloops); + appendStringInfo(es->str," rows=%.0f loops=%.0f)", rows, nloops); } else { @@ -2070,8 +2078
Re: explain analyze rows=%.0f
On 11.01.2025 17:02, Guillaume Lelarge wrote: Le sam. 11 janv. 2025 à 12:10, Ilia Evdokimov a écrit : On 11.01.2025 12:15, Guillaume Lelarge wrote: Thanks for your patch, this looks like a very interesting feature that I'd like to see in a future release. It did a quick run: patch OK, make OK, make install OK, but make check fails quite a lot on partition_prune.sql. I guess it would need some work on partition_prune.sql tests and perhaps also on the docs. Thanks again. -- Guillaume. Yes, certainly. I have fixed partition_prune.sql. In the documentation example for EXPLAIN ANALYZE where loops is greater than one, I updated how 'rows' and 'loops' values are displayed so they appear as decimal fractions with two digits after the decimal point. I attached fixed patch. Any suggestions? Thanks for the updated patch. "make check" is happy, so no more suggestions, except you should register it in the next commitfest. -- Guillaume. Done! https://commitfest.postgresql.org/52/5501/ -- Best regards, Ilia Evdokimov, Tantor Labs LLC.
Re: explain analyze rows=%.0f
On 11.01.2025 12:15, Guillaume Lelarge wrote: Thanks for your patch, this looks like a very interesting feature that I'd like to see in a future release. It did a quick run: patch OK, make OK, make install OK, but make check fails quite a lot on partition_prune.sql. I guess it would need some work on partition_prune.sql tests and perhaps also on the docs. Thanks again. -- Guillaume. Yes, certainly. I have fixed partition_prune.sql. In the documentation example for EXPLAIN ANALYZE where loops is greater than one, I updated how 'rows' and 'loops' values are displayed so they appear as decimal fractions with two digits after the decimal point. I attached fixed patch. Any suggestions? -- Best regards, Ilia Evdokimov, Tantor Labs LLC. From a934fbd278350edf7a3a3e7fe620a0027ceb2580 Mon Sep 17 00:00:00 2001 From: Ilia Evdokimov Date: Sat, 11 Jan 2025 13:59:34 +0300 Subject: [PATCH] Clarify display of rows and loops as decimal fractions When the average number of rows is small compared to the number of loops, both rows and loops are displayed as decimal fractions with two digits after the decimal point in EXPLAIN ANALYZE. --- doc/src/sgml/perform.sgml | 2 +- src/backend/commands/explain.c| 49 +-- src/test/regress/expected/partition_prune.out | 10 ++-- src/test/regress/sql/partition_prune.sql | 2 +- 4 files changed, 40 insertions(+), 23 deletions(-) diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml index a502a2aaba..da7a19f4b8 100644 --- a/doc/src/sgml/perform.sgml +++ b/doc/src/sgml/perform.sgml @@ -730,7 +730,7 @@ WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2; -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..4.36 rows=10 width=0) (actual time=0.004..0.004 rows=10 loops=1) Index Cond: (unique1 < 10) Buffers: shared hit=2 - -> Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..7.90 rows=1 width=244) (actual time=0.003..0.003 rows=1 loops=10) + -> Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..7.90 rows=1 width=244) (actual time=0.003..0.003 rows=1.00 loops=10.00) Index Cond: (unique2 = t1.unique2) Buffers: shared hit=24 read=6 Planning: diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index c24e66f82e..200294b756 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -1981,14 +1981,14 @@ ExplainNode(PlanState *planstate, List *ancestors, if (es->format == EXPLAIN_FORMAT_TEXT) { + appendStringInfo(es->str, " (actual"); if (es->timing) -appendStringInfo(es->str, - " (actual time=%.3f..%.3f rows=%.0f loops=%.0f)", - startup_ms, total_ms, rows, nloops); +appendStringInfo(es->str, " time=%.3f..%.3f", startup_ms, total_ms); + + if (nloops > 1 && planstate->instrument->ntuples < nloops) +appendStringInfo(es->str," rows=%.2f loops=%.2f)", rows, nloops); else -appendStringInfo(es->str, - " (actual rows=%.0f loops=%.0f)", - rows, nloops); +appendStringInfo(es->str," rows=%.0f loops=%.0f)", rows, nloops); } else { @@ -1999,8 +1999,16 @@ ExplainNode(PlanState *planstate, List *ancestors, ExplainPropertyFloat("Actual Total Time", "ms", total_ms, 3, es); } - ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es); - ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es); + if (nloops > 1 && planstate->instrument->ntuples < nloops) + { +ExplainPropertyFloat("Actual Rows", NULL, rows, 2, es); +ExplainPropertyFloat("Actual Loops", NULL, nloops, 2, es); + } + else + { +ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es); +ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es); + } } } else if (es->analyze) @@ -2052,14 
+2060,14 @@ ExplainNode(PlanState *planstate, List *ancestors, if (es->format == EXPLAIN_FORMAT_TEXT) { ExplainIndentText(es); +appendStringInfo(es->str, "actual"); if (es->timing) - appendStringInfo(es->str, - "actual time=%.3f..%.3f rows=%.0f loops=%.0f\n", - startup_ms, total_ms, rows, nloops); + appendStringInfo(es->str, " time=%.3f..%.3f", startup_ms, total_ms); + +if (nloops > 1 && planstate->instrument->ntuples < nloops) + appendStringInfo(es->str," rows=%.2f loops=%.2f)", rows, nloops); else - appendStringInfo(es->str, - "actual rows=%.0f loops=%.0f\n", - rows, nloops); + appendStringInfo(es->str," rows=%.0f loops=%.0f)", rows, nloops); } else { @@
Re: Sample rate added to pg_stat_statements
On 09.01.2025 22:13, Alena Rybakina wrote: Hi! Thank you for the work with this subject. I looked at your patch and noticed that this part of the code is repeated several times:

if (nesting_level == 0)
{
    if (!IsParallelWorker())
        current_query_sampled = pg_prng_double(&pg_global_prng_state) < pgss_sample_rate;
    else
        current_query_sampled = false;
}

I think you should put this in a function like update_current_query_sampled. I've attached a diff file with the changes. Agree, thanks. -- Best regards, Ilia Evdokimov, Tantor Labs LLC.
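For readers following along, a sketch of the helper Alena is suggesting; the name is hers, the body is just the repeated fragment above factored out, and the actual attached diff may differ.

```c
/*
 * Decide whether the current top-level statement should be sampled.
 * Sketch of factoring out the fragment quoted above.
 */
static void
update_current_query_sampled(void)
{
    if (nesting_level != 0)
        return;                 /* keep the decision made at the top level */

    if (!IsParallelWorker())
        current_query_sampled =
            pg_prng_double(&pg_global_prng_state) < pgss_sample_rate;
    else
        current_query_sampled = false;
}
```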
Re: Sample rate added to pg_stat_statements
On 22.11.2024 09:08, Alexander Korotkov wrote: On Wed, Nov 20, 2024 at 12:07 AM Michael Paquier wrote: On Tue, Nov 19, 2024 at 09:39:21AM -0500, Greg Sabino Mullane wrote: Oh, and a +1 in general to the patch, OP, although it would also be nice to start finding the bottlenecks that cause such performance issues. FWIW, I'm not eager to integrate this proposal without looking at this exact argument in depth. One piece of it would be to see how much of such "bottlenecks" we would be able to get rid of by integrating pg_stat_statements into the central pgstats with the custom APIs, without pushing the module into core. This means that we would combine the existing hash of pgss to shrink to 8 bytes for objid rather than 13 bytes now as the current code relies on (toplevel, userid, queryid) for the entry lookup (entry removal is sniped with these three values as well, or dshash seq scans). The odds of conflicts still still play in our favor even if we have a few million entries, or even ten times that. If you run "pgbench -S -M prepared" on a pretty large machine with high concurrency, then spin lock in pgss_store() could become pretty much of a bottleneck. And I'm not sure switching all counters to atomics could somehow improve the situation given we already have pretty many counters. I'm generally +1 for the approach taken in this thread. But I would suggest introducing a threshold value for a query execution time, and sample just everything below that threshold. The slower query shouldn't be sampled, because it can't be too frequent, and also it could be more valuable to be counter individually (while very fast queries probably only matter "in average"). -- Regards, Alexander Korotkov Supabase BTW, since we're performing sampling outside of pgss_post_parse_analyze(), I'd like to reconsider Alexander's proposal—to sample only those queries that have an execution time below a specified threshold, as defined in the hook where execution time is recorded. Queries exceeding this threshold would not be sampled. Certainly I'll fix all Andrey's comments. What are your thoughts on this approach? -- Best regards, Ilia Evdokimov, Tantor Labs LLC.
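To make that idea concrete, a very rough sketch of what time-based sampling could look like in pgss_ExecutorEnd(). pgss_sample_threshold_ms is a hypothetical GUC; this is not part of any posted patch, and as discussed later in the thread it would not help pgss_planner, which runs before execution time is known.

```c
/*
 * Hypothetical sketch: sample cheap statements, always keep expensive ones.
 * queryDesc->totaltime->total is in seconds at this point.
 */
double      exec_ms = queryDesc->totaltime->total * 1000.0;

if (nesting_level == 0)
{
    if (exec_ms >= pgss_sample_threshold_ms)
        current_query_sampled = true;   /* slow queries are always tracked */
    else
        current_query_sampled =
            pg_prng_double(&pg_global_prng_state) < pgss_sample_rate;
}
```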
Re: Sample rate added to pg_stat_statements
On 20.01.2025 17:20, Ilia Evdokimov wrote: On 15.01.2025 20:16, Sami Imseih wrote: Probably, but first I suggest benchmarking with sampling applied to all queries. If the results are good, we can later filter certain queries based on different characteristics. Absolutely. The benchmark numbers to justify this feature are the next step. Thanks for your work on this! Regards, Sami

Hi hackers, I'd like to share the results of my benchmark. To stress the spinlock in pg_stat_statements while incrementing counters for popular entries, it's important to use the same query repeatedly. To avoid overloading pgss with normalization, the queries should not contain constants. I found that using the query 'SELECT now()' works best for this purpose. I ran the benchmark on a machine with 48 CPUs, which may not be sufficient to fully test sampling, but I was able to achieve around 1.5 million TPS using 'SELECT now()'. To load the CPUs to about 85-90%, I ran pgbench with -c 45 -j 45, using a custom pgbench_script.sql file containing 'SELECT now()'. The benchmark was conducted on a newly created 'pgbench' database, with processes like autovacuum, fsync, and checkpoints disabled (e.g., checkpoint_timeout = '24h'). I tested various 'sample_rate' values and compared the performance with pgss both enabled and disabled. The detailed results are attached in the 'sample_rate.txt' file, along with pgbench progress reports every 10 seconds.

pgbench -c 45 -j 45 -T 200 -f pgbench_script.sql --progress=10 pgbench

sample_rate | tps            | lat   | stddev | num of transactions
1.0         | 1207127.572580 | 0.037 | 0.030  | 241400836
0.8         | 1403551.516338 | 0.032 | 0.031  | 280673286
0.5         | 1658596.614064 | 0.027 | 0.012  | 331679344
0.2         | 1757747.509476 | 0.025 | 0.008  | 351507156
0.0         | 1760055.986259 | 0.025 | 0.008  | 351965559
pgss off    | 1828743.791205 | 0.024 | 0.008  | 365703762

If anyone has the capability to run this benchmark on machines with more CPUs or with different queries, it would be nice. I'd appreciate any suggestions or feedback. -- Best regards, Ilia Evdokimov, Tantor Labs LLC.

Hi hackers, I'd like to share the results of a new benchmark I conducted using the pgbench tool with the -M prepared -S flags! Previously, I shared a benchmark focused on extreme high-load scenarios, but now I've conducted a more realistic test case based on Alexander's suggestion [0].

Machine characteristics:
* 48 CPUs
* 374 GB RAM
* 3 TB Disk

To avoid interference from background processes, I disabled autovacuum and synchronous operations using the following settings in postgresql.auto.conf:

autovacuum = off
fsync = off
checkpoint_timeout = '24h'
shared_preload_libraries = 'pg_stat_statements'

Create a test database:

createdb pgbench
pgbench -i -s 3000 pgbench

If pg_stat_statements is enabled, reset its hash table before starting the benchmark:

psql -c 'SELECT pg_stat_statements_reset()'

Run the benchmark with the following command:

pgbench -c 46 -j 46 -T 200 -M prepared -S --progress=10 pgbench

Here are the results with and without pg_stat_statements, varying the sample_rate parameter:

sample_rate | tps            | lat   | stddev | num of transactions
1.0         | 1299070.105018 | 0.035 | 0.055  | 259781895
0.8         | 1393327.873249 | 0.033 | 0.039  | 278630669
0.5         | 1420303.856480 | 0.032 | 0.037  | 284023396
0.2         | 1432482.672301 | 0.032 | 0.037  | 286461834
0.0         | 1760055.986259 | 0.032 | 0.036  | 290169584
pgss off    | 1460920.805259 | 0.031 | 0.036  | 292144256

When pg_stat_statements is enabled, TPS decreases by about 10%.
However, by leveraging the new sampling feature (sample_rate), it is possible to significantly mitigate this overhead. For detailed benchmark results with the '-M prepared -S' flags, please see the attached file 'sample_rate_prepared.txt'.

P.S. Previously, I shared results under extreme high-load conditions using the query 'SELECT now()'. For clarity, I'm rewriting those results for better readability:

sample_rate | tps            | lat   | stddev | num of transactions
1.0         | 1207127.572580 | 0.037 | 0.030  | 241400836
0.8         | 1403551.516338 | 0.032 | 0.031  | 280673286
0.5         | 1658596.614064 | 0.027 | 0.012  | 331679344
0.2         | 1757747.509476 | 0.025 | 0.008  | 351507156
0.0         | 1760055.986259 | 0.025 | 0.008  | 351965559
pgss off    | 1828743.791205 | 0.024 | 0.008  | 365703762

Here, we observed a significant ~33% decrease in TPS when pgss was enabled without sampling. Any thoughts?

[0]: https://www.postgresql.org/message-id/CAPpHfdsTKAQqC3A48-MGQhrhfEamXZPb64w%3Dutk7thQcOMNr7Q%40mail.gmail
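To make it clearer where that overhead comes from: once a statement is tracked, its counters are updated under a per-entry spinlock, so with a single very hot entry every backend contends on the same lock. The fragment below is paraphrased from pgss_store() in pg_stat_statements.c (not a verbatim excerpt); sampling simply skips this path for some fraction of executions.

```
/*
 * Paraphrased sketch of the counter update in pgss_store(); field and type
 * names follow pg_stat_statements.c, but this is not an exact copy.
 */
{
	volatile pgssEntry *e = (volatile pgssEntry *) entry;

	SpinLockAcquire(&e->mutex);
	e->counters.calls[kind] += 1;
	e->counters.total_time[kind] += total_time;
	e->counters.rows += rows;
	/* ... many more counters are updated here ... */
	SpinLockRelease(&e->mutex);
}
```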
Re: Sample rate added to pg_stat_statements
On 23.01.2025 08:21, Andrey Borodin wrote: There’s a typo in the commit message (ratio instead of rate). Besides this the patch looks ready for committer. Best regards, Andrey Borodin. Fixed. Thank you for review! I noticed that the code has not enough comments, so I added additional ones to improve clarity. Additionally, I moved the update_current_query_sampled() function to the end of the file, as keeping it between hooks didn’t seem appropriate to me. All these changes are included in the updated patch (v12), which I have attached. The patch is part of the current 2025-01 Commitfest: https://commitfest.postgresql.org/51/5390/ -- Best regards, Ilia Evdokimov, Tantor Labs LLC. From ddbc6af2af511ff342b183cb13a9027edadc0ad3 Mon Sep 17 00:00:00 2001 From: Ilia Evdokimov Date: Thu, 23 Jan 2025 11:31:57 +0300 Subject: [PATCH v1] Allow setting sample ratio for pg_stat_statements New configuration parameter pg_stat_statements.sample_rate makes it possible to track just a fraction of the queries meeting the configured threshold, to reduce the amount of tracking. --- .../pg_stat_statements/expected/select.out| 76 +++ .../pg_stat_statements/pg_stat_statements.c | 73 +++--- contrib/pg_stat_statements/sql/select.sql | 21 + doc/src/sgml/pgstatstatements.sgml| 18 + 4 files changed, 179 insertions(+), 9 deletions(-) diff --git a/contrib/pg_stat_statements/expected/select.out b/contrib/pg_stat_statements/expected/select.out index 37a30af034..558d93fb46 100644 --- a/contrib/pg_stat_statements/expected/select.out +++ b/contrib/pg_stat_statements/expected/select.out @@ -153,6 +153,82 @@ SELECT pg_stat_statements_reset() IS NOT NULL AS t; t (1 row) +-- +-- sample statements +-- +SET pg_stat_statements.sample_rate = 0.0; +SELECT 1 AS "int"; + int +- + 1 +(1 row) + +SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C"; + query| calls ++--- + SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1 +(1 row) + +SET pg_stat_statements.sample_rate = 1.0; +SELECT 1 AS "int"; + int +- + 1 +(1 row) + +SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C"; + query | calls ++--- + SELECT $1 AS "int" | 1 + SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1 + SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C" | 0 +(3 rows) + +SELECT pg_stat_statements_reset() IS NOT NULL AS t; + t +--- + t +(1 row) + +set pg_stat_statements.sample_rate = 0.0; +select pg_stat_statements_reset() IS NOT NULL AS t; + t +--- + t +(1 row) + +SELECT 1 \parse stmt +\bind_named stmt \g + ?column? +-- +1 +(1 row) + +SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C"; + query | calls +---+--- +(0 rows) + +set pg_stat_statements.sample_rate = 1.0; +\bind_named stmt \g + ?column? 
+-- +1 +(1 row) + +SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C"; + query | calls ++--- + SELECT $1 | 1 + SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C" | 0 +(2 rows) + +select pg_stat_statements_reset() IS NOT NULL AS t; + t +--- + t +(1 row) + -- -- queries with locking clauses -- diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c index bebf8134eb..92ba954d41 100644 --- a/contrib/pg_stat_statements/pg_stat_statements.c +++ b/contrib/pg_stat_statements/pg_stat_statements.c @@ -50,6 +50,7 @@ #include "access/parallel.h" #include "catalog/pg_authid.h" #include "common/int.h" +#include "common/pg_prng.h" #include "executor/instrument.h" #include "funcapi.h" #include "jit/jit.h" @@ -294,6 +295,10 @@ static bool pgss_track_utility = true; /* whether to track utility commands */ static bool pgss_track_planning = false; /* whether to track planning * duration */ static bool pgss_save = true; /* whether to save stats across shutdown */ +static double pgss_sample_rate = 1.0; /* fraction of statements to track */ + +/* Is the current top-level query to be sampled? */ +static bool current_query_sampled = false; #define pgss_enabled(level) \ @@ -373,6 +378,7 @@ static char *generate_normalized_query(JumbleState *jstate, const char *query, static v
Re: Sample rate added to pg_stat_statements
On 15.01.2025 20:16, Sami Imseih wrote: Probably, but first I suggest benchmarking with sampling applied to all queries. If the results are good, we can later filter certain queries based on different characteristics. Absolutely. The benchmark numbers to justify this feature are the next step. Thanks for your work on this! Regards, Sami

Hi hackers, I'd like to share the results of my benchmark. To stress the spinlock in pg_stat_statements while incrementing counters for popular entries, it's important to use the same query repeatedly. To avoid overloading pgss with normalization, the queries should not contain constants. I found that using the query 'SELECT now()' works best for this purpose. I ran the benchmark on a machine with 48 CPUs, which may not be sufficient to fully test sampling, but I was able to achieve around 1.5 million TPS using 'SELECT now()'. To load the CPUs to about 85-90%, I ran pgbench with -c 45 -j 45, using a custom pgbench_script.sql file containing 'SELECT now()'. The benchmark was conducted on a newly created 'pgbench' database, with processes like autovacuum, fsync, and checkpoints disabled (e.g., checkpoint_timeout = '24h'). I tested various 'sample_rate' values and compared the performance with pgss both enabled and disabled. The detailed results are attached in the 'sample_rate.txt' file, along with pgbench progress reports every 10 seconds.

pgbench -c 45 -j 45 -T 200 -f pgbench_script.sql --progress=10 pgbench

sample_rate | tps            | lat   | stddev | num of transactions
1.0         | 1207127.572580 | 0.037 | 0.030  | 241400836
0.8         | 1403551.516338 | 0.032 | 0.031  | 280673286
0.5         | 1658596.614064 | 0.027 | 0.012  | 331679344
0.2         | 1757747.509476 | 0.025 | 0.008  | 351507156
0.0         | 1760055.986259 | 0.025 | 0.008  | 351965559
pgss off    | 1828743.791205 | 0.024 | 0.008  | 365703762

If anyone has the capability to run this benchmark on machines with more CPUs or with different queries, it would be nice. I'd appreciate any suggestions or feedback. -- Best regards, Ilia Evdokimov, Tantor Labs LLC.

pg_stat_statements.sample_rate = 1.0
cpu = 85%
pgbench -c 45 -j 45 -T 200 -f pgbench_script.sql --progress=10 pgbench
pgbench (16.6) starting vacuum...end.
progress: 10.0 s, 1200445.1 tps, lat 0.037 ms stddev 0.038, 0 failed progress: 20.0 s, 1232634.5 tps, lat 0.036 ms stddev 0.030, 0 failed progress: 30.0 s, 1209760.7 tps, lat 0.037 ms stddev 0.027, 0 failed progress: 40.0 s, 1204625.5 tps, lat 0.037 ms stddev 0.026, 0 failed progress: 50.0 s, 1206608.7 tps, lat 0.037 ms stddev 0.027, 0 failed progress: 60.0 s, 1214047.9 tps, lat 0.037 ms stddev 0.034, 0 failed progress: 70.0 s, 1222126.5 tps, lat 0.037 ms stddev 0.036, 0 failed progress: 80.0 s, 1198452.9 tps, lat 0.037 ms stddev 0.030, 0 failed progress: 90.0 s, 1198871.7 tps, lat 0.037 ms stddev 0.027, 0 failed progress: 100.0 s, 1200786.9 tps, lat 0.037 ms stddev 0.028, 0 failed progress: 110.0 s, 1200419.5 tps, lat 0.037 ms stddev 0.030, 0 failed progress: 120.0 s, 1208394.4 tps, lat 0.037 ms stddev 0.030, 0 failed progress: 130.0 s, 1199204.3 tps, lat 0.037 ms stddev 0.028, 0 failed progress: 140.0 s, 1213919.6 tps, lat 0.037 ms stddev 0.037, 0 failed progress: 150.0 s, 1195670.1 tps, lat 0.037 ms stddev 0.028, 0 failed progress: 160.0 s, 1190519.2 tps, lat 0.038 ms stddev 0.029, 0 failed progress: 170.0 s, 1213450.1 tps, lat 0.037 ms stddev 0.027, 0 failed progress: 180.0 s, 1220433.0 tps, lat 0.037 ms stddev 0.029, 0 failed progress: 190.0 s, 1199628.7 tps, lat 0.037 ms stddev 0.028, 0 failed transaction type: pgbench_script.sql scaling factor: 1 query mode: simple number of clients: 45 number of threads: 45 maximum number of tries: 1 duration: 200 s number of transactions actually processed: 241400836 number of failed transactions: 0 (0.000%) latency average = 0.037 ms latency stddev = 0.030 ms initial connection time = 25.051 ms tps = 1207127.572580 (without initial connection time) == pg_stat_statements.sample_rate = 0.8 cpu = 90% pgbench -c 45 -j 45 -T 200 -f pgbench_script.sql --progress=10 pgbench pgbench (16.6) starting vacuum...end. progress: 10.0 s, 1429922.8 tps, lat 0.031 ms stddev 0.028, 0 failed progress: 20.0 s, 1200380.2 tps, lat 0.037 ms stddev 0.029, 0 failed progress: 30.0 s, 1198102.6 tps, lat 0.037 ms stddev 0.027, 0 failed progress: 40.0 s, 1225912.9 tps, lat 0.037 ms stddev 0.028, 0 failed progress: 50.0 s, 1234056.2 tps, lat 0.036 ms stddev 0.027, 0 failed progress: 60.0 s, 1234894.6 tps, lat 0.036 ms stddev 0.027, 0 failed progress: 70.0 s, 1235032.9 tps, lat 0.036 ms stddev 0.027, 0 failed progress: 80.0 s, 1548607.9 tps, lat 0.029 ms stddev 0.035, 0 failed progress: 90.0 s, 1553966.9 tps, lat 0.029 ms stddev 0.031, 0 failed progress: 100
Re: Sample rate added to pg_stat_statements
On 28.01.2025 02:00, Sami Imseih wrote: Hi, I have a few comments about the patch. 1/ tests are missing for pg_stat_statements.track=all and with a sample rate < 1 applied. Can we add tests for this case? I think I’ve come up with a good idea for implementing this. We can create a new file, sampling.sql, for regression tests, where we move written tests in select.out and check test with setting "pg_stat_statements.track = 'all'" and "pg_stat_statements.sample_rate = 0.5". Additionally, we can create two files in the 'expected' directory: sampling.out and sampling_1.out where we can verify either all nested statements are tracked or none of them. 2/ This declaration: +static bool current_query_sampled = false; should be moved near the declaration of nesting_level, the other local variable. +1 3/ I do not really like the calls to update_current_query_sampled() the way there are, and I think the pgss_enabled should encapsulate the new sample rate check as well. My thoughts are: 1/ change the name of the function from update_current_query_sampled to current_query_sampled Also, the static bool should be called is_query_sampled or something like that. The function should also allow you to skip the sample rate check, such as if called from pgss_post_parse_analyze. We can also remove the IsParallelWorker() check in this case, since that is done in pgss_enabled. something like this is what I am thinking: static bool current_query_sampled(bool skip) { if (skip) return true; if (nesting_level == 0) { is_query_sampled = pgss_sample_rate != 0.0 && (pgss_sample_rate == 1.0 || pg_prng_double(&pg_global_prng_state) <= pgss_sample_rate); } return is_query_sampled; } #define pgss_enabled(level, skip_sampling_check) \ (!IsParallelWorker() && \ (pgss_track == PGSS_TRACK_ALL || \ (pgss_track == PGSS_TRACK_TOP && (level) == 0)) && \ current_query_sampled(skip_sampling_check)) What do you think? I agree with all of these changes; they make the code more readable. Adding comments to the current_query_sampled() function will make it even clearer. 4/ Now we have pg_stat_statements.track = "off" which is effectively the same thing as pg_stat_statements.sample_rate = "0". Does this need to be called out in docs? +1, because we have the same thing in log_statement_sample_rate. All the changes mentioned above are included in the v13 patch. Since the patch status is 'Ready for Committer,' I believe it is now better for upstream inclusion, with improved details in tests and documentation. Do you have any further suggestions? -- Best regards, Ilia Evdokimov, Tantor Labs LLC. From 149688fe3bccce25f8d01dfe8c01444555dce03c Mon Sep 17 00:00:00 2001 From: Ilia Evdokimov Date: Tue, 28 Jan 2025 05:31:20 +0300 Subject: [PATCH] Allow setting sample rate for pg_stat_statements New configuration parameter pg_stat_statements.sample_rate makes it possible to track just a fraction of the queries meeting the configured threshold, to reduce the amount of tracking. 
--- contrib/pg_stat_statements/Makefile | 2 +- .../pg_stat_statements/expected/sampling.out | 99 +++ .../expected/sampling_1.out | 97 ++ .../pg_stat_statements/pg_stat_statements.c | 59 +-- contrib/pg_stat_statements/sql/sampling.sql | 34 +++ doc/src/sgml/pgstatstatements.sgml| 19 6 files changed, 302 insertions(+), 8 deletions(-) create mode 100644 contrib/pg_stat_statements/expected/sampling.out create mode 100644 contrib/pg_stat_statements/expected/sampling_1.out create mode 100644 contrib/pg_stat_statements/sql/sampling.sql diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile index 241c02587b..b70bdfaf2d 100644 --- a/contrib/pg_stat_statements/Makefile +++ b/contrib/pg_stat_statements/Makefile @@ -20,7 +20,7 @@ LDFLAGS_SL += $(filter -lm, $(LIBS)) REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/pg_stat_statements/pg_stat_statements.conf REGRESS = select dml cursors utility level_tracking planning \ user_activity wal entry_timestamp privileges extended \ - parallel cleanup oldextversions + parallel sampling cleanup oldextversions # Disabled because these tests require "shared_preload_libraries=pg_stat_statements", # which typical installcheck users do not have (e.g. buildfarm clients). NO_INSTALLCHECK = 1 diff --git a/contrib/pg_stat_statements/expected/sampling.out b/contrib/pg_stat_statements/expected/sampling.out new file mode 100644 index 00..6e73a0a318 --- /dev/null +++ b/contrib/pg_stat_statements/expected/sampling.out @@ -0,0 +1,99 @@ +-- +-- sample statements +-- +SET pg_stat_statements.
Re: Sample rate added to pg_stat_statements
On 28.01.2025 20:21, Sami Imseih wrote: All the changes mentioned above are included in the v13 patch. Since the patch status is 'Ready for Committer,' I believe it is now better for upstream inclusion, with improved details in tests and documentation. Do you have any further suggestions? I am not quite clear on the sample_1.out. I do like the idea of separating the sample tests, but I was thinking of something a bit more simple. What do you think of my attached, sampling.sql, test? It tests sample rate in both simple and extended query protocols and for both top level and nested levels? That sounds great! I've added your sample.sql file to my v14 patch. However, I was focused on testing sample_rate values between 0 and 1. The approach I came up with was using the sample{_1}.out files. I’ve removed the test involving those files for now, but if the committer prefers to keep it, I can reintroduce them. If anyone has the capability to run this benchmark on machines with more CPUs or with different queries, it would be nice. I’d appreciate any suggestions or feedback. I wanted to share some additional benchmarks I ran as well on a r8g.48xlarge ( 192 vCPUs, 1,536 GiB of memory) configured with 16GB of shared_buffers. I also attached the benchmark.sh script used to generate the output. The benchmark is running the select-only pgbench workload, so we have a single heavily contentious entry, which is the worst case. The test shows that the spinlock (SpinDelay waits) becomes an issue at high connection counts and will become worse on larger machines. A sample_rate going from 1 to .75 shows a 60% improvement; but this is on a single contentious entry. Most workloads will likely not see this type of improvement. I also could not really observe this type of difference on smaller machines ( i.e. 32 vCPUs), as expected. ## init pgbench -i -s500 ### 192 connections pgbench -c192 -j20 -S -Mprepared -T120 --progress 10 sample_rate = 1 tps = 484338.769799 (without initial connection time) waits - 11107 SpinDelay 9568 CPU 929 ClientRead 13 DataFileRead 3 BufferMapping sample_rate = .75 tps = 909547.562124 (without initial connection time) waits - 12079 CPU 4781 SpinDelay 2100 ClientRead sample_rate = .5 tps = 1028594.555273 (without initial connection time) waits - 13253 CPU 3378 ClientRead 174 SpinDelay sample_rate = .25 tps = 1019507.126313 (without initial connection time) waits - 13397 CPU 3423 ClientRead sample_rate = 0 tps = 1015425.288538 (without initial connection time) waits - 13106 CPU 3502 ClientRead ### 32 connections pgbench -c32 -j20 -S -Mprepared -T120 --progress 10 sample_rate = 1 tps = 620667.049565 (without initial connection time) waits - 1782 CPU 560 ClientRead sample_rate = .75 tps = 620663.131347 (without initial connection time) waits - 1736 CPU 554 ClientRead sample_rate = .5 tps = 624094.688239 (without initial connection time) waits - 1741 CPU 648 ClientRead sample_rate = .25 tps = 628638.538204 (without initial connection time) waits - 1702 CPU 576 ClientRead sample_rate = 0 tps = 630483.464912 (without initial connection time) waits - 1638 CPU 574 ClientRead Regards, Sami Thank you so much for benchmarking this on a pretty large machine with a large number of CPUs. The results look fantastic, and I truly appreciate your effort. BWT, I realized that the 'sampling' test needs to be added not only to the Makefile but also to meson.build. I've included that in the v14 patch. -- Best regards, Ilia Evdokimov, Tantor Labs LLC. 
From 7c9d45325e29a65259740d5255d39f9f57ee6fba Mon Sep 17 00:00:00 2001 From: Ilia Evdokimov Date: Tue, 28 Jan 2025 23:04:34 +0300 Subject: [PATCH] Allow setting sample rate for pg_stat_statements New configuration parameter pg_stat_statements.sample_rate makes it possible to track just a fraction of the queries meeting the configured threshold, to reduce the amount of tracking. --- contrib/pg_stat_statements/Makefile | 2 +- .../pg_stat_statements/expected/sampling.out | 174 ++ contrib/pg_stat_statements/meson.build| 1 + .../pg_stat_statements/pg_stat_statements.c | 59 +- contrib/pg_stat_statements/sql/sampling.sql | 50 + doc/src/sgml/pgstatstatements.sgml| 19 ++ 6 files changed, 297 insertions(+), 8 deletions(-) create mode 100644 contrib/pg_stat_statements/expected/sampling.out create mode 100644 contrib/pg_stat_statements/sql/sampling.sql diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile index 241c02587b..b70bdfaf2d 100644 --- a/contrib/pg_stat_statements/Makefile +++ b/contrib/pg_stat_statements/Makefile @@ -20,7 +20,7 @@ LDFLAGS_SL += $(filter -lm, $(LIBS)) REGRESS_OPTS = --temp-config $(top_srcd
Re: Sample rate added to pg_stat_statements
On 29.01.2025 21:52, Ilia Evdokimov wrote: ... I also attached the benchmark.sh script used to generate the output. In my opinion, if we can't observe a spinlock bottleneck on 32 CPUs, we should determine the CPU count at which it appears. This will help us understand the scale of the problem. Does this make sense, or are there really no real workloads where the same query runs on more than 32 CPUs, and we've been trying to solve a non-existent problem?

I ran the same benchmark on 48 CPUs with -c 48 -j 20 for objectivity.

### 48 connections
pgbench -c48 -j20 -S -Mprepared -T120 --progress 10

sample_rate = 1
tps = 643251.640175 (without initial connection time)
waits - 932 ClientRead 911 CPU 44 SpinDelay

sample_rate = .75
tps = 653946.777122 (without initial connection time)
waits - 939 CPU 875 ClientRead 3 SpinDelay

sample_rate = .5
tps = 651654.348463 (without initial connection time)
waits - 932 ClientRead 841 CPU

sample_rate = .25
tps = 652668.807245 (without initial connection time)
waits - 910 ClientRead 860 CPU

sample_rate = 0
tps = 659111.347019 (without initial connection time)
waits - 882 ClientRead 849 CPU

There is a small amount of SpinDelay, as mentioned earlier. However, we can identify the threshold where the problem appears. To summarize the results of all benchmarks, I compiled them into a table:

CPUs | sample_rate | tps     | CPU waits | ClientRead waits | SpinDelay waits
192  | 1.0         |  484338 |  9568     |  929             | 11107
192  | 0.75        |  909547 | 12079     | 2100             |  4781
192  | 0.5         | 1028594 | 13253     | 3378             |   174
192  | 0.25        | 1019507 | 13397     | 3423             |     -
192  | 0.0         | 1015425 | 13106     | 3502             |     -
48   | 1.0         |  643251 |   911     |  932             |    44
48   | 0.75        |  653946 |   939     |  875             |     3
48   | 0.5         |  651654 |   841     |  932             |     -
48   | 0.25        |  652668 |   860     |  910             |     -
48   | 0.0         |  659111 |   849     |  882             |     -
32   | 1.0         |  620667 |  1782     |  560             |     -
32   | 0.75        |  620663 |  1736     |  554             |     -
32   | 0.5         |  624094 |  1741     |  648             |     -
32   | 0.25        |  628638 |  1702     |  576             |     -
32   | 0.0         |  630483 |  1638     |  574             |     -

-- Best regards, Ilia Evdokimov, Tantor Labs LLC.
Re: Sample rate added to pg_stat_statements
On 28.01.2025 23:50, Ilia Evdokimov wrote: If anyone has the capability to run this benchmark on machines with more CPUs or with different queries, it would be nice. I’d appreciate any suggestions or feedback. I wanted to share some additional benchmarks I ran as well on a r8g.48xlarge ( 192 vCPUs, 1,536 GiB of memory) configured with 16GB of shared_buffers. I also attached the benchmark.sh script used to generate the output. The benchmark is running the select-only pgbench workload, so we have a single heavily contentious entry, which is the worst case. The test shows that the spinlock (SpinDelay waits) becomes an issue at high connection counts and will become worse on larger machines. A sample_rate going from 1 to .75 shows a 60% improvement; but this is on a single contentious entry. Most workloads will likely not see this type of improvement. I also could not really observe this type of difference on smaller machines ( i.e. 32 vCPUs), as expected. ## init pgbench -i -s500 ### 192 connections pgbench -c192 -j20 -S -Mprepared -T120 --progress 10 sample_rate = 1 tps = 484338.769799 (without initial connection time) waits - 11107 SpinDelay 9568 CPU 929 ClientRead 13 DataFileRead 3 BufferMapping sample_rate = .75 tps = 909547.562124 (without initial connection time) waits - 12079 CPU 4781 SpinDelay 2100 ClientRead sample_rate = .5 tps = 1028594.555273 (without initial connection time) waits - 13253 CPU 3378 ClientRead 174 SpinDelay sample_rate = .25 tps = 1019507.126313 (without initial connection time) waits - 13397 CPU 3423 ClientRead sample_rate = 0 tps = 1015425.288538 (without initial connection time) waits - 13106 CPU 3502 ClientRead ### 32 connections pgbench -c32 -j20 -S -Mprepared -T120 --progress 10 sample_rate = 1 tps = 620667.049565 (without initial connection time) waits - 1782 CPU 560 ClientRead sample_rate = .75 tps = 620663.131347 (without initial connection time) waits - 1736 CPU 554 ClientRead sample_rate = .5 tps = 624094.688239 (without initial connection time) waits - 1741 CPU 648 ClientRead sample_rate = .25 tps = 628638.538204 (without initial connection time) waits - 1702 CPU 576 ClientRead sample_rate = 0 tps = 630483.464912 (without initial connection time) waits - 1638 CPU 574 ClientRead Regards, Sami Thank you so much for benchmarking this on a pretty large machine with a large number of CPUs. The results look fantastic, and I truly appreciate your effort. BWT, I realized that the 'sampling' test needs to be added not only to the Makefile but also to meson.build. I've included that in the v14 patch. -- Best regards, Ilia Evdokimov, Tantor Labs LLC. In my opinion, if we can't observe bottleneck of spinlock on 32 CPUs, we should determine the CPU count at which it becomes. This will help us understand the scale of the problem. Does this make sense, or are there really no real workloads where the same query runs on more than 32 CPUs, and we've been trying to solve a non-existent problem? -- Best regards, Ilia Evdokimov, Tantor Labs LLC.
Re: explain analyze rows=%.0f
On 11.01.2025 14:10, Ilia Evdokimov wrote: On 11.01.2025 12:15, Guillaume Lelarge wrote: Thanks for your patch, this looks like a very interesting feature that I'd like to see in a future release. It did a quick run: patch OK, make OK, make install OK, but make check fails quite a lot on partition_prune.sql. I guess it would need some work on partition_prune.sql tests and perhaps also on the docs. Thanks again. -- Guillaume. Yes, certainly. I have fixed partition_prune.sql. In the documentation example for EXPLAIN ANALYZE where loops is greater than one, I updated how 'rows' and 'loops' values are displayed so they appear as decimal fractions with two digits after the decimal point. I attached fixed patch. Any suggestions? -- Best regards, Ilia Evdokimov, Tantor Labs LLC. I guess, it's not ideal to modify the existing example in the documentation of the v5 patch because readers wouldn't immediately understand why decimal fractions appear there. Instead, I'll add a brief note in the documentation clarifying how rows and loops are displayed when the average row count is below one. The changes the of documentation are attached v6 patch. If you have any other suggestions or different opinions, I'd be happy to discuss them. -- Best regards, Ilia Evdokimov, Tantor Labs LLC. From 08f92c7e11829045014598e1dcc042f3e5a1e1a3 Mon Sep 17 00:00:00 2001 From: Ilia Evdokimov Date: Mon, 13 Jan 2025 23:01:44 +0300 Subject: [PATCH] Clarify display of rows and loops as decimal fractions When the average number of rows is small compared to the number of loops, both rows and loops are displayed as decimal fractions with two digits after the decimal point in EXPLAIN ANALYZE. --- doc/src/sgml/perform.sgml | 6 ++- src/backend/commands/explain.c| 49 +-- src/test/regress/expected/partition_prune.out | 10 ++-- src/test/regress/sql/partition_prune.sql | 2 +- 4 files changed, 44 insertions(+), 23 deletions(-) diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml index a502a2aaba..3f13d17fe9 100644 --- a/doc/src/sgml/perform.sgml +++ b/doc/src/sgml/perform.sgml @@ -757,7 +757,11 @@ WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2; comparable with the way that the cost estimates are shown. Multiply by the loops value to get the total time actually spent in the node. In the above example, we spent a total of 0.030 milliseconds -executing the index scans on tenk2. +executing the index scans on tenk2. If a subplan node +is executed multiple times and the average number of rows is less than one, +the rows and loops values are shown as a decimal fraction +(with two digits after the decimal point) to indicate that some rows +were actually processed rather than simply rounding down to zero. 
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index c24e66f82e..200294b756 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -1981,14 +1981,14 @@ ExplainNode(PlanState *planstate, List *ancestors, if (es->format == EXPLAIN_FORMAT_TEXT) { + appendStringInfo(es->str, " (actual"); if (es->timing) -appendStringInfo(es->str, - " (actual time=%.3f..%.3f rows=%.0f loops=%.0f)", - startup_ms, total_ms, rows, nloops); +appendStringInfo(es->str, " time=%.3f..%.3f", startup_ms, total_ms); + + if (nloops > 1 && planstate->instrument->ntuples < nloops) +appendStringInfo(es->str," rows=%.2f loops=%.2f)", rows, nloops); else -appendStringInfo(es->str, - " (actual rows=%.0f loops=%.0f)", - rows, nloops); +appendStringInfo(es->str," rows=%.0f loops=%.0f)", rows, nloops); } else { @@ -1999,8 +1999,16 @@ ExplainNode(PlanState *planstate, List *ancestors, ExplainPropertyFloat("Actual Total Time", "ms", total_ms, 3, es); } - ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es); - ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es); + if (nloops > 1 && planstate->instrument->ntuples < nloops) + { +ExplainPropertyFloat("Actual Rows", NULL, rows, 2, es); +ExplainPropertyFloat("Actual Loops", NULL, nloops, 2, es); + } + else + { +ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es); +ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es); + } } } else if (es->analyze) @@ -2052,14 +2060,14 @@ ExplainNode(PlanState *planstate, List *ancestors, if (es->format == EXPLAIN_FORMAT_TEXT) { ExplainIndentText(es); +appendStringInfo(es->str, "actual"); if (es->timing) - appendStringInfo(es->str, -
Re: Sample rate added to pg_stat_statements
On 09.01.2025 22:13, Alena Rybakina wrote: Hi! Thank you for the work with this subject. I looked at your patch and noticed that this part of the code is repeated several times: if (nesting_level == 0) { if (!IsParallelWorker()) current_query_sampled = pg_prng_double(&pg_global_prng_state) < pgss_sample_rate; else current_query_sampled = false; } I think you should put this in a function like update_current_query_sampled. I've attached a diff file with the changes. There was redundant checking of nesting_level, so I removed one of them. The fixed version is included in patch v10. Alena, Sami – I apologize for not including you in the previous email. If you're interested in this approach, I'm open to any suggestions. [0]: https://www.postgresql.org/message-id/1b13d748-5e98-479c-9222-3253a734a038%40tantorlabs.com -- Best regards, Ilia Evdokimov, Tantor Labs LLC. From e8d9911bd7b1384112773977c1a30a476fc77471 Mon Sep 17 00:00:00 2001 From: Ilia Evdokimov Date: Tue, 14 Jan 2025 12:18:52 +0300 Subject: [PATCH v1] Allow setting sample ratio for pg_stat_statements New configuration parameter pg_stat_statements.sample_ratio makes it possible to track just a fraction of the queries meeting the configured threshold, to reduce the amount of tracking. --- .../pg_stat_statements/expected/select.out| 76 +++ .../pg_stat_statements/pg_stat_statements.c | 46 ++- contrib/pg_stat_statements/sql/select.sql | 21 + doc/src/sgml/pgstatstatements.sgml| 18 + 4 files changed, 158 insertions(+), 3 deletions(-) diff --git a/contrib/pg_stat_statements/expected/select.out b/contrib/pg_stat_statements/expected/select.out index 37a30af034..558d93fb46 100644 --- a/contrib/pg_stat_statements/expected/select.out +++ b/contrib/pg_stat_statements/expected/select.out @@ -153,6 +153,82 @@ SELECT pg_stat_statements_reset() IS NOT NULL AS t; t (1 row) +-- +-- sample statements +-- +SET pg_stat_statements.sample_rate = 0.0; +SELECT 1 AS "int"; + int +- + 1 +(1 row) + +SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C"; + query| calls ++--- + SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1 +(1 row) + +SET pg_stat_statements.sample_rate = 1.0; +SELECT 1 AS "int"; + int +- + 1 +(1 row) + +SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C"; + query | calls ++--- + SELECT $1 AS "int" | 1 + SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1 + SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C" | 0 +(3 rows) + +SELECT pg_stat_statements_reset() IS NOT NULL AS t; + t +--- + t +(1 row) + +set pg_stat_statements.sample_rate = 0.0; +select pg_stat_statements_reset() IS NOT NULL AS t; + t +--- + t +(1 row) + +SELECT 1 \parse stmt +\bind_named stmt \g + ?column? +-- +1 +(1 row) + +SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C"; + query | calls +---+--- +(0 rows) + +set pg_stat_statements.sample_rate = 1.0; +\bind_named stmt \g + ?column? 
+-- +1 +(1 row) + +SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C"; + query | calls ++--- + SELECT $1 | 1 + SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C" | 0 +(2 rows) + +select pg_stat_statements_reset() IS NOT NULL AS t; + t +--- + t +(1 row) + -- -- queries with locking clauses -- diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c index bebf8134eb..d79402bac4 100644 --- a/contrib/pg_stat_statements/pg_stat_statements.c +++ b/contrib/pg_stat_statements/pg_stat_statements.c @@ -50,6 +50,7 @@ #include "access/parallel.h" #include "catalog/pg_authid.h" #include "common/int.h" +#include "common/pg_prng.h" #include "executor/instrument.h" #include "funcapi.h" #include "jit/jit.h" @@ -294,6 +295,10 @@ static bool pgss_track_utility = true; /* whether to track utility commands */ static bool pgss_track_planning = false; /* whether to track planning * duration */ static bool pgss_save = true; /* whether to save stats across shutdown */ +static double pgss_sample_rate = 1; + +/* Is the c
Re: Define STATS_MIN_ROWS for minimum rows of stats in ANALYZE
On 10.12.2024 16:32, Ilia Evdokimov wrote: On 09.12.2024 16:10, Ilia Evdokimov wrote: Hi hackers, The repeated use of the number 300 in the ANALYZE-related code creates redundancy and relies on scattered, sometimes unclear, comments to explain its purpose. This can make the code harder to understand, especially for new contributors who might not immediately understand its significance. To address this, I propose introducing a macro STATS_MIN_ROWS to represent this value and consolidating its explanation in a single place, making the code more consistent and readable. -- Best regards, Ilia Evdokimov, Tantor Labs LLC. Hi everyone, Currently, the value 300 is used as the basis for determining the number of rows sampled during ANALYZE, both for single-column and extended statistics. While this value has a well-established rationale for single-column statistics, its suitability for extended statistics remains uncertain, as no specific research has confirmed that this is an optimal choice for them. To better reflect this distinction, I propose introducing two macros: STATS_MIN_ROWS for single-column statistics and EXT_STATS_MIN_ROWS for extended statistics. This change separates the concerns of single-column and extended statistics sampling, making the code more explicit and easier to adapt if future research suggests a different approach for extended statistics. The values remain the same for now, but the introduction of distinct macros improves clarity and prepares the codebase for potential refinements. Does this seem like a reasonable approach to handling these differences? -- Best regards, Ilia Evdokimov, Tantor Labs LLC. Hi everyone, In my opinion, it is more appropriate to define||EXT_STATS_MIN_ROWS as STATS_MIN_ROWS. I also reverted some of the code comments and rewrote others. I attached patch. Any thoughts? -- Best regards, Ilia Evdokimov, Tantor Labs LLC. From d8e38a78857006189a149acc5b36df3d77fe1e40 Mon Sep 17 00:00:00 2001 From: Ilia Evdokimov Date: Fri, 3 Jan 2025 16:32:43 +0300 Subject: [PATCH] Define macros for minimum rows of stats in ANALYZE This introduces two macros, STATS_MIN_ROWS and EXT_STATS_MIN_ROWS, to represent the default minimum number of rows sampled in ANALYZE. STATS_MIN_ROWS is used for single-column statistics, while EXT_STATS_MIN_ROWS is intended for extended statistics. Both macros replace the hardcoded value of 300, improving clarity. --- src/backend/commands/analyze.c| 26 +++ src/backend/statistics/extended_stats.c | 2 +- src/backend/tsearch/ts_typanalyze.c | 5 ++-- src/backend/utils/adt/rangetypes_typanalyze.c | 5 ++-- .../statistics/extended_stats_internal.h | 11 src/include/statistics/statistics.h | 23 6 files changed, 45 insertions(+), 27 deletions(-) diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c index 2a7769b1fd..4c02e4f18a 100644 --- a/src/backend/commands/analyze.c +++ b/src/backend/commands/analyze.c @@ -1897,40 +1897,22 @@ std_typanalyze(VacAttrStats *stats) { /* Seems to be a scalar datatype */ stats->compute_stats = compute_scalar_stats; - /* - * The following choice of minrows is based on the paper - * "Random sampling for histogram construction: how much is enough?" - * by Surajit Chaudhuri, Rajeev Motwani and Vivek Narasayya, in - * Proceedings of ACM SIGMOD International Conference on Management - * of Data, 1998, Pages 436-447. 
Their Corollary 1 to Theorem 5 - * says that for table size n, histogram size k, maximum relative - * error in bin size f, and error probability gamma, the minimum - * random sample size is - * r = 4 * k * ln(2*n/gamma) / f^2 - * Taking f = 0.5, gamma = 0.01, n = 10^6 rows, we obtain - * r = 305.82 * k - * Note that because of the log function, the dependence on n is - * quite weak; even at n = 10^12, a 300*k sample gives <= 0.66 - * bin size error with probability 0.99. So there's no real need to - * scale for n, which is a good thing because we don't necessarily - * know it at this point. - * - */ - stats->minrows = 300 * stats->attstattarget; + /* see comment about the choice of minrows in statistics/statistics.h */ + stats->minrows = (STATS_MIN_ROWS * stats->attstattarget); } else if (OidIsValid(eqopr)) { /* We can still recognize distinct values */ stats->compute_stats = compute_distinct_stats; /* Might as well use the same minrows as above */ - stats->minrows = 300 * stats->attstattarget; + stats->minrows = (STATS_MIN_ROWS * stats->attstattarget); } else { /* Can't do much but the trivial stuff */ stats->compute_stats = compute_trivial_stats; /* Might as well use the same minrows as above */ - stats->minrows = 300 * stats->attstattarget; + stats->minrows = (STA
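To make the proposal concrete, here is a sketch of one possible shape for the relocated definitions; the actual statistics.h hunk is not reproduced in the quoted diff, so the macro layout and comment wording below are assumptions based on the analyze.c comment being removed above.

```
/*
 * Sketch (assumed layout) of the definitions the patch adds to
 * src/include/statistics/statistics.h.
 *
 * The 300 multiplier comes from "Random sampling for histogram construction:
 * how much is enough?" (Chaudhuri, Motwani, Narasayya, SIGMOD 1998): for
 * histogram size k, maximum relative bin-size error f = 0.5, error
 * probability gamma = 0.01, and table size n = 10^6, the minimum sample size
 * is r = 4 * k * ln(2n/gamma) / f^2 ~= 305.82 * k.  The dependence on n is
 * weak, so 300 * k is used without scaling for n.
 */
#define STATS_MIN_ROWS	300

/*
 * No dedicated research exists for extended statistics yet, so reuse the
 * single-column value; a separate macro makes it easy to change later.
 */
#define EXT_STATS_MIN_ROWS	STATS_MIN_ROWS
```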
Remove unused rel parameter in lookup_var_attr_stats
Hi hackers, I've attached a small patch that remove unused parameter 'rel' from the static function lookup_var_attr_stats() in extended_stats.c While exploring src/backend/statistics, I noticed that there are several other parameters which appear to be unused. However, since the discussion around import/export statistics [1] attribute_stats.c is still ongoing and many functions from other files are non-static, those parameters may yet prove useful. Therefore, I'm only removing the parameter from this static function. Any thoughts? [1]: https://www.postgresql.org/message-id/flat/CADkLM%3DcB0rF3p_FuWRTMSV0983ihTRpsH%2BOCpNyiqE7Wk0vUWA%40mail.gmail.com -- Best regards, Ilia Evdokimov, Tantor Labs LLC. From a59af002463c962f153c79202b5f091a24fa6199 Mon Sep 17 00:00:00 2001 From: Ilia Evdokimov Date: Fri, 3 Jan 2025 16:46:51 +0300 Subject: [PATCH] Remove unused rel parameter in lookup_var_attr_stats The rel parameter in lookup_var_attr_stats is unused, so remove it. This is a static function, so the parameter can easily be added again if it's ever needed. --- src/backend/statistics/extended_stats.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c index aeec3ece41..34dcb535e1 100644 --- a/src/backend/statistics/extended_stats.c +++ b/src/backend/statistics/extended_stats.c @@ -74,7 +74,7 @@ typedef struct StatExtEntry static List *fetch_statentries_for_relation(Relation pg_statext, Oid relid); -static VacAttrStats **lookup_var_attr_stats(Relation rel, Bitmapset *attrs, List *exprs, +static VacAttrStats **lookup_var_attr_stats(Bitmapset *attrs, List *exprs, int nvacatts, VacAttrStats **vacatts); static void statext_store(Oid statOid, bool inh, MVNDistinct *ndistinct, MVDependencies *dependencies, @@ -165,7 +165,7 @@ BuildRelationExtStatistics(Relation onerel, bool inh, double totalrows, * Check if we can build these stats based on the column analyzed. If * not, report this fact (except in autovacuum) and move on. */ - stats = lookup_var_attr_stats(onerel, stat->columns, stat->exprs, + stats = lookup_var_attr_stats(stat->columns, stat->exprs, natts, vacattrstats); if (!stats) { @@ -293,7 +293,7 @@ ComputeExtStatisticsRows(Relation onerel, * analyzed. If not, ignore it (don't report anything, we'll do that * during the actual build BuildRelationExtStatistics). */ - stats = lookup_var_attr_stats(onerel, stat->columns, stat->exprs, + stats = lookup_var_attr_stats(stat->columns, stat->exprs, natts, vacattrstats); if (!stats) @@ -687,7 +687,7 @@ examine_expression(Node *expr, int stattarget) * indicate to the caller that the stats should not be built. */ static VacAttrStats ** -lookup_var_attr_stats(Relation rel, Bitmapset *attrs, List *exprs, +lookup_var_attr_stats(Bitmapset *attrs, List *exprs, int nvacatts, VacAttrStats **vacatts) { int i = 0; -- 2.34.1
Re: Sample rate added to pg_stat_statements
On 15.01.2025 01:07, Sami Imseih wrote: Alena, Sami – I apologize for not including you in the previous email. If you're interested in this approach, I'm open to any suggestions. [0]: https://www.postgresql.org/message-id/1b13d748-5e98-479c-9222-3253a734a038%40tantorlabs.com Here are my thoughts on this: There is good reason to apply sample rate selectively, but I am not sure if execution time is the way to go. I would rather apply a sample rate on the most frequently called queries, since I can gather enough samples to draw conclusions about performance of the query. I just don't know if that can be done in a sensible way, because while we can check the number of calls in the entry, we will need to do that with a shared lock and spin lock, which will defeat the purpose of this patch. Agree. Indeed, we should reduce the load on the spin locks, but we can’t check the most popular called queries without inspecting the hash table and locking spin locks. This also got me thinking if maybe we should allow the user to disable sample rate for utility statements as those are less frequent in most workloads and a user may want to capture all such statements, i.e. DROP, CREATE, etc. Regards, Sami Probably, but first I suggest benchmarking with sampling applied to all queries. If the results are good, we can later filter certain queries based on different characteristics. On 06.01.2025 18:57, Andrey M. Borodin wrote: 1. This code seems a little different from your patch. It is trying to avoid engaging PRNG. I'm not sure it's a good idea, but still. Also, it uses "<=", not "<". xact_is_sampled = log_xact_sample_rate != 0 && (log_xact_sample_rate == 1 || pg_prng_double(&pg_global_prng_state) <= log_xact_sample_rate); Sorry for the delayed reply. Andrey was right about this suggestion—first, it makes the code more readable for others, and second, it avoids engaging the PRNG on edge values of 0.0 and 1.0. I’ve attached patch v11 with these changes. -- Best regards, Ilia Evdokimov, Tantor Labs LLC. From 838c7e591a5cf157e88bc82cd8adf549b60ce212 Mon Sep 17 00:00:00 2001 From: Ilia Evdokimov Date: Wed, 15 Jan 2025 12:37:36 +0300 Subject: [PATCH v1] Allow setting sample ratio for pg_stat_statements New configuration parameter pg_stat_statements.sample_ratio makes it possible to track just a fraction of the queries meeting the configured threshold, to reduce the amount of tracking. 
--- .../pg_stat_statements/expected/select.out| 76 +++ .../pg_stat_statements/pg_stat_statements.c | 48 +++- contrib/pg_stat_statements/sql/select.sql | 21 + doc/src/sgml/pgstatstatements.sgml| 18 + 4 files changed, 160 insertions(+), 3 deletions(-) diff --git a/contrib/pg_stat_statements/expected/select.out b/contrib/pg_stat_statements/expected/select.out index 37a30af034..558d93fb46 100644 --- a/contrib/pg_stat_statements/expected/select.out +++ b/contrib/pg_stat_statements/expected/select.out @@ -153,6 +153,82 @@ SELECT pg_stat_statements_reset() IS NOT NULL AS t; t (1 row) +-- +-- sample statements +-- +SET pg_stat_statements.sample_rate = 0.0; +SELECT 1 AS "int"; + int +- + 1 +(1 row) + +SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C"; + query| calls ++--- + SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1 +(1 row) + +SET pg_stat_statements.sample_rate = 1.0; +SELECT 1 AS "int"; + int +- + 1 +(1 row) + +SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C"; + query | calls ++--- + SELECT $1 AS "int" | 1 + SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1 + SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C" | 0 +(3 rows) + +SELECT pg_stat_statements_reset() IS NOT NULL AS t; + t +--- + t +(1 row) + +set pg_stat_statements.sample_rate = 0.0; +select pg_stat_statements_reset() IS NOT NULL AS t; + t +--- + t +(1 row) + +SELECT 1 \parse stmt +\bind_named stmt \g + ?column? +-- +1 +(1 row) + +SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C"; + query | calls +---+--- +(0 rows) + +set pg_stat_statements.sample_rate = 1.0; +\bind_named stmt \g + ?column? +-- +1 +(1 row) + +SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C"; + query | calls ++--- + SELECT $1
Re: Sample rate added to pg_stat_statements
On 08.01.2025 22:39, Ilia Evdokimov wrote: After the changes in v9-patch, won’t all the necessary queries now be normalized? Since there are no longer any restrictions in the parser hook, queries will be normalized as usual, and pgss_planner, pgss_ExecutorStart, and ExecutorEnd will simply fetch them from 'pgss_query_texts.stat' file. For now, this seems like the simplest approach instead of making JumbleState available to other hooks. Moreover, if we do proceed with that, we might be able to remove the 'pgss_query_texts.stat' file altogether and more other improvements. In my view, if no other options arise, we should address making JumbleState available to other hooks in a separate thread. If you have any suggestions, I'm always open to feedback. I attached v9 patch with changes and test above. Unfortunately, these changes do not achieve the intended sampling goal. I looked into this more deeply: while the sampled-out queries do not appear in pg_stat_statements, an entry is still allocated in the hash table after normalization, which, in my view, should not happen when sampling is in effect. Therefore, patch v9 is unlikely to meet our needs. -- Best regards, Ilia Evdokimov, Tantor Labs LLC.
Re: Sample rate added to pg_stat_statements
ng JumbleState available to other hooks in a separate thread. If you have any suggestions, I'm always open to feedback. I attached v9 patch with changes and test above. -- Best regards, Ilia Evdokimov, Tantor Labs LLC. From b40ab06155cac38e57f0551a3858d0574d96865f Mon Sep 17 00:00:00 2001 From: Ilia Evdokimov Date: Wed, 8 Jan 2025 22:15:02 +0300 Subject: [PATCH] Allow setting sample ratio for pg_stat_statements New configuration parameter pg_stat_statements.sample_ratio makes it possible to track just a fraction of the queries meeting the configured threshold, to reduce the amount of tracking. --- .../pg_stat_statements/expected/select.out| 76 +++ .../pg_stat_statements/pg_stat_statements.c | 54 - contrib/pg_stat_statements/sql/select.sql | 21 + doc/src/sgml/pgstatstatements.sgml| 18 + 4 files changed, 166 insertions(+), 3 deletions(-) diff --git a/contrib/pg_stat_statements/expected/select.out b/contrib/pg_stat_statements/expected/select.out index 37a30af034..558d93fb46 100644 --- a/contrib/pg_stat_statements/expected/select.out +++ b/contrib/pg_stat_statements/expected/select.out @@ -153,6 +153,82 @@ SELECT pg_stat_statements_reset() IS NOT NULL AS t; t (1 row) +-- +-- sample statements +-- +SET pg_stat_statements.sample_rate = 0.0; +SELECT 1 AS "int"; + int +- + 1 +(1 row) + +SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C"; + query| calls ++--- + SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1 +(1 row) + +SET pg_stat_statements.sample_rate = 1.0; +SELECT 1 AS "int"; + int +- + 1 +(1 row) + +SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C"; + query | calls ++--- + SELECT $1 AS "int" | 1 + SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1 + SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C" | 0 +(3 rows) + +SELECT pg_stat_statements_reset() IS NOT NULL AS t; + t +--- + t +(1 row) + +set pg_stat_statements.sample_rate = 0.0; +select pg_stat_statements_reset() IS NOT NULL AS t; + t +--- + t +(1 row) + +SELECT 1 \parse stmt +\bind_named stmt \g + ?column? +-- +1 +(1 row) + +SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C"; + query | calls +---+--- +(0 rows) + +set pg_stat_statements.sample_rate = 1.0; +\bind_named stmt \g + ?column? +-- +1 +(1 row) + +SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C"; + query | calls ++--- + SELECT $1 | 1 + SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C" | 0 +(2 rows) + +select pg_stat_statements_reset() IS NOT NULL AS t; + t +--- + t +(1 row) + -- -- queries with locking clauses -- diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c index bebf8134eb..52957c4628 100644 --- a/contrib/pg_stat_statements/pg_stat_statements.c +++ b/contrib/pg_stat_statements/pg_stat_statements.c @@ -50,6 +50,7 @@ #include "access/parallel.h" #include "catalog/pg_authid.h" #include "common/int.h" +#include "common/pg_prng.h" #include "executor/instrument.h" #include "funcapi.h" #include "jit/jit.h" @@ -294,6 +295,10 @@ static bool pgss_track_utility = true; /* whether to track utility commands */ static bool pgss_track_planning = false; /* whether to track planning * duration */ static bool pgss_save = true; /* whether to save stats across shutdown */ +static double pgss_sample_rate = 1; + +/* Is the current top-level query to be sampled? 
*/ +static bool current_query_sampled = false; #define pgss_enabled(level) \ @@ -414,6 +419,19 @@ _PG_init(void) NULL, NULL); + DefineCustomRealVariable("pg_stat_statements.sample_rate", + "Fraction of queries to track.", + NULL, + &pgss_sample_rate, + 1.0, + 0.0, + 1.0, + PGC_SUSET, + 0, + NULL, + NULL, + NULL); + DefineCustomEnumVariable("pg_stat_statements.track", "Selects which statements are tracked by pg_stat_statements.", NULL, @@ -889,12 +907,21 @@ pgss_planner(Query *parse, { PlannedStmt *result; + if (nesting_level == 0) + { + if (!IsParalle
Re: Sample rate added to pg_stat_statements
On 09.01.2025 05:29, Sami Imseih wrote: Unfortunately, these changes do not achieve the intended sampling goal. I looked into this more deeply: while the sampled-out queries do not appear in pg_stat_statements, an entry is still allocated in the hash table after normalization, which, in my view, should not happen when sampling is in effect. Therefore, patch v9 is unlikely to meet our needs. pg_stat_statements creates entries as "sticky" initially to give them more time to stay in the hash before the first execution completes. It's not perfect, but it works for the majority of cases. So, what you are observing is how pg_stat_statements currently works. If an entry is popular enough, we will need it anyways ( even with the proposed sampling ). An entry that's not popular will eventually be aged out. From my understanding, what the proposed sampling will do is to reduce the overhead of incrementing counters of popular entries, because of the spinlock to update the counters. This is particularly the case with high concurrency on large machines ( high cpu count ), and especially when there is a small set of popular entries. IMO, This patch should also have a benchmark that proves that a user can benefit with sampling in those types of workloads. Ah, so patch version 9 might be the best fit to achieve this. I’ll need to benchmark it on a large, high-concurrency machine then. -- Best regards, Ilia Evdokimov, Tantor Labs LLC.
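For readers less familiar with pg_stat_statements internals, the "sticky" mechanism Sami describes looks roughly like the fragment below, paraphrased from entry_dealloc() in pg_stat_statements.c (macro names and factors are quoted from memory and may differ slightly): entries created at parse time that never complete an execution have calls == 0, and their usage decays faster, so unpopular sampled-out entries are the first eviction candidates.

```
/*
 * Paraphrased sketch of the aging step in entry_dealloc(); not a verbatim
 * excerpt, names and constants are approximate.
 */
if (IS_STICKY(entry->counters))		/* calls == 0: allocated but never completed */
	entry->counters.usage *= STICKY_DECREASE_FACTOR;	/* decays quickly */
else
	entry->counters.usage *= USAGE_DECREASE_FACTOR;		/* decays slowly */
```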
Re: explain analyze rows=%.0f
On 12.02.2025 22:56, Robert Haas wrote: On Wed, Feb 12, 2025 at 2:55 PM Andrei Lepikhov wrote: On 13/2/2025 01:40, Tom Lane wrote: I was idly speculating yesterday about letting the Ryu code print the division result, so that we get a variable number of digits. Realistically, that'd probably result in many cases in more digits than anybody wants, so it's not a serious proposal. I'm cool with the fixed-two-digits approach to start with. Okay, since no one else voted for the meaningful-numbers approach, I would say that fixed size is better than nothing. It may cover some of my practical cases, but unfortunately, not the most problematic ones. I don't love it either, but I do think it is significantly better than nothing. I'm in favor of having some improvement rather than nothing at all—otherwise, we might never reach a consensus. 1. Documentation (v9-0001-Clarify-display-of-rows-as-decimal-fractions-DOC.patch) One thing that bothers me is that the documentation explains how to compute total time, but it does not clarify how to compute total rows. Maybe this was obvious to others before, but now that we are displaying |rows| as a fraction, we should explicitly document how to interpret it alongside total time. I believe it would be helpful to show a non-integer rows value in an example query. However, to achieve this, we need the index scan results to vary across iterations. One way to accomplish this is by using the condition t1.unique2 > t2.unique2. Additionally, I suggest making loops a round number (e.g., 100) for better readability, which can be achieved using t1.thousand < 10. The final query would look like this: EXPLAIN ANALYZE SELECT * FROM tenk1 t1, tenk2 t2 WHERE t1.thousand < 10 AND t1.unique2 > t2.unique2; I believe this is an ideal example for the documentation because it not only demonstrates fractional rows, but also keeps the execution plan nearly unchanged. While the estimated and actual average row counts become slightly rounded, I don't see another way to ensure different results for each index scan. I'm open to any feedback or suggestions for a better example to use in the documentation or additional explaining fractional rows in the text. 2. Code and tests (v9-0002-Clarify-display-of-rows-as-decimal-fractions.patch) I left the code and tests unchanged since we agreed on a fixed format of two decimal places. -- Best regards, Ilia Evdokimov, Tantor Labs LLC. From 789d98383988ef6d3b0bc2c6b9b5c73a83ffd6d4 Mon Sep 17 00:00:00 2001 From: Ilia Evdokimov Date: Thu, 13 Feb 2025 11:19:03 +0300 Subject: [PATCH v9] Clarify display of rows as decimal fractions When loops > 1, the average rows value is now displayed as a decimal fraction with two digits after the decimal point in EXPLAIN ANALYZE. Previously, the average rows value was always rounded to an integer, which could lead to misleading results when estimating total rows by multiplying rows by loop. This change ensures that users get a more accurate representation of the data flow through execution nodes. 
--- doc/src/sgml/perform.sgml | 41 --- 1 file changed, 21 insertions(+), 20 deletions(-) diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml index a502a2aaba..dd61ae4507 100644 --- a/doc/src/sgml/perform.sgml +++ b/doc/src/sgml/perform.sgml @@ -717,26 +717,26 @@ FROM tenk1 t1 WHERE t1.ten = (SELECT (random() * 10)::integer); EXPLAIN ANALYZE SELECT * FROM tenk1 t1, tenk2 t2 -WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2; - - QUERY PLAN &zwsp;-- - Nested Loop (cost=4.65..118.50 rows=10 width=488) (actual time=0.017..0.051 rows=10 loops=1) - Buffers: shared hit=36 read=6 - -> Bitmap Heap Scan on tenk1 t1 (cost=4.36..39.38 rows=10 width=244) (actual time=0.009..0.017 rows=10 loops=1) - Recheck Cond: (unique1 < 10) - Heap Blocks: exact=10 - Buffers: shared hit=3 read=5 written=4 - -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..4.36 rows=10 width=0) (actual time=0.004..0.004 rows=10 loops=1) - Index Cond: (unique1 < 10) +WHERE t1.thousand < 10 AND t1.unique2 > t2.unique2; + + QUERY PLAN +---&zwsp;- + Nested Loop (cost=5.40..11571.44 rows=356667 width=488) (actual time=0.042..117.205 rows=513832 loops=1) + Buffers: shared hit=19377 read=29 + -> Bitmap Heap Scan on tenk1 t1 (cost=5.11..233.60 rows=107 width=244) (actual time=0.021..0.103 rows=100 loops=1) + Recheck Cond: (thousand < 10) + Heap
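Coming back to point 1 above, the arithmetic I think the documentation should spell out looks like this (the numbers here are made up for illustration, not taken from the patched example output):

```
rows=2.50, loops=100   =>   total rows across all iterations ≈ 2.50 × 100 = 250
```

In other words, total rows is derived exactly like total time: multiply the per-loop average by the loops value.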
Re: explain analyze rows=%.0f
On 11.02.2025 20:41, Robert Haas wrote: On Tue, Feb 11, 2025 at 12:14 PM Andrei Lepikhov wrote: I support the idea in general, but I believe it should be expanded to cover all cases of parameterised plan nodes. Each rescan iteration may produce a different number of tuples, and rounding can obscure important data. For example, consider five loops of a scan node: the first loop returns nine tuples, and each of the others returns zero tuples. When we calculate the average, 9 divided by 5 equals 1.8. This results in an explanation that indicates "rows = 1," masking almost 40% of the data. Now, if we apply the same two loops but have a total of 900,000 tuples, then 400,000 masked tuples represent a significant portion of the data. Moreover, switching to a floating-point type for row explanations in each parameterised node would provide a more comprehensive view and add valuable information about the parameterisation of the node, which may not be immediately apparent. I agree strongly with all of this. I believe we should just implement what was agreed here: https://www.postgresql.org/message-id/21013.1243618236%40sss.pgh.pa.us Let's just display 2 fractional digits when nloops>1, else 0, and call it good.

Thank you for your review! With such an example, it's hard to disagree. This would really add valuable information. Taking all opinions into account, I have updated the patch to v8. I have also included a check for the case where there are only zeros after the decimal point; we do not want to clutter the rows value with unnecessary zeros.

-- Best regards, Ilia Evdokimov, Tantor Labs LLC.

From e1a46d3fa5a4d20b6ea4728b2a3955f375539295 Mon Sep 17 00:00:00 2001 From: Ilia Evdokimov Date: Tue, 11 Feb 2025 22:12:41 +0300 Subject: [PATCH] Clarify display of rows as decimal fractions When loops > 1, the average rows value is shown as a decimal fraction with two digits after the decimal point in EXPLAIN ANALYZE. --- doc/src/sgml/perform.sgml | 2 + src/backend/commands/explain.c| 55 +-- src/test/regress/expected/partition_prune.out | 18 +++--- src/test/regress/sql/partition_prune.sql | 2 +- 4 files changed, 51 insertions(+), 26 deletions(-) diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml index a502a2aaba..a01832040c 100644 --- a/doc/src/sgml/perform.sgml +++ b/doc/src/sgml/perform.sgml @@ -758,6 +758,8 @@ WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2; the loops value to get the total time actually spent in the node. In the above example, we spent a total of 0.030 milliseconds executing the index scans on tenk2. +If loops is greater than 1, +the rows is shown as a decimal unless it's an integer.
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index c24e66f82e..ae81aeea13 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -13,6 +13,8 @@ */ #include "postgres.h" +#include <math.h> + #include "access/xact.h" #include "catalog/pg_type.h" #include "commands/createas.h" @@ -69,6 +71,9 @@ typedef struct SerializeMetrics */ #define BYTES_TO_KILOBYTES(b) (((b) + 1023) / 1024) +/* Check if float/double has any decimal number */ +#define HAS_DECIMAL(x) (floor(x) != x) + static void ExplainOneQuery(Query *query, int cursorOptions, IntoClause *into, ExplainState *es, ParseState *pstate, ParamListInfo params); @@ -1981,14 +1986,15 @@ ExplainNode(PlanState *planstate, List *ancestors, if (es->format == EXPLAIN_FORMAT_TEXT) { + appendStringInfo(es->str, " (actual "); + if (es->timing) -appendStringInfo(es->str, - " (actual time=%.3f..%.3f rows=%.0f loops=%.0f)", - startup_ms, total_ms, rows, nloops); +appendStringInfo(es->str, "time=%.3f..%.3f ", startup_ms, total_ms); + + if (nloops > 1 && HAS_DECIMAL(rows)) +appendStringInfo(es->str, "rows=%.2f loops=%.0f)", rows, nloops); else -appendStringInfo(es->str, - " (actual rows=%.0f loops=%.0f)", - rows, nloops); +appendStringInfo(es->str, "rows=%.0f loops=%.0f)", rows, nloops); } else { @@ -1999,8 +2005,16 @@ ExplainNode(PlanState *planstate, List *ancestors, ExplainPropertyFloat("Actual Total Time", "ms", total_ms, 3, es); } - ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es); - ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es); + if (nloops > 1 && HAS_DECIMAL(rows)) + { +ExplainPropertyFloat("Actual Rows", NULL, rows, 2, es); +ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es); + } + else + { +ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es); +ExplainPropertyFloat("Ac
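To make the effect of the patch concrete, here is roughly how a case like Andrei's would render before and after the change; the plan line below is made up (nine tuples returned across ten loops), not output from a real run:

```
before:  (actual time=0.010..0.020 rows=1 loops=10)      <- 0.9 rounded away by %.0f
after:   (actual time=0.010..0.020 rows=0.90 loops=10)   <- two digits when loops > 1 and rows is fractional
```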
Re: Add missing tab completion for VACUUM and ANALYZE with ONLY option
On 19.03.2025 12:55, vignesh C wrote: Looks good overall. However, after pressing Tab, ONLY appears in the completion list, but it doesn't autocomplete after typing the first letter. In which case did you notice this? Ah, I just tested it using these examples from the regression tests: CREATE TABLE only_parted (a int, b text) PARTITION BY LIST (a); CREATE TABLE only_parted1 PARTITION OF only_parted FOR VALUES IN (1); ONLY doesn't autocomplete here because the table names begin with "only...", which clashes with completion of the keyword; this was my oversight. -- Everything seems fine with the ANALYZE command autocompletion. Regarding VACUUM, I'm not entirely convinced we should provide autocompletion in every case. I'd prefer to keep the behavior from the v3 patch, because the original intention behind adding the ONLY keyword was specifically for collecting statistics exclusively from parent tables. Therefore, autocompletion makes sense primarily for the ANALYZE case. Any thoughts? -- Best regards, Ilia Evdokimov, Tantor Labs LLC.
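For anyone who wants to reproduce the clash, a sketch using the regression-test tables (the completion behaviour is described in the comments, not pasted verbatim from psql):

```
CREATE TABLE only_parted (a int, b text) PARTITION BY LIST (a);
-- typing "ANALYZE o" and pressing Tab now offers both the ONLY keyword and the
-- only_* tables, so the keyword alone does not complete past the shared prefix
ANALYZE ONLY only_parted;   -- typed out in full, it still works as expected
```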
Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment
After considering the opinions expressed in this discussion, I tend to agree more with David. If we add info about estimated unique keys and estimated capacity, then any additional information - such as evict_ratio and hit_ratio - can also be calculated, as EXPLAIN ANALYZE provides all the necessary details to compute these values. For now, I’m attaching a rebased v4 patch, which includes the fix for ExplainPropertyText. -- Best regards, Ilia Evdokimov, Tantor Labs LLC. From b60e508583b1cb7f30814cef6f39c4b570693350 Mon Sep 17 00:00:00 2001 From: Ilia Evdokimov Date: Fri, 21 Mar 2025 11:52:29 +0300 Subject: [PATCH v4] Show ndistinct and est_entries in EXPLAIN for Memoize --- src/backend/commands/explain.c | 16 src/backend/optimizer/path/costsize.c | 3 +++ src/backend/optimizer/plan/createplan.c | 10 +++--- src/backend/optimizer/util/pathnode.c | 3 +++ src/include/nodes/pathnodes.h | 2 ++ src/include/nodes/plannodes.h | 6 ++ 6 files changed, 37 insertions(+), 3 deletions(-) diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index 391b34a2af2..cb7e295ca8a 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -3628,6 +3628,22 @@ show_memoize_info(MemoizeState *mstate, List *ancestors, ExplainState *es) ExplainPropertyText("Cache Key", keystr.data, es); ExplainPropertyText("Cache Mode", mstate->binary_mode ? "binary" : "logical", es); + if (es->costs) + { + if (es->format == EXPLAIN_FORMAT_TEXT) + { + ExplainIndentText(es); + appendStringInfo(es->str, "Estimated Capacity: %u Estimated Distinct Lookup Keys: %0.0f\n", + ((Memoize *) plan)->est_entries, + ((Memoize *) plan)->est_unique_keys); + } + else + { + ExplainPropertyUInteger("Estimated Capacity", "", ((Memoize *) plan)->est_entries, es); + ExplainPropertyFloat("Estimated Distinct Lookup Keys", "", ((Memoize *) plan)->est_unique_keys, 0, es); + } + } + pfree(keystr.data); if (!es->analyze) diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c index f6f77b8fe19..e3abf9ccc26 100644 --- a/src/backend/optimizer/path/costsize.c +++ b/src/backend/optimizer/path/costsize.c @@ -2604,6 +2604,9 @@ cost_memoize_rescan(PlannerInfo *root, MemoizePath *mpath, mpath->est_entries = Min(Min(ndistinct, est_cache_entries), PG_UINT32_MAX); + /* Remember ndistinct for a potential EXPLAIN later */ + mpath->est_unique_keys = ndistinct; + /* * When the number of distinct parameter values is above the amount we can * store in the cache, then we'll have to evict some entries from the diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c index 75e2b0b9036..596b82c5dc9 100644 --- a/src/backend/optimizer/plan/createplan.c +++ b/src/backend/optimizer/plan/createplan.c @@ -284,7 +284,8 @@ static Material *make_material(Plan *lefttree); static Memoize *make_memoize(Plan *lefttree, Oid *hashoperators, Oid *collations, List *param_exprs, bool singlerow, bool binary_mode, - uint32 est_entries, Bitmapset *keyparamids); + uint32 est_entries, Bitmapset *keyparamids, + double est_unique_keys); static WindowAgg *make_windowagg(List *tlist, WindowClause *wc, int partNumCols, AttrNumber *partColIdx, Oid *partOperators, Oid *partCollations, int ordNumCols, AttrNumber *ordColIdx, Oid *ordOperators, Oid *ordCollations, @@ -1703,7 +1704,8 @@ create_memoize_plan(PlannerInfo *root, MemoizePath *best_path, int flags) plan = make_memoize(subplan, operators, collations, param_exprs, best_path->singlerow, best_path->binary_mode, - best_path->est_entries, 
keyparamids); + best_path->est_entries, keyparamids, + best_path->est_unique_keys); copy_generic_path_info(&plan->plan, (Path *) best_path); @@ -6636,7 +6638,8 @@ materialize_finished_plan(Plan *subplan) static Memoize * make_memoize(Plan *lefttree, Oid *hashoperators, Oid *collations, List *param_exprs, bool singlerow, bool binary_mode, - uint32 est_entries, Bitmapset *keyparamids) + uint32 est_entries, Bitmapset *keyparamids, + double est_unique_keys) { Memoize*node = makeNode(Memoize); Plan *plan = &node->plan; @@ -6654,6 +6657,7 @@ make_memoize(Plan *lefttree, Oid *hashoperators, Oid *collations, node->binary_mode = binary_mode; node->est_entries = est_entries; node->keyparamids = keyparamids; + node->est_unique_keys = est_unique_keys; return node; } diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c index 93e73cb44db..1fbcda99067 100644 --- a/src/backend/optimizer/util/pathnode.c +++ b/src/backend/optimizer/util/pathnode.c @@ -1701,6 +1701,9 @@ cre
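As a concrete reading of the two new fields, using the 721/655 numbers that come up in this thread (worked arithmetic only, not new plan output):

```
Estimated Distinct Lookup Keys = 721, Estimated Capacity = 655
  => roughly 721 - 655 = 66 distinct keys were not expected to fit in the cache
     at the same time, so the cost model assumed some evictions
```

With ANALYZE, the observed behaviour can then be checked against that expectation using the Hits/Misses/Evictions counters that Memoize already reports.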
Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment
On 20.03.2025 13:37, David Rowley wrote: est_entries is a uint32, so %u is the correct format character for that type. I don't think you need to prefix all these properties with "Cache" just because the other two properties have that prefix. I also don't think the names you've chosen really reflect the meaning. How about something like: "Estimated Distinct Lookup Keys: 721 Estimated Capacity: 655", in that order. I think maybe having that as one line for format=text is better than 2 lines. The EXPLAIN output is already often taking up more lines in v18 than in v17, would be good to not make that even worse unnecessarily. LGTM I see the existing code there could use ExplainPropertyText rather than have a special case for text and non-text formats. That's likely my fault. If we're touching this code, then we should probably tidy that up. Do you want to create a precursor fixup patch for that? I fully agree with this suggestion. I'll start on that in a separate thread. + double ndistinct; /* Estimated number of distinct memoization keys, + * used for cache size evaluation. Kept for EXPLAIN */ Maybe this field in MemoizePath needs a better name. How about "est_unique_keys"? and also do the rename in struct Memoize. LGTM I'm also slightly concerned about making struct Memoize bigger. I had issues with a performance regression [1] for 908a96861 when increasing the WindowAgg struct size last year and the only way I found to make it go away was to shuffle the fields around so that the struct size didn't increase. I think we'll need to see a benchmark of a query that hits Memoize quite hard with a small cache size to see if the performance decreases as a result of adding the ndistinct field. It's unfortunate that we'll not have the luxury of squeezing this double into padding if we do see a slowdown. I tried rearranging the fields in the struct to avoid growing it after adding the 'ndistinct' field, but unfortunately sizeof(Memoize) did not stay the same. Therefore, benchmarking Memoize is definitely necessary. Thanks for the information! -- Best regards, Ilia Evdokimov, Tantor Labs LLC.
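A possible starting point for such a benchmark, reusing the memoize.sql regression query quoted elsewhere in this thread; the work_mem setting is only a sketch to keep the cache small and force evictions, and the timings would have to be compared with and without the extra field:

```
SET work_mem = '64kB';
EXPLAIN ANALYZE
SELECT COUNT(*), AVG(t1.unique1)
FROM tenk1 t1
INNER JOIN tenk1 t2 ON t1.unique1 = t2.thousand
WHERE t2.unique1 < 1200;
```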
Re: Proposal - Allow extensions to set a Plan Identifier
On 18.03.2025 14:46, Ilia Evdokimov wrote: However, I have a question: what value should planid have when we execute the standard planner without using a planner_hook? For example, if pg_stat_statements is disabled, would planid default to zero? If zero is the expected default, should we set it explicitly (result->planid = 0;)? Sorry for the spam. I found the answer to this question in thread [0]. [0]: https://www.postgresql.org/message-id/flat/CAP53Pkyow59ajFMHGpmb1BK9WHDypaWtUsS_5DoYUEfsa_Hktg%40mail.gmail.com -- Best regards, Ilia Evdokimov, Tantor Labs LLC.
Re: Add missing tab completion for VACUUM and ANALYZE with ONLY option
On 18.03.2025 17:57, Ilia Evdokimov wrote: Hi, Thank you for the patch. However, is it OK that 'ONLY' cannot be tab-completed in a VACUUM ANALYZE command?

CREATE TABLE only_parted (a int, b text) PARTITION BY LIST (a);
CREATE TABLE only_parted1 PARTITION OF only_parted FOR VALUES IN (1);
INSERT INTO only_parted VALUES (1, 'a');

VACUUM ANALYZE {Press Tab}
information_schema. only_parted only_parted1 public.

Although it works:

postgres=# VACUUM ANALYZE ONLY only_parted;
WARNING: VACUUM ONLY of partitioned table "only_parted" has no effect
VACUUM
postgres=#

-- Best regards, Ilia Evdokimov, Tantor Labs LLC.

Umar, Vignesh, sorry for not adding you to CC earlier! -- Best regards, Ilia Evdokimov, Tantor Labs LLC.
Re: [PERF] Improve Cardinality Estimation for Joins with GROUP BY Having Single Clause
Hi, On commit 122a9af I can't see any problem with this query:

explain analyze select * from t1 left join (select a, max(b) from t2 group by a) t2 on t1.a = t2.a;
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Hash Right Join  (cost=397.65..669.04 rows=1000 width=12) (actual time=9.008..11.549 rows=1000.00 loops=1)
   Hash Cond: (t2.a = t1.a)
   Buffers: shared read=90
   ->  HashAggregate  (cost=370.15..560.25 rows=19010 width=8) (actual time=8.103..10.160 rows=19010.00 loops=1)
         Group Key: t2.a
         Batches: 1  Memory Usage: 2321kB
         Buffers: shared read=85
         ->  Seq Scan on t2  (cost=0.00..275.10 rows=19010 width=8) (actual time=0.015..1.907 rows=19010.00 loops=1)
               Buffers: shared read=85
   ->  Hash  (cost=15.00..15.00 rows=1000 width=4) (actual time=0.365..0.366 rows=1000.00 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 44kB
         Buffers: shared read=5
         ->  Seq Scan on t1  (cost=0.00..15.00 rows=1000 width=4) (actual time=0.033..0.187 rows=1000.00 loops=1)
               Buffers: shared read=5
 Planning:
   Buffers: shared hit=31 read=16
 Planning Time: 0.751 ms
 Execution Time: 12.599 ms
(18 rows)

-- Best regards, Ilia Evdokimov, Tantor Labs LLC.
Re: Add missing tab completion for VACUUM and ANALYZE with ONLY option
On 18.03.2025 15:25, vignesh C wrote: On Thu, 27 Feb 2025 at 11:42, vignesh C wrote: On Thu, 6 Feb 2025 at 16:29, Umar Hayat wrote: This will include "ONLY" also when we display the tables too: + else if (HeadMatches("ANALYZE")) + COMPLETE_WITH_SCHEMA_QUERY_PLUS(Query_for_list_of_analyzables, "ONLY"); like below: postgres=# analyze only t1 information_schema. ONLY public. t1 The same issue exists with vacuum too: postgres=# vacuum only t1 information_schema. ONLY public. t1 The attached patch has the fixes for the above issue. Regards, Vignesh

Hi, Thank you for the patch. However, is it OK that 'ONLY' cannot be tab-completed in a VACUUM ANALYZE command?

CREATE TABLE only_parted (a int, b text) PARTITION BY LIST (a);
CREATE TABLE only_parted1 PARTITION OF only_parted FOR VALUES IN (1);
INSERT INTO only_parted VALUES (1, 'a');

VACUUM ANALYZE {Press Tab}
information_schema. only_parted only_parted1 public.

Although it works:

postgres=# VACUUM ANALYZE ONLY only_parted;
WARNING: VACUUM ONLY of partitioned table "only_parted" has no effect
VACUUM
postgres=#

-- Best regards, Ilia Evdokimov, Tantor Labs LLC.
Re: [PATCH] Optionally record Plan IDs to track plan changes for a query
On 12.02.2025 19:00, Alvaro Herrera wrote: On 2025-Feb-12, Julien Rouhaud wrote: On Wed, Feb 12, 2025 at 01:57:47PM +0100, Alvaro Herrera wrote: Anyway, I think that's different. We do support compute_query_id=off as a way for a custom module to compute completely different query IDs using their own algorithm, which I think is what you're referring to. However, the ability to affect the way the in-core algorithm works is a different thing: you still want in-core code to compute the query ID. I don't think that's the actual behavior, or at least not what it was supposed to be. What we should have is the ability to compute queryid, which can be either in core or done by an external module, but only one can / should be done. Yes, that's what I tried to say, but I don't understand why you say I said something different. Right now, the proposal in the other thread is that if you want to affect that algorithm in order to merge arrays to be considered a single query element regardless of its length, you set the GUC for that. Initially the GUC was in the core code. Then, based on review, the GUC was moved to the extension, _BUT_ the implementation was still in the core code: in order to activate it, the extension calls a function that modifies core code behavior. So there are more moving parts than before, and if you for whatever reason want that behavior but not the extension, then you need to write a C function. To me this is absurd. So what I suggest we do is return to having the GUC in the core code. I agree, although that probably breaks the queryid extensibility. It does? I haven't read the patch but IIUC if you want the feature to work you need to both change the queryid calculation but also the way the constants are recorded and the query text is normalized, and I don't know if extensions have access to it. Hmm. As for the query text: with Andrey's feature with the GUC in core, a query like this SELECT foo FROM tab WHERE col1 IN (1,2,3,4) will have in pg_stat_activity an identical query_id to a query like this SELECT foo FROM tab WHERE col1 IN (1,2,3,4,5) even though the query texts differ (in the number of elements in the array). I don't think this is a problem. This means that the query_id for two different queries can be identical, but that should be no surprise, precisely because the GUC that controls it is documented to do that. If pg_stat_statements is enabled with Andrey's patch, then the same query_id will have a single entry (which has stats for both executions of those queries) with that query_id, with a normalized query text that is going to be different from those two above; without Andrey's feature, the text would be SELECT foo FROM tab WHERE col1 IN ($1,$2,$3,$4); SELECT foo FROM tab WHERE col1 IN ($1,$2,$3,$4,$5); (that is, pg_stat_statements transformed the values into placeholders, but using exactly the same number of items in the array as the original queries). With Andrey's feature, it will be SELECT foo FROM tab WHERE col1 IN (...); that is, the query text has been modified and no longer matches exactly any of the queries in pg_stat_activity. But note that the query text already does not match what's in pg_stat_activity, even before Andrey's patch. I don't understand what you mean with "the way the constants are recorded". What constants are you talking about? pg_stat_statements purposefully discards any constants used in the query (obviously). If they have access and fail to do what the GUC asked then of course that's just a bug in that extension.
I don't understand what bug are you thinking that such hypothetical extension would have. (pg_stat_statements does of course have access to the query text and to the location of all constants). Now I admit I'm not sure what the solution would be for the problem discussed in this subthread. Apparently the problem is related to temp tables and their changing OIDs. I'm not sure what exactly the proposal for a GUC is. I'm not proposing anything, just explaining why pg_stat_statements is generally useless if you use temp tables as someone asked. Ah, okay. Well, where you see a deficiency, I see an opportunity for improvement :-) Hi everyone, I support the idea of computing the planid for temporary tables using 'pg_temp.rel_name'. Moreover, we have already started using this approach for computing queryid [0]. It seems reasonable to apply the same logic to the planid calculation as well. [0]: https://www.postgresql.org/message-id/flat/Z9mkqplmUpQ4xG52%40msg.df7cb.de#efb20f01bec32aeafd58e5d4ab0dfc16 -- Best regards, Ilia Evdokimov, Tantor Labs LLC.
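For illustration, a minimal sketch of the temp-table problem being referenced (the table name is made up): the same statement text currently ends up with different IDs because the relation OID changes every time the temp table is re-created, while hashing the schema-qualified name as 'pg_temp.rel_name' would keep the entries together.

```
CREATE TEMP TABLE scratch (id int);
SELECT count(*) FROM scratch WHERE id = 1;
DROP TABLE scratch;
CREATE TEMP TABLE scratch (id int);          -- same SQL text, new relation OID
SELECT count(*) FROM scratch WHERE id = 1;   -- currently hashes to a different query/plan ID
```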
Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment
On 06.07.2023 11:27, Lukas Fittl wrote: On Thu, Jul 6, 2023 at 12:56 AM Daniel Gustafsson wrote: Lukas: do you have an updated patch for this commitfest to address David's comments? I have a draft - I should be able to post an updated patch in the next days. Thanks for checking! Thanks, Lukas -- Lukas Fittl

Hi hackers, While debugging a query execution plan involving Memoize, it'd be nice to determine how many unique keys would fit into the cache. The est_entries value provides some insight, but without knowing ndistinct, it is unclear whether the cache is large enough to hold all unique keys or if some will be evicted. Given its potential usefulness, I would like to work on this. I've attached a v2 patch with the changes.

Example from memoize.sql:

EXPLAIN SELECT COUNT(*),AVG(t1.unique1) FROM tenk1 t1 INNER JOIN tenk1 t2 ON t1.unique1 = t2.thousand WHERE t2.unique1 < 1200;
                                    QUERY PLAN
-----------------------------------------------------------------------------------
 Aggregate  (cost=815.12..815.13 rows=1 width=40)
   ->  Nested Loop  (cost=0.30..809.12 rows=1200 width=4)
         ->  Seq Scan on tenk1 t2  (cost=0.00..470.00 rows=1200 width=4)
               Filter: (unique1 < 1200)
         ->  Memoize  (cost=0.30..0.41 rows=1 width=4)
               Cache Key: t2.thousand
               Cache Mode: logical
               Cache Estimated Entries: 655
               Cache Estimated NDistinct: 721
               ->  Index Only Scan using tenk1_unique1 on tenk1 t1  (cost=0.29..0.40 rows=1 width=4)
                     Index Cond: (unique1 = t2.thousand)
(11 rows)

Additionally, since this information would only be shown in EXPLAIN when costs are enabled, it should not cause any performance regression in normal execution. However, reviewers should be especially careful when verifying test outputs, as this change could affect plan details in regression tests. Any thoughts?

-- Best regards, Ilia Evdokimov, Tantor Labs LLC.

From 2e7d9bd9bfe58825fbdbb0100b9bd19d5bb7e53b Mon Sep 17 00:00:00 2001 From: Ilia Evdokimov Date: Thu, 20 Mar 2025 11:45:02 +0300 Subject: [PATCH v2] Show ndistinct and est_entries in EXPLAIN for Memoize --- src/backend/commands/explain.c | 12 src/backend/optimizer/path/costsize.c | 3 +++ src/backend/optimizer/plan/createplan.c | 10 +++--- src/backend/optimizer/util/pathnode.c | 3 +++ src/include/nodes/pathnodes.h | 2 ++ src/include/nodes/plannodes.h | 6 ++ 6 files changed, 33 insertions(+), 3 deletions(-) diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index 33a16d2d8e2..95110e63c7d 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -3629,6 +3629,11 @@ show_memoize_info(MemoizeState *mstate, List *ancestors, ExplainState *es) { ExplainPropertyText("Cache Key", keystr.data, es); ExplainPropertyText("Cache Mode", mstate->binary_mode ? "binary" : "logical", es); + if (es->costs) + { + ExplainPropertyFloat("Cache Estimated Entries", "", ((Memoize *) plan)->est_entries, 0, es); + ExplainPropertyFloat("Cache Estimated NDistinct", "", ((Memoize *) plan)->ndistinct, 0, es); + } } else { @@ -3636,6 +3641,13 @@ show_memoize_info(MemoizeState *mstate, List *ancestors, ExplainState *es) appendStringInfo(es->str, "Cache Key: %s\n", keystr.data); ExplainIndentText(es); appendStringInfo(es->str, "Cache Mode: %s\n", mstate->binary_mode ?
"binary" : "logical"); + if (es->costs) + { + ExplainIndentText(es); + appendStringInfo(es->str, "Cache Estimated Entries: %d\n", ((Memoize *) plan)->est_entries); + ExplainIndentText(es); + appendStringInfo(es->str, "Cache Estimated NDistinct: %0.0f\n", ((Memoize *) plan)->ndistinct); + } } pfree(keystr.data); diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c index 256568d05a2..9c0738da4fe 100644 --- a/src/backend/optimizer/path/costsize.c +++ b/src/backend/optimizer/path/costsize.c @@ -2604,6 +2604,9 @@ cost_memoize_rescan(PlannerInfo *root, MemoizePath *mpath, mpath->est_entries = Min(Min(ndistinct, est_cache_entries), PG_UINT32_MAX); + /* Remember ndistinct for a potential EXPLAIN later */ + mpath->ndistinct = ndistinct; + /* * When the number of distinct parameter values is above the amount we can * store in the cache, then we'll have to evict some entries from the diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c index 75e2b0b9036..79795a2b5f2 100644 --- a/src/backend/optimizer/plan/createplan.c +++ b/src/backend/optimizer/plan/createplan.c @@ -284,7 +284,8 @@ static Materia