Re: pg_stat_advisor extension

Ilia Evdokimov Mon, 13 May 2024 05:26:23 -0700

1. In the case of parallel workers the plan_rows value has a different semantics than the number of rows predicted. Just explore

get_parallel_divisor().

2. The extension recommends new statistics immediately upon an error finding. But what if the reason for the error is stale statistics? Or

this error may be raised for only one specific set of constants, andestimation will be done well in another 99.9999% of cases for the sameexpression.

The new parameter, `pg_stat_advisor.analyze_scale_factor`, can suggestthe execution of the ANALYZE command on specific tables. The extensionnow evaluates the ratio of `n_live_tup` (number of live tuples) to`n_mod_since_analyze` (number of modifications since last analyze) inthe `pg_stat_all_tables` catalog. If this ratio exceeds the valuespecified in `analyze_scale_factor`, the extension will suggest anupdate to the table's statistics.

There are a lot of parameters that influences on estimated rows.Statistics might not help improve estimated rows. This feature isdesigned to provide users with data-driven insights to decide whetherupdating statistics via the ANALYZE command could potentially improvequery performance. By suggesting rather than automatically executingstatistics updates, we empower you to make informed decisions based onthe specific needs and conditions of your database environment.

I've developed an extension that provides suggestions on whether toupdate or create statistics for your PostgreSQL database, withoutexecuting any changes. This approach allows you to consider variousparameters that influence row estimates and make informed decisionsabout optimizing your database's performance.

Your feedback is invaluable, and we look forward to hearing about yourexperiences and any improvements you might suggest. Best regards, IliaEvdokimov Tantor Labs LLC.

From eb998bea96a3640d240afa63e08cc8cf98925bf7 Mon Sep 17 00:00:00 2001
From: Ilia Evdokimov <ilya.evdoki...@tantorlabs.ru>
Date: Mon, 13 May 2024 14:41:59 +0300
Subject: [PATCH] 'pg_stat_advisor' extension

This service as a hook into executor. It has two GUC-parameters.

pg_stat_advisor.analyze_scale_factor: if ratio of pg_stat_all_tables.n_live_tup to pg_stat_all_tables.n_mod_since_analyze is greater than pg_stat_advisor.analyze_scale_factor extension prints suggestion executing ANALYZE command.

pg_stat_advisor.suggest_statistics_threshold: the ratio of total rows to planned rows is greater than or equal to this threshold the extension prints suggestion executing the creation of statistics, using the naming format 'relationName_columns'
---
 contrib/Makefile                              |   1 +
 contrib/pg_stat_advisor/Makefile              |  20 +
 .../expected/pg_stat_advisor.out              |  52 ++
 contrib/pg_stat_advisor/meson.build           |  30 +
 contrib/pg_stat_advisor/pg_stat_advisor.c     | 560 ++++++++++++++++++
 .../pg_stat_advisor/sql/pg_stat_advisor.sql   |  24 +
 6 files changed, 687 insertions(+)
 create mode 100644 contrib/pg_stat_advisor/Makefile
 create mode 100644 contrib/pg_stat_advisor/expected/pg_stat_advisor.out
 create mode 100644 contrib/pg_stat_advisor/meson.build
 create mode 100644 contrib/pg_stat_advisor/pg_stat_advisor.c
 create mode 100644 contrib/pg_stat_advisor/sql/pg_stat_advisor.sql

diff --git a/contrib/Makefile b/contrib/Makefile
index abd780f277..d6ce2fe562 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -33,6 +33,7 @@ SUBDIRS = \
 		pg_buffercache	\
 		pg_freespacemap \
 		pg_prewarm	\
+		pg_stat_advisor \
 		pg_stat_statements \
 		pg_surgery	\
 		pg_trgm		\
diff --git a/contrib/pg_stat_advisor/Makefile b/contrib/pg_stat_advisor/Makefile
new file mode 100644
index 0000000000..f31b939e8a
--- /dev/null
+++ b/contrib/pg_stat_advisor/Makefile
@@ -0,0 +1,20 @@
+# contrib/pg_stat_advisor/Makefile
+
+MODULE_big = pg_stat_advisor
+OBJS = \
+	$(WIN32RES) \
+	pg_stat_advisor.o
+PGFILEDESC = "pg_stat_advisor - analyze query performance and recommend the creation of additional statistics"
+
+REGRESS = pg_stat_advisor
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_stat_advisor
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_stat_advisor/expected/pg_stat_advisor.out b/contrib/pg_stat_advisor/expected/pg_stat_advisor.out
new file mode 100644
index 0000000000..8f3dab2c2f
--- /dev/null
+++ b/contrib/pg_stat_advisor/expected/pg_stat_advisor.out
@@ -0,0 +1,52 @@
+LOAD 'pg_stat_advisor';
+SET pg_stat_advisor.analyze_scale_factor = 0.4;
+SET pg_stat_advisor.suggest_statistics_threshold = 0.11;
+CREATE TABLE my_tbl(fld_1 INTEGER, fld_2 BIGINT) WITH (autovacuum_enabled = false);
+INSERT INTO my_tbl (fld_1, fld_2)
+SELECT
+     i/100 as fld_1,
+     i/500 as fld_2
+FROM generate_series(1, 10000000) s(i);
+ANALYZE my_tbl;
+INSERT INTO my_tbl (fld_1, fld_2)
+SELECT
+     i/100 as fld_1,
+     i/500 as fld_2
+FROM generate_series(1, 10000000) s(i);
+EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) SELECT * FROM my_tbl WHERE fld_1 = 1 AND fld_2 = 0;
+NOTICE:  pg_stat_advisor suggestion: 'ANALYZE my_tbl'
+                         QUERY PLAN                         
+------------------------------------------------------------
+ Gather (actual rows=200 loops=1)
+   Workers Planned: 2
+   Workers Launched: 2
+   ->  Parallel Seq Scan on my_tbl (actual rows=67 loops=3)
+         Filter: ((fld_1 = 1) AND (fld_2 = 0))
+         Rows Removed by Filter: 6666600
+(6 rows)
+
+ANALYZE my_tbl;
+EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) SELECT * FROM my_tbl WHERE fld_1 = 1 AND fld_2 = 0;
+NOTICE:  pg_stat_advisor suggestion: CREATE STATISTICS my_tbl_fld_1_fld_2 ON fld_1, fld_2 FROM my_tbl
+                         QUERY PLAN                         
+------------------------------------------------------------
+ Gather (actual rows=200 loops=1)
+   Workers Planned: 2
+   Workers Launched: 2
+   ->  Parallel Seq Scan on my_tbl (actual rows=67 loops=3)
+         Filter: ((fld_1 = 1) AND (fld_2 = 0))
+         Rows Removed by Filter: 6666600
+(6 rows)
+
+CREATE STATISTICS my_tbl_fld_1_fld_2 ON fld_1, fld_2 FROM my_tbl;
+EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) SELECT * FROM my_tbl WHERE fld_1 = 1 AND fld_2 = 0;
+                         QUERY PLAN                         
+------------------------------------------------------------
+ Gather (actual rows=200 loops=1)
+   Workers Planned: 2
+   Workers Launched: 2
+   ->  Parallel Seq Scan on my_tbl (actual rows=67 loops=3)
+         Filter: ((fld_1 = 1) AND (fld_2 = 0))
+         Rows Removed by Filter: 6666600
+(6 rows)
+
diff --git a/contrib/pg_stat_advisor/meson.build b/contrib/pg_stat_advisor/meson.build
new file mode 100644
index 0000000000..292917c025
--- /dev/null
+++ b/contrib/pg_stat_advisor/meson.build
@@ -0,0 +1,30 @@
+# Copyright (c) 2022-2024, PostgreSQL Global Development Group
+
+pg_stat_advisor_sources = files(
+  'pg_stat_advisor.c',
+)
+
+if host_system == 'windows'
+  pg_stat_advisor_sources += rc_lib_gen.process(win32ver_rc, extra_args: [
+    '--NAME', 'pg_stat_advsior',
+    '--FILEDESC', 'pg_stat_advisor - suggestion of creating statistics',])
+endif
+
+pg_stat_advisor = shared_module('pg_stat_advisor',
+  pg_stat_advisor_sources,
+  kwargs: contrib_mod_args + {
+    'dependencies': contrib_mod_args['dependencies'],
+  },
+)
+contrib_targets += pg_stat_advisor
+
+tests += {
+  'name': 'pg_stat_advisor',
+  'sd': meson.current_source_dir(),
+  'bd': meson.current_build_dir(),
+  'regress': {
+    'sql': [
+      'pg_stat_advisor',
+    ],
+  },
+}
diff --git a/contrib/pg_stat_advisor/pg_stat_advisor.c b/contrib/pg_stat_advisor/pg_stat_advisor.c
new file mode 100644
index 0000000000..ea391ec886
--- /dev/null
+++ b/contrib/pg_stat_advisor/pg_stat_advisor.c
@@ -0,0 +1,560 @@
+#include "postgres.h"
+
+#include "access/hash.h"
+#include "access/table.h"
+#include "catalog/pg_statistic_ext.h"
+#include "commands/explain.h"
+#include "executor/spi.h"
+#include "optimizer/optimizer.h"
+#include "parser/parsetree.h"
+#include "statistics/statistics.h"
+#include "utils/guc.h"
+#include "utils/syscache.h"
+#include "utils/lsyscache.h"
+
+PG_MODULE_MAGIC;
+
+/* GUC variables */
+static double pg_stat_advisor_suggest_statistics_threshold = 1.0;
+static double pg_stat_advisor_analyze_scale_factor = 0.0;
+
+/* Current nesting depth of ExecutorRun calls */
+static int	nesting_level = 0;
+
+/* Is the current top-level query to be sampled? */
+#define pg_stat_advisor_enabled() (nesting_level == 0)
+
+void		_PG_init(void);
+
+/* Saved hook values in case of unload */
+static ExecutorStart_hook_type prev_ExecutorStart = NULL;
+static ExecutorRun_hook_type prev_ExecutorRun = NULL;
+static ExecutorFinish_hook_type prev_ExecutorFinish = NULL;
+static ExecutorEnd_hook_type prev_ExecutorEnd = NULL;
+
+static void explain_ExecutorStart(QueryDesc *queryDesc, int eflags);
+static void explain_ExecutorRun(QueryDesc *queryDesc,
+								ScanDirection direction,
+								uint64 count, bool execute_once);
+static void explain_ExecutorFinish(QueryDesc *queryDesc);
+static void explain_ExecutorEnd(QueryDesc *queryDesc);
+
+static void SuggestMultiColumnStatisticsForNode(PlanState *planstate, ExplainState *es);
+bool UpdateStatistics(char *rel_name);
+
+/*
+ * Module load callback
+ */
+void
+_PG_init(void)
+{
+	/* Define custom GUC variables. */
+	DefineCustomRealVariable("pg_stat_advisor.suggest_statistics_threshold",
+							 "Set the threshold for estimated/actual rows",
+							 "Zero disables suggestion of creating statistics",
+							 &pg_stat_advisor_suggest_statistics_threshold,
+							 1.0,
+							 0.0,
+							 1.0,
+							 PGC_SUSET,
+							 0,
+							 NULL,
+							 NULL,
+							 NULL);
+
+	DefineCustomRealVariable("pg_stat_advisor.analyze_scale_factor",
+							 "Set the threshold for estimated/actual rows",
+							 "Zero disables suggestion of updatinging statistics",
+							 &pg_stat_advisor_analyze_scale_factor,
+							 0.0,
+							 0.0,
+							 1.0,
+							 PGC_SUSET,
+							 0,
+							 NULL,
+							 NULL,
+							 NULL);
+
+	MarkGUCPrefixReserved("pg_stat_advisor");
+
+	/* Install hooks. */
+	prev_ExecutorStart = ExecutorStart_hook;
+	ExecutorStart_hook = explain_ExecutorStart;
+	prev_ExecutorRun = ExecutorRun_hook;
+	ExecutorRun_hook = explain_ExecutorRun;
+	prev_ExecutorFinish = ExecutorFinish_hook;
+	ExecutorFinish_hook = explain_ExecutorFinish;
+	prev_ExecutorEnd = ExecutorEnd_hook;
+	ExecutorEnd_hook = explain_ExecutorEnd;
+}
+
+
+/*
+ * ExecutorStart hook: start up logging if needed
+ */
+static void
+explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
+{
+	if (prev_ExecutorStart)
+		prev_ExecutorStart(queryDesc, eflags);
+	else
+		standard_ExecutorStart(queryDesc, eflags);
+
+	/*
+	* Set up to track total elapsed time in ExecutorRun.  Make sure the
+	* space is allocated in the per-query context so it will go away at
+	* ExecutorEnd.
+	*/
+	if (pg_stat_advisor_enabled() && queryDesc->totaltime == NULL)
+	{
+		MemoryContext oldcxt;
+
+		oldcxt = MemoryContextSwitchTo(queryDesc->estate->es_query_cxt);
+		queryDesc->totaltime = InstrAlloc(1, INSTRUMENT_ALL, false);
+		MemoryContextSwitchTo(oldcxt);
+	}
+}
+
+/*
+ * ExecutorRun hook: all we need do is track nesting depth
+ */
+static void
+explain_ExecutorRun(QueryDesc *queryDesc, ScanDirection direction,
+					uint64 count, bool execute_once)
+{
+	nesting_level++;
+	PG_TRY();
+	{
+		if (prev_ExecutorRun)
+			prev_ExecutorRun(queryDesc, direction, count, execute_once);
+		else
+			standard_ExecutorRun(queryDesc, direction, count, execute_once);
+	}
+	PG_FINALLY();
+	{
+		nesting_level--;
+	}
+	PG_END_TRY();
+}
+
+/*
+ * ExecutorFinish hook: all we need do is track nesting depth
+ */
+static void
+explain_ExecutorFinish(QueryDesc *queryDesc)
+{
+	nesting_level++;
+	PG_TRY();
+	{
+		if (prev_ExecutorFinish)
+			prev_ExecutorFinish(queryDesc);
+		else
+			standard_ExecutorFinish(queryDesc);
+	}
+	PG_FINALLY();
+	{
+		nesting_level--;
+	}
+	PG_END_TRY();
+}
+
+/*
+ * ExecutorEnd hook: log results if needed
+ */
+static void
+explain_ExecutorEnd(QueryDesc *queryDesc)
+{
+	if (queryDesc->totaltime && pg_stat_advisor_enabled())
+	{
+		MemoryContext oldcxt;
+		ExplainState *es;
+
+		/*
+		 * Make sure we operate in the per-query context, so any cruft will be
+		 * discarded later during ExecutorEnd.
+		 */
+		oldcxt = MemoryContextSwitchTo(queryDesc->estate->es_query_cxt);
+
+		/*
+		 * Make sure stats accumulation is done.  (Note: it's okay if several
+		 * levels of hook all do this.)
+		 */
+		InstrEndLoop(queryDesc->totaltime);
+
+		es = NewExplainState();
+
+		es->analyze = queryDesc->instrument_options;
+
+		ExplainBeginOutput(es);
+		ExplainQueryText(es, queryDesc);
+		ExplainPrintPlan(es, queryDesc);
+		if (es->analyze)
+			ExplainPrintTriggers(es, queryDesc);
+		if (es->costs)
+			ExplainPrintJITSummary(es, queryDesc);
+		ExplainEndOutput(es);
+
+		/* Entry to check */
+		if (!IsParallelWorker())
+			SuggestMultiColumnStatisticsForNode(queryDesc->planstate, es);
+
+		MemoryContextSwitchTo(oldcxt);
+	}
+
+	if (prev_ExecutorEnd)
+		prev_ExecutorEnd(queryDesc);
+	else
+		standard_ExecutorEnd(queryDesc);
+}
+
+/**
+ * Comparator used to sort Vars by name
+ */
+static int
+vars_list_comparator(const ListCell *a, const ListCell *b)
+{
+	char	   *va = strVal(linitial(((ColumnRef *) lfirst(a))->fields));
+	char	   *vb = strVal(linitial(((ColumnRef *) lfirst(b))->fields));
+
+	return strcmp(va, vb);
+}
+
+/**
+ * Suggest statistics for specified subplans.
+ */
+static void
+SuggestMultiColumnStatisticsForSubPlans(List *plans, ExplainState *es)
+{
+	ListCell   *lst;
+
+	foreach(lst, plans)
+	{
+		SubPlanState *sps = (SubPlanState *) lfirst(lst);
+
+		SuggestMultiColumnStatisticsForNode(sps->planstate, es);
+	}
+}
+
+/**
+ * Suggest statistics for plan subnodes.
+ */
+static void
+SuggestMultiColumnStatisticsForMemberNodes(PlanState **planstates, int nsubnodes,
+										   ExplainState *es)
+{
+	int			j;
+
+	for (j = 0; j < nsubnodes; j++)
+		SuggestMultiColumnStatisticsForNode(planstates[j], es);
+}
+
+/*
+ * UpdateStatistics --- check entries in pg_statistic of rel_name
+ */
+bool
+UpdateStatistics(char *rel_name)
+{
+    int ret, ntuples;
+    bool result = false;
+    char sqlcmd[256];
+
+    if ((ret = SPI_connect()) < 0)
+    {
+        elog(ERROR, "SPI_connect failed: error code %d", ret);
+    }
+
+    snprintf(sqlcmd, sizeof(sqlcmd), "SELECT n_live_tup, n_mod_since_analyze FROM pg_stat_all_tables WHERE relname = '%s';", rel_name);
+
+    ret = SPI_execute(sqlcmd, true, 0);
+    if (ret != SPI_OK_SELECT)
+        elog(ERROR, "SPI_execute failed: error code %d", ret);
+
+    ntuples = SPI_processed;
+
+    if (ntuples > 0)
+    {
+        SPITupleTable *tuptable = SPI_tuptable;
+        TupleDesc tupdesc = SPI_tuptable->tupdesc;
+        HeapTuple tuple = tuptable->vals[0];
+        bool is_null;
+
+		double n_live_tup          = (double)DatumGetInt64(SPI_getbinval(tuple, tupdesc, 1, &is_null));
+        double n_mod_since_analyze = (double)DatumGetInt64(SPI_getbinval(tuple, tupdesc, 2, &is_null));
+
+        /* Criteria of suggestion's message about ANALYZE */
+        if (n_mod_since_analyze / n_live_tup > pg_stat_advisor_analyze_scale_factor)
+            result = true;
+    }
+
+    SPI_finish();
+
+    return result;
+}
+
+/**
+ * Find extended statistics from 'relation_oid' relation on colmap
+ */
+static bool
+FindExtendedStatisticsOnVars(Oid *relation_oid, Bitmapset *colmap)
+{
+	bool		isExistExtendedStatistics = false;
+	ListCell   *oid;
+
+	/* Receive all extended statistics for current relation */
+	Relation	pg_stextdata = table_open(*relation_oid, RowExclusiveLock);
+	List	   *statistics = RelationGetStatExtList(pg_stextdata);
+
+	foreach(oid, statistics)
+	{
+		Oid			statOid = lfirst_oid(oid);
+		Form_pg_statistic_ext staForm;
+		HeapTuple	htup;
+		Bitmapset  *keys = NULL;
+		int			i;
+
+		/* Read from pg_statistic_ext */
+		htup = SearchSysCache1(STATEXTOID, ObjectIdGetDatum(statOid));
+		if (!HeapTupleIsValid(htup))
+			elog(ERROR, "cache lookup failed for statistics object %u", statOid);
+		staForm = (Form_pg_statistic_ext) GETSTRUCT(htup);
+
+		for (i = 0; i < staForm->stxkeys.dim1; i++)
+			keys = bms_add_member(keys, staForm->stxkeys.values[i]);
+
+		/* If we have statistics for current columns */
+		if (!bms_compare(keys, colmap))
+		{
+			isExistExtendedStatistics = true;
+
+			elog(DEBUG1, "pg_stat_advisor: found statistics for '%s' relation - '%s'",
+				get_rel_name(*relation_oid),
+				get_rel_name(statOid));
+
+			ReleaseSysCache(htup);
+			bms_free(keys);
+
+			break;
+		}
+
+		ReleaseSysCache(htup);
+		bms_free(keys);
+	}
+
+	list_free(statistics);
+	table_close(pg_stextdata, RowExclusiveLock);
+
+	return isExistExtendedStatistics;
+}
+
+/**
+ * Suggest statistics for qual
+ */
+static void
+SuggestMultiColumnStatisticsForQual(List *qual, ExplainState *es)
+{
+	List	   *vars = NULL;
+	ListCell   *lc;
+
+	/* Extract vars from all quals */
+	foreach(lc, qual)
+	{
+		Node	   *node = (Node *) lfirst(lc);
+
+		if (IsA(node, RestrictInfo))
+			node = (Node *) ((RestrictInfo *) node)->clause;
+		vars = list_concat(vars, pull_vars_of_level(node, 0));
+	}
+
+	/* Loop until we considered all vars */
+	while (vars != NULL)
+	{
+		ListCell   *cell;
+		List	   *cols = NULL;
+		Index		relno = 0;
+		Bitmapset  *colmap = NULL;
+		RangeTblEntry *rte;
+		char	   *rel_name;
+
+		/* Contruct list of unique vars */
+		foreach(cell, vars)
+		{
+			Node	   *node = (Node *) lfirst(cell);
+
+			if (IsA(node, Var))
+			{
+				Var		   *var = (Var *) node;
+				int			varno = IS_SPECIAL_VARNO(var->varno) ? var->varnosyn : var->varno;
+
+				if (cols == NULL || varno == relno)
+				{
+					int			varattno = IS_SPECIAL_VARNO(var->varno) ? var->varattnosyn : var->varattno;
+
+					relno = varno;
+					if (var->varattno > 0 &&
+						!bms_is_member(varattno, colmap) &&
+						varno >= 1 &&	/* not synthetic var */
+						varno <= list_length(es->rtable) &&
+						list_length(cols) < STATS_MAX_DIMENSIONS)
+					{
+						rte = rt_fetch(varno, es->rtable);
+
+						if (rte->rtekind == RTE_RELATION)
+						{
+							ColumnRef  *col = makeNode(ColumnRef);
+							char	   *colname = get_rte_attribute_name(rte, varattno);
+
+							col->fields = list_make1(makeString(colname));
+							cols = lappend(cols, col);
+							colmap = bms_add_member(colmap, varattno);
+						}
+					}
+				}
+				else
+				{
+					continue;
+				}
+			}
+			vars = foreach_delete_current(vars, cell);
+		}
+
+		rte = rt_fetch(relno, es->rtable);
+		rel_name = get_rel_name(rte->relid);
+		if (UpdateStatistics(rel_name))
+		{
+			ereport(NOTICE, (errmsg("pg_stat_advisor suggestion: 'ANALYZE %s'", rel_name),
+								errhidestmt(true)));
+
+			bms_free(colmap);
+			return;
+		}
+
+		/* To suggest statitics we need to have at least 2 columns */
+		if (list_length(cols) >= 2)
+		{
+			char	   *stat_name = rel_name;
+			char	   *create_stat_stmt = (char *) "";
+			char const *sep = "ON";
+			size_t		name_len;
+
+			/* Sort variables by name */
+			list_sort(cols, vars_list_comparator);
+
+			/*
+			 * Construct name for statistic by concatenating relation name
+			 * with all columns
+			 */
+			foreach(cell, cols)
+			{
+				char	   *col_name = strVal(linitial(((ColumnRef *) lfirst(cell))->fields));
+
+				stat_name = psprintf("%s_%s", stat_name, col_name);
+				create_stat_stmt = psprintf("%s%s %s", create_stat_stmt, sep, col_name);
+				sep = ",";
+			}
+
+			name_len = strlen(stat_name);
+			/* Truncate name if it doesn't fit in NameData */
+			if (name_len >= NAMEDATALEN)
+				stat_name = psprintf("%.*s_%08x", NAMEDATALEN - 10, stat_name, (unsigned) hash_any((uint8 *) stat_name, name_len));
+
+			if (!FindExtendedStatisticsOnVars(&rte->relid, colmap))
+			{
+				ereport(NOTICE, (errmsg("pg_stat_advisor suggestion: CREATE STATISTICS %s %s FROM %s",
+										stat_name, create_stat_stmt, rel_name),
+								 errhidestmt(true)));
+			}
+		}
+
+		bms_free(colmap);
+	}
+}
+
+/**
+ * Suggest statistics for node
+ */
+static void
+SuggestMultiColumnStatisticsForNode(PlanState *planstate, ExplainState *es)
+{
+	Plan	   *plan = planstate->plan;
+
+	/* NOTE: Criteria of suggestion's message about CREATE STATISTICS */
+	if (planstate->instrument && plan->plan_rows != 0 &&
+		plan->plan_rows / planstate->instrument->ntuples < pg_stat_advisor_suggest_statistics_threshold)
+	{
+		elog(DEBUG1, "pg_stat_advisor: suggest statistics due to following parameters \
+			 estimated=%f, actual=%f, ratio=%f: plan=%s",
+			 plan->plan_rows,
+			 planstate->instrument->ntuples,
+			 plan->plan_rows / planstate->instrument->ntuples,
+			 nodeToString(plan));
+		/* quals, sort keys, etc */
+		switch (nodeTag(plan))
+		{
+			case T_IndexScan:
+				SuggestMultiColumnStatisticsForQual(((IndexScan *) plan)->indexqualorig, es);
+				break;
+			case T_IndexOnlyScan:
+				SuggestMultiColumnStatisticsForQual(((IndexOnlyScan *) plan)->indexqual, es);
+				break;
+			case T_BitmapIndexScan:
+				SuggestMultiColumnStatisticsForQual(((BitmapIndexScan *) plan)->indexqualorig, es);
+				break;
+			case T_NestLoop:
+				SuggestMultiColumnStatisticsForQual(((NestLoop *) plan)->join.joinqual, es);
+				break;
+			case T_MergeJoin:
+				SuggestMultiColumnStatisticsForQual(((MergeJoin *) plan)->mergeclauses, es);
+				SuggestMultiColumnStatisticsForQual(((MergeJoin *) plan)->join.joinqual, es);
+				break;
+			case T_HashJoin:
+				SuggestMultiColumnStatisticsForQual(((HashJoin *) plan)->hashclauses, es);
+				SuggestMultiColumnStatisticsForQual(((HashJoin *) plan)->join.joinqual, es);
+				break;
+			default:
+				break;
+		}
+		SuggestMultiColumnStatisticsForQual(plan->qual, es);
+	}
+
+	/* initPlan-s */
+	if (planstate->initPlan)
+		SuggestMultiColumnStatisticsForSubPlans(planstate->initPlan, es);
+
+	/* lefttree */
+	if (outerPlanState(planstate))
+		SuggestMultiColumnStatisticsForNode(outerPlanState(planstate), es);
+
+	/* righttree */
+	if (innerPlanState(planstate))
+		SuggestMultiColumnStatisticsForNode(innerPlanState(planstate), es);
+
+	/* special child plans */
+	switch (nodeTag(plan))
+	{
+		case T_Append:
+			SuggestMultiColumnStatisticsForMemberNodes(((AppendState *) planstate)->appendplans,
+													   ((AppendState *) planstate)->as_nplans,
+													   es);
+			break;
+		case T_MergeAppend:
+			SuggestMultiColumnStatisticsForMemberNodes(((MergeAppendState *) planstate)->mergeplans,
+													   ((MergeAppendState *) planstate)->ms_nplans,
+													   es);
+			break;
+		case T_BitmapAnd:
+			SuggestMultiColumnStatisticsForMemberNodes(((BitmapAndState *) planstate)->bitmapplans,
+													   ((BitmapAndState *) planstate)->nplans,
+													   es);
+			break;
+		case T_BitmapOr:
+			SuggestMultiColumnStatisticsForMemberNodes(((BitmapOrState *) planstate)->bitmapplans,
+													   ((BitmapOrState *) planstate)->nplans,
+													   es);
+			break;
+		case T_SubqueryScan:
+			SuggestMultiColumnStatisticsForNode(((SubqueryScanState *) planstate)->subplan, es);
+			break;
+		default:
+			break;
+	}
+}
diff --git a/contrib/pg_stat_advisor/sql/pg_stat_advisor.sql b/contrib/pg_stat_advisor/sql/pg_stat_advisor.sql
new file mode 100644
index 0000000000..a1ac29329d
--- /dev/null
+++ b/contrib/pg_stat_advisor/sql/pg_stat_advisor.sql
@@ -0,0 +1,24 @@
+LOAD 'pg_stat_advisor';
+SET pg_stat_advisor.analyze_scale_factor = 0.4;
+SET pg_stat_advisor.suggest_statistics_threshold = 0.11;
+
+CREATE TABLE my_tbl(fld_1 INTEGER, fld_2 BIGINT) WITH (autovacuum_enabled = false);
+INSERT INTO my_tbl (fld_1, fld_2)
+SELECT
+     i/100 as fld_1,
+     i/500 as fld_2
+FROM generate_series(1, 10000000) s(i);
+ANALYZE my_tbl;
+INSERT INTO my_tbl (fld_1, fld_2)
+SELECT
+     i/100 as fld_1,
+     i/500 as fld_2
+FROM generate_series(1, 10000000) s(i);
+
+EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) SELECT * FROM my_tbl WHERE fld_1 = 1 AND fld_2 = 0;
+
+ANALYZE my_tbl;
+EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) SELECT * FROM my_tbl WHERE fld_1 = 1 AND fld_2 = 0;
+
+CREATE STATISTICS my_tbl_fld_1_fld_2 ON fld_1, fld_2 FROM my_tbl;
+EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) SELECT * FROM my_tbl WHERE fld_1 = 1 AND fld_2 = 0;
\ No newline at end of file
-- 
2.34.1

Re: pg_stat_advisor extension

Reply via email to