Multitenancy optimization

Konstantin Knizhnik Thu, 28 Mar 2019 05:41:48 -0700

Hi hacker,

Postgres is quite frequently used in different Internet services withmulti-tenant architecture.It means that all object stored in the database have something like"tenant_id" foreign key.

This key is used in all queries, i.e.


     select * from Product where tenant_id=? and product_name=?;

The problem is that columns "tenant_id" and "product_name" arefrequently highly correlated (for example if this product is producedjust by one company).And Postgres knows nothing about this correlation and so makes incorrectestimation of selectivity of this predicate.

Certainly it is possible to create multicolumn statistics to notifyPostgres about columns correlation.

But unfortunately it is not good and working solution.

First of all we have to create multicolumn statistic for all possiblecombinations of table's attributes including "tenant_id".

It is very inconvenient and inefficient.

Second - right now multicolumn statistic is not used for calculatingjoin selectivity. And for joins estimation errors are most critical,

causing Postgres to choose bad execution plans.

From my point of view the best solution is to make Postgres take inaccount possible statistics errors and choose "stable" plan whichcost is not significantly increased in case of estimation errors. But itrequires huge refactoring of optimizer.

Right now I have information that some of Postgres customer which facedwith such problem just hacked calc_joinrel_size_estimate function,checking attribute name and if it is "tenant_id" then do not take itsselectivity in account.It leads to good query plans but certainly can not be considered asacceptable solution.


I thought about more straightforward ways for reaching the same effect.

Right now Postgres allows to explicitly specify number of distinctvalues for the attribute:


      alter table foo alter column x set (n_distinct=1);

Unfortunately just setting it to 1 doesn't work. Postgres calculatesselectivity based on MCV or histogram and not using n_distinct value.

It is also possible to disable collection of statistic for this columns:

      alter table foo alter column x set statistics 0;

But in this case Postgres is choosing DEFAULT_NUM_DISTINCT despite ton_distinct option specified for this attribute.I propose small patch which makes Postgres to use explicitly specifiedn_distinct attribute option value when no statistic is available.

This test illustrating how it works (without this patch estimation forthis query is 1 row):


postgres=# create table foo(x integer, y integer);
CREATE TABLE

postgres=# insert into foo values (generate_series(1,100000)/10,generate_series(1,100000)/10);

INSERT 0 100000
postgres=# alter table foo alter column x set (n_distinct=1);
ALTER TABLE
postgres=# alter table foo alter column x set statistics 0;
ALTER TABLE
postgres=# analyze foo;
ANALYZE
postgres=# explain select * from foo where x=100 and y=100;
                      QUERY PLAN
-------------------------------------------------------
 Seq Scan on foo  (cost=0.00..1943.00 rows=10 width=8)
   Filter: ((x = 100) AND (y = 100))
(2 rows)


Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index ecffffb..2645177 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -127,6 +127,7 @@
 #include "parser/parse_clause.h"
 #include "parser/parsetree.h"
 #include "statistics/statistics.h"
+#include "utils/attoptcache.h"
 #include "utils/builtins.h"
 #include "utils/date.h"
 #include "utils/datum.h"
@@ -4448,9 +4449,15 @@ examine_variable(PlannerInfo *root, Node *node, int varRelid,
 		vardata->atttype = var->vartype;
 		vardata->atttypmod = var->vartypmod;
 		vardata->isunique = has_unique_index(vardata->rel, var->varattno);
-
 		/* Try to locate some stats */
 		examine_simple_variable(root, var, vardata);
+		/* Extract n_dictint attribute option if any */
+		{
+			RangeTblEntry *rte = root->simple_rte_array[var->varno];
+			AttributeOpts *aopt = get_attribute_options(rte->relid, var->varattno);
+			if (aopt != NULL)
+				vardata->n_distinct = aopt->n_distinct;
+		}
 
 		return;
 	}
@@ -4916,6 +4923,9 @@ get_variable_numdistinct(VariableStatData *vardata, bool *isdefault)
 	if (stadistinct > 0.0)
 		return clamp_row_est(stadistinct);
 
+	if (vardata->n_distinct != 0)
+		return vardata->n_distinct;
+
 	/*
 	 * Otherwise we need to get the relation size; punt if not available.
 	 */
diff --git a/src/include/utils/selfuncs.h b/src/include/utils/selfuncs.h
index 8480515..8c0f62e 100644
--- a/src/include/utils/selfuncs.h
+++ b/src/include/utils/selfuncs.h
@@ -73,6 +73,7 @@ typedef struct VariableStatData
 	Oid			vartype;		/* exposed type of expression */
 	Oid			atttype;		/* actual type (after stripping relabel) */
 	int32		atttypmod;		/* actual typmod (after stripping relabel) */
+	float8		n_distinct;     /* number of distinct values */
 	bool		isunique;		/* matches unique index or DISTINCT clause */
 	bool		acl_ok;			/* result of ACL check on table or column */
 } VariableStatData;

Multitenancy optimization

Reply via email to