Re: Using multiple extended statistics for estimates

Tomas Vondra Wed, 13 Nov 2019 07:29:46 -0800

Hi,

here's an updated patch, with some minor tweaks based on the review and
added tests (I ended up reworking those a bit, to make them more like
the existing ones).


There's also a new piece, dealing with functional dependencies. Until
now we did the same thing as for MCV lists - we picketd the "best"
extended statistics (with functional dependencies built) and just used
that. At first I thought we might simply do the same loop as for MCV
lists, but that does not really make sense because we might end up
applying "weaker" dependency first.

Say for example we have table with columns (a,b,c,d,e) and functional
dependencies on (a,b,c,d) and (c,d,e) where all the dependencies on
(a,b,c,d) are weaker than (c,d => e). In a query with clauses on all
attributes this is guaranteed to apply all dependencies from the first
statistic first, which si clearly wrong.

So what this does instead is simply merging all the dependencies from
all the relevant stats, and treating them as a single collection.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

>From 0e09c4749f3712da1983374a2838e4b7e14b7c62 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <[email protected]>
Date: Tue, 12 Nov 2019 22:57:06 +0100
Subject: [PATCH 01/10] Apply multiple multivariate MCV lists when possible

Until now we've only used a single multivariate MCV list per relation,
covering the largest number of clauses. So for example given a query

    SELECT * FROM t WHERE a = 1 AND b =1 AND c = 1 AND d = 1

and extended statistics on (a,b) and (c,d), we'd only pick and use only
one of them. This commit relaxes this by repeating the process, using
the best statistics (matching the largest number of remaining clauses)
in each step.

This greedy algorithm is very simple, but may not be optimal. There may
be a different choice of stats leaving fewer clauses unestimated and/or
giving better estimates for some other reason.

This can however happen only when there are overlapping statistics, and
selecting one makes it impossible to use the other. E.g. with statistics
on (a,b), (c,d), (b,c,d), we may pick either (a,b) and (c,d) or (b,c,d).
But it's not clear which option is better, though, as each one ignores
information about possible correlation between different columns.

We however assume cases like this are rare, and the easiest solution is
to define statistics covering the whole group of correlated columns for
a given query. In the future we might support overlapping stats, using
some of the clauses as conditions (in conditional probability sense).

Author: Tomas Vondra
Reviewed-by: Mark Dilger
Discussion: https://postgr.es/m/20191028152048.jc6pqv5hb7j77ocp@development
---
 src/backend/statistics/extended_stats.c | 166 ++++++++++++++----------
 src/test/regress/expected/stats_ext.out |  58 +++++++++
 src/test/regress/sql/stats_ext.sql      |  36 +++++
 3 files changed, 195 insertions(+), 65 deletions(-)

diff --git a/src/backend/statistics/extended_stats.c 
b/src/backend/statistics/extended_stats.c
index 207ee3160e..b55799b8b1 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -1123,6 +1123,33 @@ statext_is_compatible_clause(PlannerInfo *root, Node 
*clause, Index relid,
        return true;
 }
 
+/*
+ * statext_mcv_clause_attnums
+ *             Recalculate attnums from compatible but not-yet-estimated 
clauses.
+ */
+static Bitmapset *
+statext_mcv_remaining_attnums(int nclauses, Bitmapset **estimatedclauses,
+                                                         Bitmapset 
**list_attnums)
+{
+       int                     listidx;
+       Bitmapset  *attnums = NULL;
+
+       for (listidx = 0; listidx < nclauses; listidx++)
+       {
+               /*
+                * Skip clauses that have no precalculated attnums, which means 
it is
+                * either incompatible or was already used by some other 
statistic.
+                */
+               if (!list_attnums[listidx])
+                       continue;
+
+               if (!bms_is_member(listidx, *estimatedclauses))
+                       attnums = bms_add_members(attnums, 
list_attnums[listidx]);
+       }
+
+       return attnums;
+}
+
 /*
  * statext_mcv_clauselist_selectivity
  *             Estimate clauses using the best multi-column statistics.
@@ -1173,11 +1200,6 @@ statext_is_compatible_clause(PlannerInfo *root, Node 
*clause, Index relid,
  * 'estimatedclauses' is an input/output parameter.  We set bits for the
  * 0-based 'clauses' indexes we estimate for and also skip clause items that
  * already have a bit set.
- *
- * XXX If we were to use multiple statistics, this is where it would happen.
- * We would simply repeat this on a loop on the "remaining" clauses, possibly
- * using the already estimated clauses as conditions (and combining the values
- * using conditional probability formula).
  */
 static Selectivity
 statext_mcv_clauselist_selectivity(PlannerInfo *root, List *clauses, int 
varRelid,
@@ -1188,14 +1210,7 @@ statext_mcv_clauselist_selectivity(PlannerInfo *root, 
List *clauses, int varReli
        Bitmapset  *clauses_attnums = NULL;
        Bitmapset **list_attnums;
        int                     listidx;
-       StatisticExtInfo *stat;
-       List       *stat_clauses;
-       Selectivity simple_sel,
-                               mcv_sel,
-                               mcv_basesel,
-                               mcv_totalsel,
-                               other_sel,
-                               sel;
+       Selectivity     sel = 1.0;
 
        /* check if there's any stats that might be useful for us. */
        if (!has_stats_of_kind(rel->statlist, STATS_EXT_MCV))
@@ -1223,78 +1238,99 @@ statext_mcv_clauselist_selectivity(PlannerInfo *root, 
List *clauses, int varReli
 
                if (!bms_is_member(listidx, *estimatedclauses) &&
                        statext_is_compatible_clause(root, clause, rel->relid, 
&attnums))
-               {
                        list_attnums[listidx] = attnums;
-                       clauses_attnums = bms_add_members(clauses_attnums, 
attnums);
-               }
                else
                        list_attnums[listidx] = NULL;
 
                listidx++;
        }
 
-       /* We need at least two attributes for multivariate statistics. */
-       if (bms_membership(clauses_attnums) != BMS_MULTIPLE)
-               return 1.0;
+       /* apply as many extended statistics as possible */
+       while (true)
+       {
+               StatisticExtInfo *stat;
+               List       *stat_clauses;
+               Selectivity simple_sel,
+                                       mcv_sel,
+                                       mcv_basesel,
+                                       mcv_totalsel,
+                                       other_sel,
+                                       stat_sel;
 
-       /* find the best suited statistics object for these attnums */
-       stat = choose_best_statistics(rel->statlist, clauses_attnums, 
STATS_EXT_MCV);
+               /*
+                * Recompute attnums in the remaining clauses (we simply use 
the bitmaps
+                * computed earlier, so that we don't have to inspect the 
clauses again).
+                */
+               clauses_attnums = 
statext_mcv_remaining_attnums(list_length(clauses),
+                                                                               
                                estimatedclauses,
+                                                                               
                                list_attnums);
 
-       /* if no matching stats could be found then we've nothing to do */
-       if (!stat)
-               return 1.0;
+               /* We need at least two attributes for multivariate statistics. 
*/
+               if (bms_membership(clauses_attnums) != BMS_MULTIPLE)
+                       break;
 
-       /* Ensure choose_best_statistics produced an expected stats type. */
-       Assert(stat->kind == STATS_EXT_MCV);
+               /* find the best suited statistics object for these attnums */
+               stat = choose_best_statistics(rel->statlist, clauses_attnums, 
STATS_EXT_MCV);
 
-       /* now filter the clauses to be estimated using the selected MCV */
-       stat_clauses = NIL;
+               /* if no (additional) matching stats could be found then we've 
nothing to do */
+               if (!stat)
+                       break;
 
-       listidx = 0;
-       foreach(l, clauses)
-       {
-               /*
-                * If the clause is compatible with the selected statistics, 
mark it
-                * as estimated and add it to the list to estimate.
-                */
-               if (list_attnums[listidx] != NULL &&
-                       bms_is_subset(list_attnums[listidx], stat->keys))
+               /* Ensure choose_best_statistics produced an expected stats 
type. */
+               Assert(stat->kind == STATS_EXT_MCV);
+
+               /* now filter the clauses to be estimated using the selected 
MCV */
+               stat_clauses = NIL;
+
+               listidx = 0;
+               foreach(l, clauses)
                {
-                       stat_clauses = lappend(stat_clauses, (Node *) 
lfirst(l));
-                       *estimatedclauses = bms_add_member(*estimatedclauses, 
listidx);
+                       /*
+                        * If the clause is compatible with the selected 
statistics, mark it
+                        * as estimated and add it to the list to estimate.
+                        */
+                       if (list_attnums[listidx] != NULL &&
+                               bms_is_subset(list_attnums[listidx], 
stat->keys))
+                       {
+                               stat_clauses = lappend(stat_clauses, (Node *) 
lfirst(l));
+                               *estimatedclauses = 
bms_add_member(*estimatedclauses, listidx);
+                       }
+
+                       listidx++;
                }
 
-               listidx++;
-       }
+               /*
+                * First compute "simple" selectivity, i.e. without the extended
+                * statistics, and essentially assuming independence of the
+                * columns/clauses. We'll then use the various selectivities 
computed from
+                * MCV list to improve it.
+                */
+               simple_sel = clauselist_selectivity_simple(root, stat_clauses, 
varRelid,
+                                                                               
                jointype, sjinfo, NULL);
 
-       /*
-        * First compute "simple" selectivity, i.e. without the extended
-        * statistics, and essentially assuming independence of the
-        * columns/clauses. We'll then use the various selectivities computed 
from
-        * MCV list to improve it.
-        */
-       simple_sel = clauselist_selectivity_simple(root, stat_clauses, varRelid,
-                                                                               
           jointype, sjinfo, NULL);
+               /*
+                * Now compute the multi-column estimate from the MCV list, 
along with the
+                * other selectivities (base & total selectivity).
+                */
+               mcv_sel = mcv_clauselist_selectivity(root, stat, stat_clauses, 
varRelid,
+                                                                               
         jointype, sjinfo, rel,
+                                                                               
         &mcv_basesel, &mcv_totalsel);
 
-       /*
-        * Now compute the multi-column estimate from the MCV list, along with 
the
-        * other selectivities (base & total selectivity).
-        */
-       mcv_sel = mcv_clauselist_selectivity(root, stat, stat_clauses, varRelid,
-                                                                               
 jointype, sjinfo, rel,
-                                                                               
 &mcv_basesel, &mcv_totalsel);
+               /* Estimated selectivity of values not covered by MCV matches */
+               other_sel = simple_sel - mcv_basesel;
+               CLAMP_PROBABILITY(other_sel);
 
-       /* Estimated selectivity of values not covered by MCV matches */
-       other_sel = simple_sel - mcv_basesel;
-       CLAMP_PROBABILITY(other_sel);
+               /* The non-MCV selectivity can't exceed the 1 - mcv_totalsel. */
+               if (other_sel > 1.0 - mcv_totalsel)
+                       other_sel = 1.0 - mcv_totalsel;
 
-       /* The non-MCV selectivity can't exceed the 1 - mcv_totalsel. */
-       if (other_sel > 1.0 - mcv_totalsel)
-               other_sel = 1.0 - mcv_totalsel;
+               /* Overall selectivity is the combination of MCV and non-MCV 
estimates. */
+               stat_sel = mcv_sel + other_sel;
+               CLAMP_PROBABILITY(stat_sel);
 
-       /* Overall selectivity is the combination of MCV and non-MCV estimates. 
*/
-       sel = mcv_sel + other_sel;
-       CLAMP_PROBABILITY(sel);
+               /* Factor the estimate from this MCV to the oveall estimate. */
+               sel *= stat_sel;
+       }
 
        return sel;
 }
diff --git a/src/test/regress/expected/stats_ext.out 
b/src/test/regress/expected/stats_ext.out
index b65228fa07..c5ae282122 100644
--- a/src/test/regress/expected/stats_ext.out
+++ b/src/test/regress/expected/stats_ext.out
@@ -747,6 +747,64 @@ SELECT * FROM check_estimated_rows('SELECT * FROM 
mcv_lists_bool WHERE NOT a AND
          1 |      0
 (1 row)
 
+-- check the ability to use multiple MCV lists
+CREATE TABLE mcv_lists_multi (
+       a INTEGER,
+       b INTEGER,
+       c INTEGER,
+       d INTEGER
+);
+-- 
+INSERT INTO mcv_lists_multi (a, b, c, d)
+    SELECT
+         mod(i,5),
+         mod(i,5),
+         mod(i,7),
+         mod(i,7)
+    FROM generate_series(1,5000) s(i);
+ANALYZE mcv_lists_multi;
+-- estimates without any mcv statistics
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 
AND b = 0');
+ estimated | actual 
+-----------+--------
+       200 |   1000
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE c = 0 
AND d = 0');
+ estimated | actual 
+-----------+--------
+       102 |    714
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 
AND b = 0 AND c = 0 AND d = 0');
+ estimated | actual 
+-----------+--------
+         4 |    142
+(1 row)
+
+-- create separate MCV statistics
+CREATE STATISTICS mcv_lists_multi_1 (mcv) ON a, b FROM mcv_lists_multi;
+CREATE STATISTICS mcv_lists_multi_2 (mcv) ON c, d FROM mcv_lists_multi;
+ANALYZE mcv_lists_multi;
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 
AND b = 0');
+ estimated | actual 
+-----------+--------
+      1000 |   1000
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE c = 0 
AND d = 0');
+ estimated | actual 
+-----------+--------
+       714 |    714
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 
AND b = 0 AND c = 0 AND d = 0');
+ estimated | actual 
+-----------+--------
+       143 |    142
+(1 row)
+
+DROP TABLE mcv_lists_multi;
 -- Permission tests. Users should not be able to see specific data values in
 -- the extended statistics, if they lack permission to see those values in
 -- the underlying table.
diff --git a/src/test/regress/sql/stats_ext.sql 
b/src/test/regress/sql/stats_ext.sql
index 040ee97a1e..ca3d20ee42 100644
--- a/src/test/regress/sql/stats_ext.sql
+++ b/src/test/regress/sql/stats_ext.sql
@@ -488,6 +488,42 @@ SELECT * FROM check_estimated_rows('SELECT * FROM 
mcv_lists_bool WHERE NOT a AND
 
 SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_bool WHERE NOT a 
AND b AND NOT c');
 
+-- check the ability to use multiple MCV lists
+CREATE TABLE mcv_lists_multi (
+       a INTEGER,
+       b INTEGER,
+       c INTEGER,
+       d INTEGER
+);
+
+-- 
+INSERT INTO mcv_lists_multi (a, b, c, d)
+    SELECT
+         mod(i,5),
+         mod(i,5),
+         mod(i,7),
+         mod(i,7)
+    FROM generate_series(1,5000) s(i);
+
+ANALYZE mcv_lists_multi;
+
+-- estimates without any mcv statistics
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 
AND b = 0');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE c = 0 
AND d = 0');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 
AND b = 0 AND c = 0 AND d = 0');
+
+-- create separate MCV statistics
+CREATE STATISTICS mcv_lists_multi_1 (mcv) ON a, b FROM mcv_lists_multi;
+CREATE STATISTICS mcv_lists_multi_2 (mcv) ON c, d FROM mcv_lists_multi;
+
+ANALYZE mcv_lists_multi;
+
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 
AND b = 0');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE c = 0 
AND d = 0');
+SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists_multi WHERE a = 0 
AND b = 0 AND c = 0 AND d = 0');
+
+DROP TABLE mcv_lists_multi;
+
 -- Permission tests. Users should not be able to see specific data values in
 -- the extended statistics, if they lack permission to see those values in
 -- the underlying table.
-- 
2.21.0

>From 9ad9402c292c97634f9cb1962bef8aa74c0f19f4 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <[email protected]>
Date: Tue, 12 Nov 2019 18:01:07 +0100
Subject: [PATCH 02/10] Apply all available functional dependencies

When considering functional dependencies during selectivity estimation,
it's not necessary to bother with selection the best extended statistic.
We can simply consider all applicable functional dependencies at once.

Author: Tomas Vondra
Discussion: https://postgr.es/m/20191028152048.jc6pqv5hb7j77ocp@development
---
 src/backend/statistics/dependencies.c   | 102 +++++++++++++++---------
 src/test/regress/expected/stats_ext.out |  58 ++++++++++++++
 src/test/regress/sql/stats_ext.sql      |  36 +++++++++
 3 files changed, 157 insertions(+), 39 deletions(-)

diff --git a/src/backend/statistics/dependencies.c 
b/src/backend/statistics/dependencies.c
index 9493b2d539..a6ca11c675 100644
--- a/src/backend/statistics/dependencies.c
+++ b/src/backend/statistics/dependencies.c
@@ -77,8 +77,8 @@ static bool dependency_implies_attribute(MVDependency 
*dependency,
                                                                                
 AttrNumber attnum);
 static bool dependency_is_compatible_clause(Node *clause, Index relid,
                                                                                
        AttrNumber *attnum);
-static MVDependency *find_strongest_dependency(StatisticExtInfo *stats,
-                                                                               
           MVDependencies *dependencies,
+static MVDependency *find_strongest_dependency(MVDependencies **dependencies,
+                                                                               
           int ndependencies,
                                                                                
           Bitmapset *attnums);
 
 static void
@@ -862,10 +862,10 @@ dependency_is_compatible_clause(Node *clause, Index 
relid, AttrNumber *attnum)
  * (see the comment in dependencies_clauselist_selectivity).
  */
 static MVDependency *
-find_strongest_dependency(StatisticExtInfo *stats, MVDependencies 
*dependencies,
+find_strongest_dependency(MVDependencies **dependencies, int ndependencies,
                                                  Bitmapset *attnums)
 {
-       int                     i;
+       int                     i, j;
        MVDependency *strongest = NULL;
 
        /* number of attnums in clauses */
@@ -876,36 +876,39 @@ find_strongest_dependency(StatisticExtInfo *stats, 
MVDependencies *dependencies,
         * fully-matched dependencies. We do the cheap checks first, before
         * matching it against the attnums.
         */
-       for (i = 0; i < dependencies->ndeps; i++)
+       for (i = 0; i < ndependencies; i++)
        {
-               MVDependency *dependency = dependencies->deps[i];
-
-               /*
-                * Skip dependencies referencing more attributes than available
-                * clauses, as those can't be fully matched.
-                */
-               if (dependency->nattributes > nattnums)
-                       continue;
-
-               if (strongest)
+               for (j = 0; j < dependencies[i]->ndeps; j++)
                {
-                       /* skip dependencies on fewer attributes than the 
strongest. */
-                       if (dependency->nattributes < strongest->nattributes)
-                               continue;
+                       MVDependency *dependency = dependencies[i]->deps[j];
 
-                       /* also skip weaker dependencies when attribute count 
matches */
-                       if (strongest->nattributes == dependency->nattributes &&
-                               strongest->degree > dependency->degree)
+                       /*
+                        * Skip dependencies referencing more attributes than 
available
+                        * clauses, as those can't be fully matched.
+                        */
+                       if (dependency->nattributes > nattnums)
                                continue;
-               }
 
-               /*
-                * this dependency is stronger, but we must still check that 
it's
-                * fully matched to these attnums. We perform this check last 
as it's
-                * slightly more expensive than the previous checks.
-                */
-               if (dependency_is_fully_matched(dependency, attnums))
-                       strongest = dependency; /* save new best match */
+                       if (strongest)
+                       {
+                               /* skip dependencies on fewer attributes than 
the strongest. */
+                               if (dependency->nattributes < 
strongest->nattributes)
+                                       continue;
+
+                               /* also skip weaker dependencies when attribute 
count matches */
+                               if (strongest->nattributes == 
dependency->nattributes &&
+                                       strongest->degree > dependency->degree)
+                                       continue;
+                       }
+
+                       /*
+                        * this dependency is stronger, but we must still check 
that it's
+                        * fully matched to these attnums. We perform this 
check last as it's
+                        * slightly more expensive than the previous checks.
+                        */
+                       if (dependency_is_fully_matched(dependency, attnums))
+                               strongest = dependency; /* save new best match 
*/
+               }
        }
 
        return strongest;
@@ -949,10 +952,10 @@ dependencies_clauselist_selectivity(PlannerInfo *root,
        Selectivity s1 = 1.0;
        ListCell   *l;
        Bitmapset  *clauses_attnums = NULL;
-       StatisticExtInfo *stat;
-       MVDependencies *dependencies;
        AttrNumber *list_attnums;
        int                     listidx;
+       MVDependencies    **dependencies = NULL;
+       int                                     ndependencies = 0;
 
        /* check if there's any stats that might be useful for us. */
        if (!has_stats_of_kind(rel->statlist, STATS_EXT_DEPENDENCIES))
@@ -1001,20 +1004,41 @@ dependencies_clauselist_selectivity(PlannerInfo *root,
                return 1.0;
        }
 
-       /* find the best suited statistics object for these attnums */
-       stat = choose_best_statistics(rel->statlist, clauses_attnums,
-                                                                 
STATS_EXT_DEPENDENCIES);
+       /*
+        * Load all functional dependencies matching at least two parameters.
+        */
+       ndependencies = 0;
+       dependencies = (MVDependencies **) palloc(sizeof(MVDependencies *) *
+                                                                               
          list_length(rel->statlist));
+
+       foreach(l,rel->statlist)
+       {
+               StatisticExtInfo   *stat = (StatisticExtInfo *) lfirst(l);
+               Bitmapset                  *matched;
+               int                                     num_matched;
+
+               /* skip statistics that are not of the correct type */
+               if (stat->kind != STATS_EXT_DEPENDENCIES)
+                       continue;
+
+               matched = bms_intersect(clauses_attnums, stat->keys);
+               num_matched = bms_num_members(matched);
+               bms_free(matched);
+
+               if (num_matched < 2)
+                       continue;
+
+               dependencies[ndependencies++]
+                       = statext_dependencies_load(stat->statOid);
+       }
 
        /* if no matching stats could be found then we've nothing to do */
-       if (!stat)
+       if (!ndependencies)
        {
                pfree(list_attnums);
                return 1.0;
        }
 
-       /* load the dependency items stored in the statistics object */
-       dependencies = statext_dependencies_load(stat->statOid);
-
        /*
         * Apply the dependencies recursively, starting with the 
widest/strongest
         * ones, and proceeding to the smaller/weaker ones. At the end of each
@@ -1027,7 +1051,7 @@ dependencies_clauselist_selectivity(PlannerInfo *root,
                MVDependency *dependency;
 
                /* the widest/strongest dependency, fully matched by clauses */
-               dependency = find_strongest_dependency(stat, dependencies,
+               dependency = find_strongest_dependency(dependencies, 
ndependencies,
                                                                                
           clauses_attnums);
 
                /* if no suitable dependency was found, we're done */
diff --git a/src/test/regress/expected/stats_ext.out 
b/src/test/regress/expected/stats_ext.out
index c5ae282122..a35d8f2c2e 100644
--- a/src/test/regress/expected/stats_ext.out
+++ b/src/test/regress/expected/stats_ext.out
@@ -444,6 +444,64 @@ SELECT * FROM check_estimated_rows('SELECT * FROM 
functional_dependencies WHERE
         50 |     50
 (1 row)
 
+-- check the ability to use multiple functional dependencies
+CREATE TABLE functional_dependencies_multi (
+       a INTEGER,
+       b INTEGER,
+       c INTEGER,
+       d INTEGER
+);
+-- 
+INSERT INTO functional_dependencies_multi (a, b, c, d)
+    SELECT
+         mod(i,7),
+         mod(i,7),
+         mod(i,11),
+         mod(i,11)
+    FROM generate_series(1,5000) s(i);
+ANALYZE functional_dependencies_multi;
+-- estimates without any functional dependencies
+SELECT * FROM check_estimated_rows('SELECT * FROM 
functional_dependencies_multi WHERE a = 0 AND b = 0');
+ estimated | actual 
+-----------+--------
+       102 |    714
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM 
functional_dependencies_multi WHERE c = 0 AND d = 0');
+ estimated | actual 
+-----------+--------
+        41 |    454
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM 
functional_dependencies_multi WHERE a = 0 AND b = 0 AND c = 0 AND d = 0');
+ estimated | actual 
+-----------+--------
+         1 |     64
+(1 row)
+
+-- create separate functional dependencies
+CREATE STATISTICS functional_dependencies_multi_1 (dependencies) ON a, b FROM 
functional_dependencies_multi;
+CREATE STATISTICS functional_dependencies_multi_2 (dependencies) ON c, d FROM 
functional_dependencies_multi;
+ANALYZE functional_dependencies_multi;
+SELECT * FROM check_estimated_rows('SELECT * FROM 
functional_dependencies_multi WHERE a = 0 AND b = 0');
+ estimated | actual 
+-----------+--------
+       714 |    714
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM 
functional_dependencies_multi WHERE c = 0 AND d = 0');
+ estimated | actual 
+-----------+--------
+       454 |    454
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM 
functional_dependencies_multi WHERE a = 0 AND b = 0 AND c = 0 AND d = 0');
+ estimated | actual 
+-----------+--------
+        65 |     64
+(1 row)
+
+DROP TABLE functional_dependencies_multi;
 -- MCV lists
 CREATE TABLE mcv_lists (
     filler1 TEXT,
diff --git a/src/test/regress/sql/stats_ext.sql 
b/src/test/regress/sql/stats_ext.sql
index ca3d20ee42..69c3f726e6 100644
--- a/src/test/regress/sql/stats_ext.sql
+++ b/src/test/regress/sql/stats_ext.sql
@@ -288,6 +288,42 @@ ANALYZE functional_dependencies;
 
 SELECT * FROM check_estimated_rows('SELECT * FROM functional_dependencies 
WHERE a = 1 AND b = ''1'' AND c = 1');
 
+-- check the ability to use multiple functional dependencies
+CREATE TABLE functional_dependencies_multi (
+       a INTEGER,
+       b INTEGER,
+       c INTEGER,
+       d INTEGER
+);
+
+-- 
+INSERT INTO functional_dependencies_multi (a, b, c, d)
+    SELECT
+         mod(i,7),
+         mod(i,7),
+         mod(i,11),
+         mod(i,11)
+    FROM generate_series(1,5000) s(i);
+
+ANALYZE functional_dependencies_multi;
+
+-- estimates without any functional dependencies
+SELECT * FROM check_estimated_rows('SELECT * FROM 
functional_dependencies_multi WHERE a = 0 AND b = 0');
+SELECT * FROM check_estimated_rows('SELECT * FROM 
functional_dependencies_multi WHERE c = 0 AND d = 0');
+SELECT * FROM check_estimated_rows('SELECT * FROM 
functional_dependencies_multi WHERE a = 0 AND b = 0 AND c = 0 AND d = 0');
+
+-- create separate functional dependencies
+CREATE STATISTICS functional_dependencies_multi_1 (dependencies) ON a, b FROM 
functional_dependencies_multi;
+CREATE STATISTICS functional_dependencies_multi_2 (dependencies) ON c, d FROM 
functional_dependencies_multi;
+
+ANALYZE functional_dependencies_multi;
+
+SELECT * FROM check_estimated_rows('SELECT * FROM 
functional_dependencies_multi WHERE a = 0 AND b = 0');
+SELECT * FROM check_estimated_rows('SELECT * FROM 
functional_dependencies_multi WHERE c = 0 AND d = 0');
+SELECT * FROM check_estimated_rows('SELECT * FROM 
functional_dependencies_multi WHERE a = 0 AND b = 0 AND c = 0 AND d = 0');
+
+DROP TABLE functional_dependencies_multi;
+
 -- MCV lists
 CREATE TABLE mcv_lists (
     filler1 TEXT,
-- 
2.21.0

Re: Using multiple extended statistics for estimates

Reply via email to