Postgres picks suboptimal index after building of an extended statistics

Andrey V. Lepikhov Wed, 23 Jun 2021 07:19:43 -0700

Hi,

Ivan Frolkov reported a problem with choosing a non-optimal index during a query optimization. This problem appeared after building of an extended statistics.


I prepared the test case (see t.sql in attachment).

For reproduction of this case we need to have a composite primary key index and one another index. Before creation of extended statistics, SELECT from the table choose PK index and returns only one row. But after, this SELECT picks alternative index, fetches and filters many tuples.


The problem is related to a corner case in btree cost estimation procedure:
if postgres detects unique one-row index scan, it sets
numIndexTuples to 1.0.

But the selectivity is calculated as usual, by the clauselist_selectivity() routine and can have a value, much more than corresponding to single tuple. This selectivity value is used later in the code to calculate a number of fetched tuples and can lead to choosing of an suboptimal index.


The attached patch is my suggestion to fix this problem.

--
regards,
Andrey Lepikhov
Postgres Professional

t.sql
Description: application/sql

From 810eb4691acec26a07d8fa67bedb7a7381c31824 Mon Sep 17 00:00:00 2001
From: Andrey Lepikhov <a.lepik...@postgrespro.ru>
Date: Wed, 23 Jun 2021 12:05:24 +0500
Subject: [PATCH] In the case of an unique one row btree index scan only one
 row can be returned. In the genericcostestimate() routine we must arrange the
 index selectivity value in accordance with this knowledge.

---
 src/backend/utils/adt/selfuncs.c | 106 ++++++++++++++++++-------------
 1 file changed, 62 insertions(+), 44 deletions(-)

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 0c8c05f6c2..91160960aa 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -6369,65 +6369,82 @@ genericcostestimate(PlannerInfo *root,
        double          num_scans;
        double          qual_op_cost;
        double          qual_arg_cost;
-       List       *selectivityQuals;
-       ListCell   *l;
-
-       /*
-        * If the index is partial, AND the index predicate with the explicitly
-        * given indexquals to produce a more accurate idea of the index
-        * selectivity.
-        */
-       selectivityQuals = add_predicate_to_index_quals(index, indexQuals);
 
-       /*
-        * Check for ScalarArrayOpExpr index quals, and estimate the number of
-        * index scans that will be performed.
-        */
+       numIndexTuples = costs->numIndexTuples;
        num_sa_scans = 1;
-       foreach(l, indexQuals)
+
+       if (numIndexTuples >= 0.0)
        {
-               RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
+               List            *selectivityQuals;
+               ListCell        *l;
 
-               if (IsA(rinfo->clause, ScalarArrayOpExpr))
+               /*
+                * If the index is partial, AND the index predicate with the 
explicitly
+                * given indexquals to produce a more accurate idea of the index
+                * selectivity.
+                */
+               selectivityQuals = add_predicate_to_index_quals(index, 
indexQuals);
+
+               /*
+                * Check for ScalarArrayOpExpr index quals, and estimate the 
number of
+                * index scans that will be performed.
+                */
+               foreach(l, indexQuals)
                {
-                       ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) 
rinfo->clause;
-                       int                     alength = 
estimate_array_length(lsecond(saop->args));
+                       RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
 
-                       if (alength > 1)
-                               num_sa_scans *= alength;
+                       if (IsA(rinfo->clause, ScalarArrayOpExpr))
+                       {
+                               ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) 
rinfo->clause;
+                               int                     alength = 
estimate_array_length(lsecond(saop->args));
+
+                               if (alength > 1)
+                                       num_sa_scans *= alength;
+                       }
                }
-       }
 
-       /* Estimate the fraction of main-table tuples that will be visited */
-       indexSelectivity = clauselist_selectivity(root, selectivityQuals,
-                                                                               
          index->rel->relid,
-                                                                               
          JOIN_INNER,
-                                                                               
          NULL);
+               /* Estimate the fraction of main-table tuples that will be 
visited */
+               indexSelectivity = clauselist_selectivity(root, 
selectivityQuals,
+                                                                               
                  index->rel->relid,
+                                                                               
                  JOIN_INNER,
+                                                                               
                  NULL);
 
-       /*
-        * If caller didn't give us an estimate, estimate the number of index
-        * tuples that will be visited.  We do it in this rather 
peculiar-looking
-        * way in order to get the right answer for partial indexes.
-        */
-       numIndexTuples = costs->numIndexTuples;
-       if (numIndexTuples <= 0.0)
+               /*
+                * If caller didn't give us an estimate, estimate the number of 
index
+                * tuples that will be visited.  We do it in this rather 
peculiar-looking
+                * way in order to get the right answer for partial indexes.
+                */
+               if (numIndexTuples == 0.0)
+               {
+                       numIndexTuples = indexSelectivity * index->rel->tuples;
+
+                       /*
+                        * The above calculation counts all the tuples visited 
across all
+                        * scans induced by ScalarArrayOpExpr nodes.  We want 
to consider the
+                        * average per-indexscan number, so adjust.  This is a 
handy place to
+                        * round to integer, too.  (If caller supplied tuple 
estimate, it's
+                        * responsible for handling these considerations.)
+                        */
+                       numIndexTuples = rint(numIndexTuples / num_sa_scans);
+               }
+       }
+       else
        {
-               numIndexTuples = indexSelectivity * index->rel->tuples;
+               Assert(numIndexTuples == -1.0);
 
                /*
-                * The above calculation counts all the tuples visited across 
all
-                * scans induced by ScalarArrayOpExpr nodes.  We want to 
consider the
-                * average per-indexscan number, so adjust.  This is a handy 
place to
-                * round to integer, too.  (If caller supplied tuple estimate, 
it's
-                * responsible for handling these considerations.)
+                * Unique one row scan can select no more than one row. It 
needs to
+                * manually set the selectivity of the index. The value of 
numIndexTuples
+                * will be corrected later.
                 */
-               numIndexTuples = rint(numIndexTuples / num_sa_scans);
+               indexSelectivity = 1.0 / index->rel->tuples;
        }
 
        /*
         * We can bound the number of tuples by the index size in any case. 
Also,
         * always estimate at least one tuple is touched, even when
         * indexSelectivity estimate is tiny.
+        * Also, one row single scan case need to fix the numIndexTuples value 
here.
         */
        if (numIndexTuples > index->tuples)
                numIndexTuples = index->tuples;
@@ -6710,16 +6727,17 @@ btcostestimate(PlannerInfo *root, IndexPath *path, 
double loop_count,
 
        /*
         * If index is unique and we found an '=' clause for each column, we can
-        * just assume numIndexTuples = 1 and skip the expensive
-        * clauselist_selectivity calculations.  However, a ScalarArrayOp or
-        * NullTest invalidates that theory, even though it sets eqQualHere.
+        * skip the expensive clauselist_selectivity calculations and assume
+        * numIndexTuples = -1.0 in order to tell the generic cost estimator to 
skip
+        * such expensive calculations too. However, a ScalarArrayOp or NullTest
+        * invalidates that theory, even though it sets eqQualHere.
         */
        if (index->unique &&
                indexcol == index->nkeycolumns - 1 &&
                eqQualHere &&
                !found_saop &&
                !found_is_null_op)
-               numIndexTuples = 1.0;
+               numIndexTuples = -1.0;
        else
        {
                List       *selectivityQuals;
-- 
2.31.1

Postgres picks suboptimal index after building of an extended statistics

Reply via email to