Re: missing indexes in indexlist with partitioned tables

David Rowley Thu, 15 Sep 2022 21:09:07 -0700

On Wed, 3 Aug 2022 at 11:07, Arne Roland <a.rol...@index.de> wrote:
> Attached a rebased version of the patch.

Firstly, I agree that we should fix the issue of join removals not
working with partitioned tables.

I had a quick look over this and the first thing that I thought was
the same as what Amit mentioned in:

On Tue, 25 Jan 2022 at 21:04, Amit Langote <amitlangot...@gmail.com> wrote:
> Finally, I don't understand why we need a separate field to store
> indexes found in partitioned base relations.  AFAICS, nothing but the
> sites you are interested in (relation_has_unique_index_for() and
> rel_supports_distinctness()) would ever bother to look at a
> partitioned base relation's indexlist.  Do you think putting them into
> in indexlist might break something?

I kinda disagree with Alvaro's fix in 05fb5d661. I think indexlist is
the place to store these details.  That commit added the following
comment:

/*
* Ignore partitioned indexes, since they are not usable for
* queries.
*/

But neither are hypothetical indexes either, yet they're added to
RelOptInfo.indexlist.

I think the patch should be changed so that the existing list is used
and we find another fix for the problems Alvaro fixed in 05fb5d661.
Unfortunately, there was no discussion marked on that commit message,
so it's not quite clear what the problem was. I'm unsure if there was
anything other than CLUSTER that was broken.  I see that cfdd03f45
added CLUSTER for partitioned tables in v15. I think the patch would
need to go over the usages of RelOptInfo.indexlist to make sure that
we don't need to add any further conditions to skip their usage for
partitioned tables.

I wrote the attached patch as I wanted to see what would break if we
did this. The only problem I got from running make check was in
get_actual_variable_range(), so I just changed that so it returns
false when the given rel is a partitioned table. I only quickly did
the changes to get_relation_info() and didn't give much thought to
what the can* bool flags should be set to.  I just mostly skipped all
that code because it was crashing on
relation->rd_tableam->scan_bitmap_next_block due to the rd_tableam
being NULL.

Also, just a friendly tip, Arne; I saw you named your patch 0006 for
version 6.  You'll see many 000n patches around the list, but those
are generally done with git format-patch. That number normally means
the patch in the patch series, not the version of the patch. I'm not
sure if it'll help any, but my workflow for writing new patches
against master tends to be:

git checkout master
git checkout -b some_feature_branch
# write some code
git commit -a
# maybe more code
git commit -a
git format-patch -v1 master

That'll create v1-0001 and v1-0002 patches. When I'm onto v2, I just
change the version number from -v1 to -v2.

David

diff --git a/src/backend/optimizer/util/plancat.c 
b/src/backend/optimizer/util/plancat.c
index 5012bfe142..a2b7667fc7 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -156,10 +156,10 @@ get_relation_info(PlannerInfo *root, Oid 
relationObjectId, bool inhparent,
 
        /*
         * Make list of indexes.  Ignore indexes on system catalogs if told to.
-        * Don't bother with indexes for an inheritance parent, either.
+        * Don't bother with indexes from traditional inheritance parents, 
either.
         */
-       if (inhparent ||
-               (IgnoreSystemIndexes && IsSystemRelation(relation)))
+       if ((inhparent && relation->rd_rel->relkind != 
RELKIND_PARTITIONED_TABLE)
+               || (IgnoreSystemIndexes && IsSystemRelation(relation)))
                hasindex = false;
        else
                hasindex = relation->rd_rel->relhasindex;
@@ -212,16 +212,6 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, 
bool inhparent,
                                continue;
                        }
 
-                       /*
-                        * Ignore partitioned indexes, since they are not 
usable for
-                        * queries.
-                        */
-                       if (indexRelation->rd_rel->relkind == 
RELKIND_PARTITIONED_INDEX)
-                       {
-                               index_close(indexRelation, NoLock);
-                               continue;
-                       }
-
                        /*
                         * If the index is valid, but cannot yet be used, 
ignore it; but
                         * mark the plan we are generating as transient. See
@@ -266,108 +256,127 @@ get_relation_info(PlannerInfo *root, Oid 
relationObjectId, bool inhparent,
 
                        info->relam = indexRelation->rd_rel->relam;
 
-                       /* We copy just the fields we need, not all of rd_indam 
*/
-                       amroutine = indexRelation->rd_indam;
-                       info->amcanorderbyop = amroutine->amcanorderbyop;
-                       info->amoptionalkey = amroutine->amoptionalkey;
-                       info->amsearcharray = amroutine->amsearcharray;
-                       info->amsearchnulls = amroutine->amsearchnulls;
-                       info->amcanparallel = amroutine->amcanparallel;
-                       info->amhasgettuple = (amroutine->amgettuple != NULL);
-                       info->amhasgetbitmap = amroutine->amgetbitmap != NULL &&
-                               relation->rd_tableam->scan_bitmap_next_block != 
NULL;
-                       info->amcanmarkpos = (amroutine->ammarkpos != NULL &&
-                                                                 
amroutine->amrestrpos != NULL);
-                       info->amcostestimate = amroutine->amcostestimate;
-                       Assert(info->amcostestimate != NULL);
-
-                       /* Fetch index opclass options */
-                       info->opclassoptions = 
RelationGetIndexAttOptions(indexRelation, true);
-
-                       /*
-                        * Fetch the ordering information for the index, if any.
-                        */
-                       if (info->relam == BTREE_AM_OID)
+                       if (indexRelation->rd_rel->relkind != 
RELKIND_PARTITIONED_INDEX)
                        {
+                               /* We copy just the fields we need, not all of 
rd_indam */
+                               amroutine = indexRelation->rd_indam;
+                               info->amcanorderbyop = 
amroutine->amcanorderbyop;
+                               info->amoptionalkey = amroutine->amoptionalkey;
+                               info->amsearcharray = amroutine->amsearcharray;
+                               info->amsearchnulls = amroutine->amsearchnulls;
+                               info->amcanparallel = amroutine->amcanparallel;
+                               info->amhasgettuple = (amroutine->amgettuple != 
NULL);
+                               info->amhasgetbitmap = amroutine->amgetbitmap 
!= NULL &&
+                                       
relation->rd_tableam->scan_bitmap_next_block != NULL;
+                               info->amcanmarkpos = (amroutine->ammarkpos != 
NULL &&
+                                       amroutine->amrestrpos != NULL);
+                               info->amcostestimate = 
amroutine->amcostestimate;
+                               Assert(info->amcostestimate != NULL);
+
+                               /* Fetch index opclass options */
+                               info->opclassoptions = 
RelationGetIndexAttOptions(indexRelation, true);
+
                                /*
-                                * If it's a btree index, we can use its 
opfamily OIDs
-                                * directly as the sort ordering opfamily OIDs.
+                                * Fetch the ordering information for the 
index, if any.
                                 */
-                               Assert(amroutine->amcanorder);
-
-                               info->sortopfamily = info->opfamily;
-                               info->reverse_sort = (bool *) 
palloc(sizeof(bool) * nkeycolumns);
-                               info->nulls_first = (bool *) 
palloc(sizeof(bool) * nkeycolumns);
-
-                               for (i = 0; i < nkeycolumns; i++)
+                               if (info->relam == BTREE_AM_OID)
                                {
-                                       int16           opt = 
indexRelation->rd_indoption[i];
+                                       /*
+                                        * If it's a btree index, we can use 
its opfamily OIDs
+                                        * directly as the sort ordering 
opfamily OIDs.
+                                        */
+                                       Assert(amroutine->amcanorder);
 
-                                       info->reverse_sort[i] = (opt & 
INDOPTION_DESC) != 0;
-                                       info->nulls_first[i] = (opt & 
INDOPTION_NULLS_FIRST) != 0;
-                               }
-                       }
-                       else if (amroutine->amcanorder)
-                       {
-                               /*
-                                * Otherwise, identify the corresponding btree 
opfamilies by
-                                * trying to map this index's "<" operators 
into btree.  Since
-                                * "<" uniquely defines the behavior of a sort 
order, this is
-                                * a sufficient test.
-                                *
-                                * XXX This method is rather slow and also 
requires the
-                                * undesirable assumption that the other index 
AM numbers its
-                                * strategies the same as btree.  It'd be 
better to have a way
-                                * to explicitly declare the corresponding 
btree opfamily for
-                                * each opfamily of the other index type.  But 
given the lack
-                                * of current or foreseeable amcanorder index 
types, it's not
-                                * worth expending more effort on now.
-                                */
-                               info->sortopfamily = (Oid *) palloc(sizeof(Oid) 
* nkeycolumns);
-                               info->reverse_sort = (bool *) 
palloc(sizeof(bool) * nkeycolumns);
-                               info->nulls_first = (bool *) 
palloc(sizeof(bool) * nkeycolumns);
+                                       info->sortopfamily = info->opfamily;
+                                       info->reverse_sort = (bool 
*)palloc(sizeof(bool) * nkeycolumns);
+                                       info->nulls_first = (bool 
*)palloc(sizeof(bool) * nkeycolumns);
 
-                               for (i = 0; i < nkeycolumns; i++)
-                               {
-                                       int16           opt = 
indexRelation->rd_indoption[i];
-                                       Oid                     ltopr;
-                                       Oid                     btopfamily;
-                                       Oid                     btopcintype;
-                                       int16           btstrategy;
-
-                                       info->reverse_sort[i] = (opt & 
INDOPTION_DESC) != 0;
-                                       info->nulls_first[i] = (opt & 
INDOPTION_NULLS_FIRST) != 0;
-
-                                       ltopr = 
get_opfamily_member(info->opfamily[i],
-                                                                               
                info->opcintype[i],
-                                                                               
                info->opcintype[i],
-                                                                               
                BTLessStrategyNumber);
-                                       if (OidIsValid(ltopr) &&
-                                               
get_ordering_op_properties(ltopr,
-                                                                               
                   &btopfamily,
-                                                                               
                   &btopcintype,
-                                                                               
                   &btstrategy) &&
-                                               btopcintype == 
info->opcintype[i] &&
-                                               btstrategy == 
BTLessStrategyNumber)
+                                       for (i = 0; i < nkeycolumns; i++)
                                        {
-                                               /* Successful mapping */
-                                               info->sortopfamily[i] = 
btopfamily;
+                                               int16           opt = 
indexRelation->rd_indoption[i];
+
+                                               info->reverse_sort[i] = (opt & 
INDOPTION_DESC) != 0;
+                                               info->nulls_first[i] = (opt & 
INDOPTION_NULLS_FIRST) != 0;
                                        }
-                                       else
+                               }
+                               else if (amroutine->amcanorder)
+                               {
+                                       /*
+                                        * Otherwise, identify the 
corresponding btree opfamilies by
+                                        * trying to map this index's "<" 
operators into btree.  Since
+                                        * "<" uniquely defines the behavior of 
a sort order, this is
+                                        * a sufficient test.
+                                        *
+                                        * XXX This method is rather slow and 
also requires the
+                                        * undesirable assumption that the 
other index AM numbers its
+                                        * strategies the same as btree.  It'd 
be better to have a way
+                                        * to explicitly declare the 
corresponding btree opfamily for
+                                        * each opfamily of the other index 
type.  But given the lack
+                                        * of current or foreseeable amcanorder 
index types, it's not
+                                        * worth expending more effort on now.
+                                        */
+                                       info->sortopfamily = (Oid 
*)palloc(sizeof(Oid) * nkeycolumns);
+                                       info->reverse_sort = (bool 
*)palloc(sizeof(bool) * nkeycolumns);
+                                       info->nulls_first = (bool 
*)palloc(sizeof(bool) * nkeycolumns);
+
+                                       for (i = 0; i < nkeycolumns; i++)
                                        {
-                                               /* Fail ... quietly treat index 
as unordered */
-                                               info->sortopfamily = NULL;
-                                               info->reverse_sort = NULL;
-                                               info->nulls_first = NULL;
-                                               break;
+                                               int16           opt = 
indexRelation->rd_indoption[i];
+                                               Oid                     ltopr;
+                                               Oid                     
btopfamily;
+                                               Oid                     
btopcintype;
+                                               int16           btstrategy;
+
+                                               info->reverse_sort[i] = (opt & 
INDOPTION_DESC) != 0;
+                                               info->nulls_first[i] = (opt & 
INDOPTION_NULLS_FIRST) != 0;
+
+                                               ltopr = 
get_opfamily_member(info->opfamily[i],
+                                                       info->opcintype[i],
+                                                       info->opcintype[i],
+                                                       BTLessStrategyNumber);
+                                               if (OidIsValid(ltopr) &&
+                                                       
get_ordering_op_properties(ltopr,
+                                                               &btopfamily,
+                                                               &btopcintype,
+                                                               &btstrategy) &&
+                                                       btopcintype == 
info->opcintype[i] &&
+                                                       btstrategy == 
BTLessStrategyNumber)
+                                               {
+                                                       /* Successful mapping */
+                                                       info->sortopfamily[i] = 
btopfamily;
+                                               }
+                                               else
+                                               {
+                                                       /* Fail ... quietly 
treat index as unordered */
+                                                       info->sortopfamily = 
NULL;
+                                                       info->reverse_sort = 
NULL;
+                                                       info->nulls_first = 
NULL;
+                                                       break;
+                                               }
                                        }
                                }
+                               else
+                               {
+                                       info->sortopfamily = NULL;
+                                       info->reverse_sort = NULL;
+                                       info->nulls_first = NULL;
+                               }
                        }
                        else
                        {
                                info->sortopfamily = NULL;
                                info->reverse_sort = NULL;
                                info->nulls_first = NULL;
+
+                               info->amcanorderbyop = false;
+                               info->amoptionalkey = false;
+                               info->amsearcharray = false;
+                               info->amsearchnulls = false;
+                               info->amcanparallel = false;
+                               info->amhasgettuple = false;
+                               info->amhasgetbitmap = false;
+                               info->amcanmarkpos = false;
+                               info->amcostestimate = NULL;
                        }
 
                        /*
@@ -414,7 +423,8 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, 
bool inhparent,
                                        info->tuples = rel->tuples;
                        }
 
-                       if (info->relam == BTREE_AM_OID)
+                       if (info->relam == BTREE_AM_OID &&
+                               indexRelation->rd_rel->relkind != 
RELKIND_PARTITIONED_INDEX)
                        {
                                /* For btrees, get tree height while we have 
the index open */
                                info->tree_height = 
_bt_getrootheight(indexRelation);
diff --git a/src/backend/storage/buffer/bufmgr.c 
b/src/backend/storage/buffer/bufmgr.c
index e898ffad7b..100b76c0da 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -2971,9 +2971,11 @@ RelationGetNumberOfBlocksInFork(Relation relation, 
ForkNumber forkNum)
                return smgrnblocks(RelationGetSmgr(relation), forkNum);
        }
        else
-               Assert(false);
+       {
+               Assert(relation->rd_rel->relkind == RELKIND_PARTITIONED_INDEX);
+       }
 
-       return 0;                                       /* keep compiler quiet 
*/
+       return 0;
 }
 
 /*
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index c746759eef..7d721755a2 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -5995,6 +5995,10 @@ get_actual_variable_range(PlannerInfo *root, 
VariableStatData *vardata,
        rte = root->simple_rte_array[rel->relid];
        Assert(rte->rtekind == RTE_RELATION);
 
+       /* ignore partitioned tables.  Any indexes here are not real indexes */
+       if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+               return false;
+
        /* Search through the indexes to see if any match our problem */
        foreach(lc, rel->indexlist)
        {
diff --git a/src/test/regress/expected/partition_join.out 
b/src/test/regress/expected/partition_join.out
index 03926a8413..c854228482 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -4852,46 +4852,46 @@ SELECT t1.*, t2.* FROM alpha t1 INNER JOIN beta t2 ON 
(t1.a = t2.a AND t1.b = t2
 (8 rows)
 
 -- partitionwise join with fractional paths
-CREATE TABLE fract_t (id BIGINT, PRIMARY KEY (id)) PARTITION BY RANGE (id);
+CREATE TABLE fract_t (id BIGINT, c INT, PRIMARY KEY (id)) PARTITION BY RANGE 
(id);
 CREATE TABLE fract_t0 PARTITION OF fract_t FOR VALUES FROM ('0') TO ('1000');
 CREATE TABLE fract_t1 PARTITION OF fract_t FOR VALUES FROM ('1000') TO 
('2000');
 -- insert data
-INSERT INTO fract_t (id) (SELECT generate_series(0, 1999));
+INSERT INTO fract_t (id,c) SELECT i,i FROM generate_series(0, 1999) i;
 ANALYZE fract_t;
--- verify plan; nested index only scans
+-- verify plan; nested index scans
 SET max_parallel_workers_per_gather = 0;
 SET enable_partitionwise_join = on;
 EXPLAIN (COSTS OFF)
 SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id) ORDER BY id ASC LIMIT 
10;
-                              QUERY PLAN                               
------------------------------------------------------------------------
+                            QUERY PLAN                            
+------------------------------------------------------------------
  Limit
    ->  Merge Append
          Sort Key: x.id
          ->  Merge Left Join
                Merge Cond: (x_1.id = y_1.id)
-               ->  Index Only Scan using fract_t0_pkey on fract_t0 x_1
-               ->  Index Only Scan using fract_t0_pkey on fract_t0 y_1
+               ->  Index Scan using fract_t0_pkey on fract_t0 x_1
+               ->  Index Scan using fract_t0_pkey on fract_t0 y_1
          ->  Merge Left Join
                Merge Cond: (x_2.id = y_2.id)
-               ->  Index Only Scan using fract_t1_pkey on fract_t1 x_2
-               ->  Index Only Scan using fract_t1_pkey on fract_t1 y_2
+               ->  Index Scan using fract_t1_pkey on fract_t1 x_2
+               ->  Index Scan using fract_t1_pkey on fract_t1 y_2
 (11 rows)
 
 EXPLAIN (COSTS OFF)
 SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id) ORDER BY id DESC LIMIT 
10;
-                                   QUERY PLAN                                  
 
---------------------------------------------------------------------------------
+                                QUERY PLAN                                 
+---------------------------------------------------------------------------
  Limit
    ->  Merge Append
          Sort Key: x.id DESC
          ->  Nested Loop Left Join
-               ->  Index Only Scan Backward using fract_t0_pkey on fract_t0 x_1
-               ->  Index Only Scan using fract_t0_pkey on fract_t0 y_1
+               ->  Index Scan Backward using fract_t0_pkey on fract_t0 x_1
+               ->  Index Scan using fract_t0_pkey on fract_t0 y_1
                      Index Cond: (id = x_1.id)
          ->  Nested Loop Left Join
-               ->  Index Only Scan Backward using fract_t1_pkey on fract_t1 x_2
-               ->  Index Only Scan using fract_t1_pkey on fract_t1 y_2
+               ->  Index Scan Backward using fract_t1_pkey on fract_t1 x_2
+               ->  Index Scan using fract_t1_pkey on fract_t1 y_2
                      Index Cond: (id = x_2.id)
 (11 rows)
 
diff --git a/src/test/regress/sql/partition_join.sql 
b/src/test/regress/sql/partition_join.sql
index 67f506361f..009184d348 100644
--- a/src/test/regress/sql/partition_join.sql
+++ b/src/test/regress/sql/partition_join.sql
@@ -1144,15 +1144,15 @@ SELECT t1.*, t2.* FROM alpha t1 INNER JOIN beta t2 ON 
(t1.a = t2.a AND t1.b = t2
 SELECT t1.*, t2.* FROM alpha t1 INNER JOIN beta t2 ON (t1.a = t2.a AND t1.b = 
t2.b AND t1.c = t2.c) WHERE ((t1.b >= 100 AND t1.b < 110) OR (t1.b >= 200 AND 
t1.b < 210)) AND ((t2.b >= 100 AND t2.b < 110) OR (t2.b >= 200 AND t2.b < 210)) 
AND t1.c IN ('0004', '0009') ORDER BY t1.a, t1.b;
 
 -- partitionwise join with fractional paths
-CREATE TABLE fract_t (id BIGINT, PRIMARY KEY (id)) PARTITION BY RANGE (id);
+CREATE TABLE fract_t (id BIGINT, c INT, PRIMARY KEY (id)) PARTITION BY RANGE 
(id);
 CREATE TABLE fract_t0 PARTITION OF fract_t FOR VALUES FROM ('0') TO ('1000');
 CREATE TABLE fract_t1 PARTITION OF fract_t FOR VALUES FROM ('1000') TO 
('2000');
 
 -- insert data
-INSERT INTO fract_t (id) (SELECT generate_series(0, 1999));
+INSERT INTO fract_t (id,c) SELECT i,i FROM generate_series(0, 1999) i;
 ANALYZE fract_t;
 
--- verify plan; nested index only scans
+-- verify plan; nested index scans
 SET max_parallel_workers_per_gather = 0;
 SET enable_partitionwise_join = on;

Re: missing indexes in indexlist with partitioned tables

Reply via email to