On Wed, 3 Aug 2022 at 11:07, Arne Roland <a.rol...@index.de> wrote: > Attached a rebased version of the patch.
Firstly, I agree that we should fix the issue of join removals not working with partitioned tables. I had a quick look over this and the first thing that I thought was the same as what Amit mentioned in: On Tue, 25 Jan 2022 at 21:04, Amit Langote <amitlangot...@gmail.com> wrote: > Finally, I don't understand why we need a separate field to store > indexes found in partitioned base relations. AFAICS, nothing but the > sites you are interested in (relation_has_unique_index_for() and > rel_supports_distinctness()) would ever bother to look at a > partitioned base relation's indexlist. Do you think putting them into > in indexlist might break something? I kinda disagree with Alvaro's fix in 05fb5d661. I think indexlist is the place to store these details. That commit added the following comment: /* * Ignore partitioned indexes, since they are not usable for * queries. */ But neither are hypothetical indexes either, yet they're added to RelOptInfo.indexlist. I think the patch should be changed so that the existing list is used and we find another fix for the problems Alvaro fixed in 05fb5d661. Unfortunately, there was no discussion marked on that commit message, so it's not quite clear what the problem was. I'm unsure if there was anything other than CLUSTER that was broken. I see that cfdd03f45 added CLUSTER for partitioned tables in v15. I think the patch would need to go over the usages of RelOptInfo.indexlist to make sure that we don't need to add any further conditions to skip their usage for partitioned tables. I wrote the attached patch as I wanted to see what would break if we did this. The only problem I got from running make check was in get_actual_variable_range(), so I just changed that so it returns false when the given rel is a partitioned table. I only quickly did the changes to get_relation_info() and didn't give much thought to what the can* bool flags should be set to. I just mostly skipped all that code because it was crashing on relation->rd_tableam->scan_bitmap_next_block due to the rd_tableam being NULL. Also, just a friendly tip, Arne; I saw you named your patch 0006 for version 6. You'll see many 000n patches around the list, but those are generally done with git format-patch. That number normally means the patch in the patch series, not the version of the patch. I'm not sure if it'll help any, but my workflow for writing new patches against master tends to be: git checkout master git checkout -b some_feature_branch # write some code git commit -a # maybe more code git commit -a git format-patch -v1 master That'll create v1-0001 and v1-0002 patches. When I'm onto v2, I just change the version number from -v1 to -v2. David
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c index 5012bfe142..a2b7667fc7 100644 --- a/src/backend/optimizer/util/plancat.c +++ b/src/backend/optimizer/util/plancat.c @@ -156,10 +156,10 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent, /* * Make list of indexes. Ignore indexes on system catalogs if told to. - * Don't bother with indexes for an inheritance parent, either. + * Don't bother with indexes from traditional inheritance parents, either. */ - if (inhparent || - (IgnoreSystemIndexes && IsSystemRelation(relation))) + if ((inhparent && relation->rd_rel->relkind != RELKIND_PARTITIONED_TABLE) + || (IgnoreSystemIndexes && IsSystemRelation(relation))) hasindex = false; else hasindex = relation->rd_rel->relhasindex; @@ -212,16 +212,6 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent, continue; } - /* - * Ignore partitioned indexes, since they are not usable for - * queries. - */ - if (indexRelation->rd_rel->relkind == RELKIND_PARTITIONED_INDEX) - { - index_close(indexRelation, NoLock); - continue; - } - /* * If the index is valid, but cannot yet be used, ignore it; but * mark the plan we are generating as transient. See @@ -266,108 +256,127 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent, info->relam = indexRelation->rd_rel->relam; - /* We copy just the fields we need, not all of rd_indam */ - amroutine = indexRelation->rd_indam; - info->amcanorderbyop = amroutine->amcanorderbyop; - info->amoptionalkey = amroutine->amoptionalkey; - info->amsearcharray = amroutine->amsearcharray; - info->amsearchnulls = amroutine->amsearchnulls; - info->amcanparallel = amroutine->amcanparallel; - info->amhasgettuple = (amroutine->amgettuple != NULL); - info->amhasgetbitmap = amroutine->amgetbitmap != NULL && - relation->rd_tableam->scan_bitmap_next_block != NULL; - info->amcanmarkpos = (amroutine->ammarkpos != NULL && - amroutine->amrestrpos != NULL); - info->amcostestimate = amroutine->amcostestimate; - Assert(info->amcostestimate != NULL); - - /* Fetch index opclass options */ - info->opclassoptions = RelationGetIndexAttOptions(indexRelation, true); - - /* - * Fetch the ordering information for the index, if any. - */ - if (info->relam == BTREE_AM_OID) + if (indexRelation->rd_rel->relkind != RELKIND_PARTITIONED_INDEX) { + /* We copy just the fields we need, not all of rd_indam */ + amroutine = indexRelation->rd_indam; + info->amcanorderbyop = amroutine->amcanorderbyop; + info->amoptionalkey = amroutine->amoptionalkey; + info->amsearcharray = amroutine->amsearcharray; + info->amsearchnulls = amroutine->amsearchnulls; + info->amcanparallel = amroutine->amcanparallel; + info->amhasgettuple = (amroutine->amgettuple != NULL); + info->amhasgetbitmap = amroutine->amgetbitmap != NULL && + relation->rd_tableam->scan_bitmap_next_block != NULL; + info->amcanmarkpos = (amroutine->ammarkpos != NULL && + amroutine->amrestrpos != NULL); + info->amcostestimate = amroutine->amcostestimate; + Assert(info->amcostestimate != NULL); + + /* Fetch index opclass options */ + info->opclassoptions = RelationGetIndexAttOptions(indexRelation, true); + /* - * If it's a btree index, we can use its opfamily OIDs - * directly as the sort ordering opfamily OIDs. + * Fetch the ordering information for the index, if any. */ - Assert(amroutine->amcanorder); - - info->sortopfamily = info->opfamily; - info->reverse_sort = (bool *) palloc(sizeof(bool) * nkeycolumns); - info->nulls_first = (bool *) palloc(sizeof(bool) * nkeycolumns); - - for (i = 0; i < nkeycolumns; i++) + if (info->relam == BTREE_AM_OID) { - int16 opt = indexRelation->rd_indoption[i]; + /* + * If it's a btree index, we can use its opfamily OIDs + * directly as the sort ordering opfamily OIDs. + */ + Assert(amroutine->amcanorder); - info->reverse_sort[i] = (opt & INDOPTION_DESC) != 0; - info->nulls_first[i] = (opt & INDOPTION_NULLS_FIRST) != 0; - } - } - else if (amroutine->amcanorder) - { - /* - * Otherwise, identify the corresponding btree opfamilies by - * trying to map this index's "<" operators into btree. Since - * "<" uniquely defines the behavior of a sort order, this is - * a sufficient test. - * - * XXX This method is rather slow and also requires the - * undesirable assumption that the other index AM numbers its - * strategies the same as btree. It'd be better to have a way - * to explicitly declare the corresponding btree opfamily for - * each opfamily of the other index type. But given the lack - * of current or foreseeable amcanorder index types, it's not - * worth expending more effort on now. - */ - info->sortopfamily = (Oid *) palloc(sizeof(Oid) * nkeycolumns); - info->reverse_sort = (bool *) palloc(sizeof(bool) * nkeycolumns); - info->nulls_first = (bool *) palloc(sizeof(bool) * nkeycolumns); + info->sortopfamily = info->opfamily; + info->reverse_sort = (bool *)palloc(sizeof(bool) * nkeycolumns); + info->nulls_first = (bool *)palloc(sizeof(bool) * nkeycolumns); - for (i = 0; i < nkeycolumns; i++) - { - int16 opt = indexRelation->rd_indoption[i]; - Oid ltopr; - Oid btopfamily; - Oid btopcintype; - int16 btstrategy; - - info->reverse_sort[i] = (opt & INDOPTION_DESC) != 0; - info->nulls_first[i] = (opt & INDOPTION_NULLS_FIRST) != 0; - - ltopr = get_opfamily_member(info->opfamily[i], - info->opcintype[i], - info->opcintype[i], - BTLessStrategyNumber); - if (OidIsValid(ltopr) && - get_ordering_op_properties(ltopr, - &btopfamily, - &btopcintype, - &btstrategy) && - btopcintype == info->opcintype[i] && - btstrategy == BTLessStrategyNumber) + for (i = 0; i < nkeycolumns; i++) { - /* Successful mapping */ - info->sortopfamily[i] = btopfamily; + int16 opt = indexRelation->rd_indoption[i]; + + info->reverse_sort[i] = (opt & INDOPTION_DESC) != 0; + info->nulls_first[i] = (opt & INDOPTION_NULLS_FIRST) != 0; } - else + } + else if (amroutine->amcanorder) + { + /* + * Otherwise, identify the corresponding btree opfamilies by + * trying to map this index's "<" operators into btree. Since + * "<" uniquely defines the behavior of a sort order, this is + * a sufficient test. + * + * XXX This method is rather slow and also requires the + * undesirable assumption that the other index AM numbers its + * strategies the same as btree. It'd be better to have a way + * to explicitly declare the corresponding btree opfamily for + * each opfamily of the other index type. But given the lack + * of current or foreseeable amcanorder index types, it's not + * worth expending more effort on now. + */ + info->sortopfamily = (Oid *)palloc(sizeof(Oid) * nkeycolumns); + info->reverse_sort = (bool *)palloc(sizeof(bool) * nkeycolumns); + info->nulls_first = (bool *)palloc(sizeof(bool) * nkeycolumns); + + for (i = 0; i < nkeycolumns; i++) { - /* Fail ... quietly treat index as unordered */ - info->sortopfamily = NULL; - info->reverse_sort = NULL; - info->nulls_first = NULL; - break; + int16 opt = indexRelation->rd_indoption[i]; + Oid ltopr; + Oid btopfamily; + Oid btopcintype; + int16 btstrategy; + + info->reverse_sort[i] = (opt & INDOPTION_DESC) != 0; + info->nulls_first[i] = (opt & INDOPTION_NULLS_FIRST) != 0; + + ltopr = get_opfamily_member(info->opfamily[i], + info->opcintype[i], + info->opcintype[i], + BTLessStrategyNumber); + if (OidIsValid(ltopr) && + get_ordering_op_properties(ltopr, + &btopfamily, + &btopcintype, + &btstrategy) && + btopcintype == info->opcintype[i] && + btstrategy == BTLessStrategyNumber) + { + /* Successful mapping */ + info->sortopfamily[i] = btopfamily; + } + else + { + /* Fail ... quietly treat index as unordered */ + info->sortopfamily = NULL; + info->reverse_sort = NULL; + info->nulls_first = NULL; + break; + } } } + else + { + info->sortopfamily = NULL; + info->reverse_sort = NULL; + info->nulls_first = NULL; + } } else { info->sortopfamily = NULL; info->reverse_sort = NULL; info->nulls_first = NULL; + + info->amcanorderbyop = false; + info->amoptionalkey = false; + info->amsearcharray = false; + info->amsearchnulls = false; + info->amcanparallel = false; + info->amhasgettuple = false; + info->amhasgetbitmap = false; + info->amcanmarkpos = false; + info->amcostestimate = NULL; } /* @@ -414,7 +423,8 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent, info->tuples = rel->tuples; } - if (info->relam == BTREE_AM_OID) + if (info->relam == BTREE_AM_OID && + indexRelation->rd_rel->relkind != RELKIND_PARTITIONED_INDEX) { /* For btrees, get tree height while we have the index open */ info->tree_height = _bt_getrootheight(indexRelation); diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c index e898ffad7b..100b76c0da 100644 --- a/src/backend/storage/buffer/bufmgr.c +++ b/src/backend/storage/buffer/bufmgr.c @@ -2971,9 +2971,11 @@ RelationGetNumberOfBlocksInFork(Relation relation, ForkNumber forkNum) return smgrnblocks(RelationGetSmgr(relation), forkNum); } else - Assert(false); + { + Assert(relation->rd_rel->relkind == RELKIND_PARTITIONED_INDEX); + } - return 0; /* keep compiler quiet */ + return 0; } /* diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c index c746759eef..7d721755a2 100644 --- a/src/backend/utils/adt/selfuncs.c +++ b/src/backend/utils/adt/selfuncs.c @@ -5995,6 +5995,10 @@ get_actual_variable_range(PlannerInfo *root, VariableStatData *vardata, rte = root->simple_rte_array[rel->relid]; Assert(rte->rtekind == RTE_RELATION); + /* ignore partitioned tables. Any indexes here are not real indexes */ + if (rte->relkind == RELKIND_PARTITIONED_TABLE) + return false; + /* Search through the indexes to see if any match our problem */ foreach(lc, rel->indexlist) { diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out index 03926a8413..c854228482 100644 --- a/src/test/regress/expected/partition_join.out +++ b/src/test/regress/expected/partition_join.out @@ -4852,46 +4852,46 @@ SELECT t1.*, t2.* FROM alpha t1 INNER JOIN beta t2 ON (t1.a = t2.a AND t1.b = t2 (8 rows) -- partitionwise join with fractional paths -CREATE TABLE fract_t (id BIGINT, PRIMARY KEY (id)) PARTITION BY RANGE (id); +CREATE TABLE fract_t (id BIGINT, c INT, PRIMARY KEY (id)) PARTITION BY RANGE (id); CREATE TABLE fract_t0 PARTITION OF fract_t FOR VALUES FROM ('0') TO ('1000'); CREATE TABLE fract_t1 PARTITION OF fract_t FOR VALUES FROM ('1000') TO ('2000'); -- insert data -INSERT INTO fract_t (id) (SELECT generate_series(0, 1999)); +INSERT INTO fract_t (id,c) SELECT i,i FROM generate_series(0, 1999) i; ANALYZE fract_t; --- verify plan; nested index only scans +-- verify plan; nested index scans SET max_parallel_workers_per_gather = 0; SET enable_partitionwise_join = on; EXPLAIN (COSTS OFF) SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id) ORDER BY id ASC LIMIT 10; - QUERY PLAN ------------------------------------------------------------------------ + QUERY PLAN +------------------------------------------------------------------ Limit -> Merge Append Sort Key: x.id -> Merge Left Join Merge Cond: (x_1.id = y_1.id) - -> Index Only Scan using fract_t0_pkey on fract_t0 x_1 - -> Index Only Scan using fract_t0_pkey on fract_t0 y_1 + -> Index Scan using fract_t0_pkey on fract_t0 x_1 + -> Index Scan using fract_t0_pkey on fract_t0 y_1 -> Merge Left Join Merge Cond: (x_2.id = y_2.id) - -> Index Only Scan using fract_t1_pkey on fract_t1 x_2 - -> Index Only Scan using fract_t1_pkey on fract_t1 y_2 + -> Index Scan using fract_t1_pkey on fract_t1 x_2 + -> Index Scan using fract_t1_pkey on fract_t1 y_2 (11 rows) EXPLAIN (COSTS OFF) SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id) ORDER BY id DESC LIMIT 10; - QUERY PLAN --------------------------------------------------------------------------------- + QUERY PLAN +--------------------------------------------------------------------------- Limit -> Merge Append Sort Key: x.id DESC -> Nested Loop Left Join - -> Index Only Scan Backward using fract_t0_pkey on fract_t0 x_1 - -> Index Only Scan using fract_t0_pkey on fract_t0 y_1 + -> Index Scan Backward using fract_t0_pkey on fract_t0 x_1 + -> Index Scan using fract_t0_pkey on fract_t0 y_1 Index Cond: (id = x_1.id) -> Nested Loop Left Join - -> Index Only Scan Backward using fract_t1_pkey on fract_t1 x_2 - -> Index Only Scan using fract_t1_pkey on fract_t1 y_2 + -> Index Scan Backward using fract_t1_pkey on fract_t1 x_2 + -> Index Scan using fract_t1_pkey on fract_t1 y_2 Index Cond: (id = x_2.id) (11 rows) diff --git a/src/test/regress/sql/partition_join.sql b/src/test/regress/sql/partition_join.sql index 67f506361f..009184d348 100644 --- a/src/test/regress/sql/partition_join.sql +++ b/src/test/regress/sql/partition_join.sql @@ -1144,15 +1144,15 @@ SELECT t1.*, t2.* FROM alpha t1 INNER JOIN beta t2 ON (t1.a = t2.a AND t1.b = t2 SELECT t1.*, t2.* FROM alpha t1 INNER JOIN beta t2 ON (t1.a = t2.a AND t1.b = t2.b AND t1.c = t2.c) WHERE ((t1.b >= 100 AND t1.b < 110) OR (t1.b >= 200 AND t1.b < 210)) AND ((t2.b >= 100 AND t2.b < 110) OR (t2.b >= 200 AND t2.b < 210)) AND t1.c IN ('0004', '0009') ORDER BY t1.a, t1.b; -- partitionwise join with fractional paths -CREATE TABLE fract_t (id BIGINT, PRIMARY KEY (id)) PARTITION BY RANGE (id); +CREATE TABLE fract_t (id BIGINT, c INT, PRIMARY KEY (id)) PARTITION BY RANGE (id); CREATE TABLE fract_t0 PARTITION OF fract_t FOR VALUES FROM ('0') TO ('1000'); CREATE TABLE fract_t1 PARTITION OF fract_t FOR VALUES FROM ('1000') TO ('2000'); -- insert data -INSERT INTO fract_t (id) (SELECT generate_series(0, 1999)); +INSERT INTO fract_t (id,c) SELECT i,i FROM generate_series(0, 1999) i; ANALYZE fract_t; --- verify plan; nested index only scans +-- verify plan; nested index scans SET max_parallel_workers_per_gather = 0; SET enable_partitionwise_join = on;