Hi!

From: Amit Langote <amitlangot...@gmail.com>
Sent: Tuesday, January 25, 2022 09:04
Subject: Re: missing indexes in indexlist with partitioned tables
> [...]
> "partindexlist" really made me think about a list of "partial indexes"
> for some reason. I think maybe "partedindexlist" is what you are
> looking for; "parted" is commonly used as short for "partitioned" when
> naming variables.
>
> The comment only mentions "further pruning" as to what partitioned
> indexes are to be remembered in RelOptInfo, but it's not clear what
> that means. It may help to be more specific.

Thanks for the feedback! I've changed that. The current version is attached.

> Finally, I don't understand why we need a separate field to store
> indexes found in partitioned base relations. AFAICS, nothing but the
> sites you are interested in (relation_has_unique_index_for() and
> rel_supports_distinctness()) would ever bother to look at a
> partitioned base relation's indexlist. Do you think putting them into
> in indexlist might break something?

I have thought about that before. AFAICT there is nothing in core that breaks. However, I am not sure I want to mix those two kinds of index nodes. First of all, the structure is different: partedIndexes don't have physical attributes, after all. This is a technical implementation detail relating to the current promise that entries of the indexlist are indexes we can use. And by use, I mean use for statistics or by the executor.

I'm more concerned that mixing them would make future changes to the order and optimization of processing harder. The order in which we do things in the planner is a bit messy, and I wouldn't mind seeing details about that change. Looking at the current wacky order in the optimizer, I'm not convinced that nothing will want to have a look at the indexlist before partitioned tables are unpacked.

Since it would be easy to introduce this new variable later, I wouldn't mind adding them to the indexlist directly for now. But changing the underlying promise of what it contains seems noteworthy and more intrusive to me.

> > Side note: I personally think the name inhparent is mildly confusing, since
> > it's not really about inheritance. I don't have a significantly better idea
> > though.
>
> Partitioned tables are "inheritance parent", so share the same code as
> what traditional inheritance parents have always used for planning.

I recall manual partitioning via inheritance; that was cumbersome. That minor historical detail was not what I was referring to, though.
There are a lot of other cases that cause us to set inhparent. IIRC we use this flag in some DDL commands, which have nothing to do with inheritance. It essentially is used as a variant to skip the indexlist creation. If such hacks weren't there, we could simply check for the relkind and indisunique.

Regards
Arne
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c index 0ef70ad7f1..5225076df3 100644 --- a/src/backend/optimizer/path/indxpath.c +++ b/src/backend/optimizer/path/indxpath.c @@ -3504,7 +3504,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel, Assert(list_length(exprlist) == list_length(oprlist)); /* Short-circuit if no indexes... */ - if (rel->indexlist == NIL) + if (rel->indexlist == NIL && rel->partedIndexlist == NIL) return false; /* @@ -3549,7 +3549,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel, return false; /* Examine each index of the relation ... */ - foreach(ic, rel->indexlist) + foreach(ic, list_concat(rel->indexlist, rel->partedIndexlist)) { IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic); int c; diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c index 337f470d58..16ce443ec9 100644 --- a/src/backend/optimizer/plan/analyzejoins.c +++ b/src/backend/optimizer/plan/analyzejoins.c @@ -23,6 +23,7 @@ #include "postgres.h" #include "nodes/nodeFuncs.h" +#include "nodes/nodes.h" #include "optimizer/clauses.h" #include "optimizer/joininfo.h" #include "optimizer/optimizer.h" @@ -598,7 +599,7 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel) */ ListCell *lc; - foreach(lc, rel->indexlist) + foreach(lc, list_concat(rel->indexlist, rel->partedIndexlist)) { IndexOptInfo *ind = (IndexOptInfo *) lfirst(lc); diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c index a5002ad895..2451b2ae79 100644 --- a/src/backend/optimizer/util/plancat.c +++ b/src/backend/optimizer/util/plancat.c @@ -116,8 +116,8 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent, { Index varno = rel->relid; Relation relation; - bool hasindex; List *indexinfos = NIL; + List *partedIndexinfos = NIL; /* * We need not lock the relation since it was already locked, either by @@ -154,17 +154,8 @@ 
get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent, /* Retrieve the parallel_workers reloption, or -1 if not set. */ rel->rel_parallel_workers = RelationGetParallelWorkers(relation, -1); - /* - * Make list of indexes. Ignore indexes on system catalogs if told to. - * Don't bother with indexes for an inheritance parent, either. - */ - if (inhparent || - (IgnoreSystemIndexes && IsSystemRelation(relation))) - hasindex = false; - else - hasindex = relation->rd_rel->relhasindex; - - if (hasindex) + /* Make list of indexes. Ignore indexes on system catalogs if told to. */ + if (!(IgnoreSystemIndexes && IsSystemRelation(relation)) && relation->rd_rel->relhasindex) { List *indexoidlist; LOCKMODE lmode; @@ -213,10 +204,13 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent, } /* - * Ignore partitioned indexes, since they are not usable for - * queries. + * Don't add partitioned indexes to the indexlist, since they are + * not usable by the executor. If they are unique add them to the + * partedIndexlist instead, to use for further pruning. That is + * relevant for the join pruning, if the outer relation is partitioned. + * If they aren't that either, simply skip them. */ - if (indexRelation->rd_rel->relkind == RELKIND_PARTITIONED_INDEX) + if (inhparent && (!index->indisunique || indexRelation->rd_rel->relkind != RELKIND_PARTITIONED_INDEX)) { index_close(indexRelation, NoLock); continue; @@ -264,7 +258,40 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent, info->indexcollations[i] = indexRelation->rd_indcollation[i]; } - info->relam = indexRelation->rd_rel->relam; + /* + * Fetch the index expressions and predicate, if any. We must + * modify the copies we obtain from the relcache to have the + * correct varno for the parent relation, so that they match up + * correctly against qual clauses. 
+ */ + info->indexprs = RelationGetIndexExpressions(indexRelation); + info->indpred = RelationGetIndexPredicate(indexRelation); + if (info->indexprs && varno != 1) + ChangeVarNodes((Node *) info->indexprs, 1, varno, 0); + if (info->indpred && varno != 1) + ChangeVarNodes((Node *) info->indpred, 1, varno, 0); + + info->unique = index->indisunique; + info->immediate = index->indimmediate; + + /* + * Don't add partitioned indexes to the indexlist, add them to the + * partedIndexlist instead, since they are not usable by the + * executor. + */ + if (indexRelation->rd_rel->relkind == RELKIND_PARTITIONED_INDEX) + { + index_close(indexRelation, NoLock); + partedIndexinfos = lappend(partedIndexinfos, info); + continue; + } + + info->hypothetical = false; + info->indrestrictinfo = NIL; /* set later, in indxpath.c */ + info->predOK = false; /* set later, in indxpath.c */ + + /* Build targetlist using the completed indexprs data */ + info->indextlist = build_index_tlist(root, info, relation); /* We copy just the fields we need, not all of rd_indam */ amroutine = indexRelation->rd_indam; @@ -284,6 +311,8 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent, /* Fetch index opclass options */ info->opclassoptions = RelationGetIndexAttOptions(indexRelation, true); + info->relam = indexRelation->rd_rel->relam; + /* * Fetch the ordering information for the index, if any. */ @@ -370,28 +399,6 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent, info->nulls_first = NULL; } - /* - * Fetch the index expressions and predicate, if any. We must - * modify the copies we obtain from the relcache to have the - * correct varno for the parent relation, so that they match up - * correctly against qual clauses. 
- */ - info->indexprs = RelationGetIndexExpressions(indexRelation); - info->indpred = RelationGetIndexPredicate(indexRelation); - if (info->indexprs && varno != 1) - ChangeVarNodes((Node *) info->indexprs, 1, varno, 0); - if (info->indpred && varno != 1) - ChangeVarNodes((Node *) info->indpred, 1, varno, 0); - - /* Build targetlist using the completed indexprs data */ - info->indextlist = build_index_tlist(root, info, relation); - - info->indrestrictinfo = NIL; /* set later, in indxpath.c */ - info->predOK = false; /* set later, in indxpath.c */ - info->unique = index->indisunique; - info->immediate = index->indimmediate; - info->hypothetical = false; - /* * Estimate the index size. If it's not a partial index, we lock * the number-of-tuples estimate to equal the parent table; if it @@ -441,6 +448,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent, } rel->indexlist = indexinfos; + rel->partedIndexlist = partedIndexinfos; rel->statlist = get_relation_statistics(rel, relation); diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h index 1f3845b3fe..8a46e0ca70 100644 --- a/src/include/nodes/pathnodes.h +++ b/src/include/nodes/pathnodes.h @@ -334,11 +334,11 @@ struct PlannerInfo MemoryContext planner_cxt; /* context holding PlannerInfo */ - Cardinality total_table_pages; /* # of pages in all non-dummy tables of + Cardinality total_table_pages; /* # of pages in all non-dummy tables of * query */ - Selectivity tuple_fraction; /* tuple_fraction passed to query_planner */ - Cardinality limit_tuples; /* limit_tuples passed to query_planner */ + Selectivity tuple_fraction; /* tuple_fraction passed to query_planner */ + Cardinality limit_tuples; /* limit_tuples passed to query_planner */ Index qual_security_level; /* minimum security_level for quals */ /* Note: qual_security_level is zero if there are no securityQuals */ @@ -681,7 +681,7 @@ typedef struct RelOptInfo Relids relids; /* set of base relids (rangetable indexes) */ 
/* size estimates generated by planner */ - Cardinality rows; /* estimated number of result tuples */ + Cardinality rows; /* estimated number of result tuples */ /* per-relation planner control flags */ bool consider_startup; /* keep cheap-startup-cost paths? */ @@ -716,9 +716,10 @@ typedef struct RelOptInfo List *lateral_vars; /* LATERAL Vars and PHVs referenced by rel */ Relids lateral_referencers; /* rels that reference me laterally */ List *indexlist; /* list of IndexOptInfo */ + List *partedIndexlist; /* list of IndexOptInfo */ List *statlist; /* list of StatisticExtInfo */ BlockNumber pages; /* size estimates derived from pg_class */ - Cardinality tuples; + Cardinality tuples; double allvisfrac; Bitmapset *eclass_indexes; /* Indexes in PlannerInfo's eq_classes list of * ECs that mention this rel */ @@ -841,7 +842,7 @@ struct IndexOptInfo /* index-size statistics (from pg_class and elsewhere) */ BlockNumber pages; /* number of disk pages in index */ - Cardinality tuples; /* number of index tuples in index */ + Cardinality tuples; /* number of index tuples in index */ int tree_height; /* index tree height, or -1 if unknown */ /* index descriptor information */ @@ -1140,7 +1141,7 @@ typedef struct ParamPathInfo NodeTag type; Relids ppi_req_outer; /* rels supplying parameters used by path */ - Cardinality ppi_rows; /* estimated number of result tuples */ + Cardinality ppi_rows; /* estimated number of result tuples */ List *ppi_clauses; /* join clauses available from outer rels */ } ParamPathInfo; @@ -1190,7 +1191,7 @@ typedef struct Path int parallel_workers; /* desired # of workers; 0 = not parallel */ /* estimated size/costs for path (see costsize.c for more info) */ - Cardinality rows; /* estimated number of result tuples */ + Cardinality rows; /* estimated number of result tuples */ Cost startup_cost; /* cost expended before fetching any tuples */ Cost total_cost; /* total cost (assuming all tuples fetched) */ @@ -1453,7 +1454,7 @@ typedef struct AppendPath 
List *subpaths; /* list of component Paths */ /* Index of first partial path in subpaths; list_length(subpaths) if none */ int first_partial_path; - Cardinality limit_tuples; /* hard limit on output tuples, or -1 */ + Cardinality limit_tuples; /* hard limit on output tuples, or -1 */ } AppendPath; #define IS_DUMMY_APPEND(p) \ @@ -1475,7 +1476,7 @@ typedef struct MergeAppendPath { Path path; List *subpaths; /* list of component Paths */ - Cardinality limit_tuples; /* hard limit on output tuples, or -1 */ + Cardinality limit_tuples; /* hard limit on output tuples, or -1 */ } MergeAppendPath; /* @@ -1773,7 +1774,7 @@ typedef struct AggPath Path *subpath; /* path representing input source */ AggStrategy aggstrategy; /* basic strategy, see nodes.h */ AggSplit aggsplit; /* agg-splitting mode, see nodes.h */ - Cardinality numGroups; /* estimated number of groups in input */ + Cardinality numGroups; /* estimated number of groups in input */ uint64 transitionSpace; /* for pass-by-ref transition data */ List *groupClause; /* a list of SortGroupClause's */ List *qual; /* quals (HAVING quals), if any */ @@ -1787,7 +1788,7 @@ typedef struct GroupingSetData { NodeTag type; List *set; /* grouping set as list of sortgrouprefs */ - Cardinality numGroups; /* est. number of result groups */ + Cardinality numGroups; /* est. number of result groups */ } GroupingSetData; typedef struct RollupData @@ -1796,7 +1797,7 @@ typedef struct RollupData List *groupClause; /* applicable subset of parse->groupClause */ List *gsets; /* lists of integer indexes into groupClause */ List *gsets_data; /* list of GroupingSetData */ - Cardinality numGroups; /* est. number of result groups */ + Cardinality numGroups; /* est. 
number of result groups */ bool hashable; /* can be hashed */ bool is_hashed; /* to be implemented as a hashagg */ } RollupData; @@ -1847,7 +1848,7 @@ typedef struct SetOpPath List *distinctList; /* SortGroupClauses identifying target cols */ AttrNumber flagColIdx; /* where is the flag column, if any */ int firstFlag; /* flag value for first input relation */ - Cardinality numGroups; /* estimated number of groups in input */ + Cardinality numGroups; /* estimated number of groups in input */ } SetOpPath; /* @@ -1860,7 +1861,7 @@ typedef struct RecursiveUnionPath Path *rightpath; List *distinctList; /* SortGroupClauses identifying target cols */ int wtParam; /* ID of Param representing work table */ - Cardinality numGroups; /* estimated number of groups in input */ + Cardinality numGroups; /* estimated number of groups in input */ } RecursiveUnionPath; /* @@ -2616,7 +2617,7 @@ typedef struct typedef struct { bool limit_needed; - Cardinality limit_tuples; + Cardinality limit_tuples; int64 count_est; int64 offset_est; } FinalPathExtraData; @@ -2647,15 +2648,15 @@ typedef struct JoinCostWorkspace Cost inner_rescan_run_cost; /* private for cost_mergejoin code */ - Cardinality outer_rows; - Cardinality inner_rows; - Cardinality outer_skip_rows; - Cardinality inner_skip_rows; + Cardinality outer_rows; + Cardinality inner_rows; + Cardinality outer_skip_rows; + Cardinality inner_skip_rows; /* private for cost_hashjoin code */ int numbuckets; int numbatches; - Cardinality inner_rows_total; + Cardinality inner_rows_total; } JoinCostWorkspace; /* diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out index bb5b7c47a4..562bad6544 100644 --- a/src/test/regress/expected/partition_join.out +++ b/src/test/regress/expected/partition_join.out @@ -4866,14 +4866,42 @@ SELECT t1.*, t2.* FROM alpha t1 INNER JOIN beta t2 ON (t1.a = t2.a AND t1.b = t2 CREATE TABLE fract_t (id BIGINT, PRIMARY KEY (id)) PARTITION BY RANGE (id); CREATE TABLE 
fract_t0 PARTITION OF fract_t FOR VALUES FROM ('0') TO ('1000'); CREATE TABLE fract_t1 PARTITION OF fract_t FOR VALUES FROM ('1000') TO ('2000'); +CREATE TABLE fract_x (id BIGINT, PRIMARY KEY (id)) PARTITION BY RANGE (id); +CREATE TABLE fract_x0 PARTITION OF fract_x FOR VALUES FROM ('0') TO ('1000'); +CREATE TABLE fract_x1 PARTITION OF fract_x FOR VALUES FROM ('1000') TO ('2000'); -- insert data INSERT INTO fract_t (id) (SELECT generate_series(0, 1999)); ANALYZE fract_t; --- verify plan; nested index only scans +INSERT INTO fract_x (id) (SELECT generate_series(0, 1999)); +ANALYZE fract_x; +SET max_parallel_workers_per_gather = 0; +SET enable_partitionwise_join = on; +-- verify partition pruning SET max_parallel_workers_per_gather = 0; SET enable_partitionwise_join = on; EXPLAIN (COSTS OFF) -SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id) ORDER BY id ASC LIMIT 10; +SELECT x.id FROM fract_x x LEFT JOIN fract_t y USING (id) ORDER BY id ASC LIMIT 10; + QUERY PLAN +----------------------------------------------------------------- + Limit + -> Append + -> Index Only Scan using fract_x0_pkey on fract_x0 x_1 + -> Index Only Scan using fract_x1_pkey on fract_x1 x_2 +(4 rows) + +EXPLAIN (COSTS OFF) +SELECT x.id FROM fract_x x LEFT JOIN fract_t y USING (id) ORDER BY id ASC LIMIT 10; + QUERY PLAN +----------------------------------------------------------------- + Limit + -> Append + -> Index Only Scan using fract_x0_pkey on fract_x0 x_1 + -> Index Only Scan using fract_x1_pkey on fract_x1 x_2 +(4 rows) + +-- verify plan; nested index only scans +EXPLAIN (COSTS OFF) +SELECT x.id, y.id FROM fract_x x LEFT JOIN fract_t y USING (id) ORDER BY x.id ASC LIMIT 10; QUERY PLAN ----------------------------------------------------------------------- Limit @@ -4881,32 +4909,33 @@ SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id) ORDER BY id ASC LIMIT 10; Sort Key: x.id -> Merge Left Join Merge Cond: (x_1.id = y_1.id) - -> Index Only Scan using fract_t0_pkey on fract_t0 x_1 + 
-> Index Only Scan using fract_x0_pkey on fract_x0 x_1 -> Index Only Scan using fract_t0_pkey on fract_t0 y_1 -> Merge Left Join Merge Cond: (x_2.id = y_2.id) - -> Index Only Scan using fract_t1_pkey on fract_t1 x_2 + -> Index Only Scan using fract_x1_pkey on fract_x1 x_2 -> Index Only Scan using fract_t1_pkey on fract_t1 y_2 (11 rows) EXPLAIN (COSTS OFF) -SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id) ORDER BY id DESC LIMIT 10; +SELECT x.id, y.id FROM fract_x x LEFT JOIN fract_t y USING (id) ORDER BY x.id DESC LIMIT 10; QUERY PLAN -------------------------------------------------------------------------------- Limit -> Merge Append Sort Key: x.id DESC -> Nested Loop Left Join - -> Index Only Scan Backward using fract_t0_pkey on fract_t0 x_1 + -> Index Only Scan Backward using fract_x0_pkey on fract_x0 x_1 -> Index Only Scan using fract_t0_pkey on fract_t0 y_1 Index Cond: (id = x_1.id) -> Nested Loop Left Join - -> Index Only Scan Backward using fract_t1_pkey on fract_t1 x_2 + -> Index Only Scan Backward using fract_x1_pkey on fract_x1 x_2 -> Index Only Scan using fract_t1_pkey on fract_t1 y_2 Index Cond: (id = x_2.id) (11 rows) -- cleanup DROP TABLE fract_t; +DROP TABLE fract_x; RESET max_parallel_workers_per_gather; RESET enable_partitionwise_join; diff --git a/src/test/regress/sql/partition_join.sql b/src/test/regress/sql/partition_join.sql index 67f506361f..39e23911d0 100644 --- a/src/test/regress/sql/partition_join.sql +++ b/src/test/regress/sql/partition_join.sql @@ -1148,22 +1148,38 @@ CREATE TABLE fract_t (id BIGINT, PRIMARY KEY (id)) PARTITION BY RANGE (id); CREATE TABLE fract_t0 PARTITION OF fract_t FOR VALUES FROM ('0') TO ('1000'); CREATE TABLE fract_t1 PARTITION OF fract_t FOR VALUES FROM ('1000') TO ('2000'); +CREATE TABLE fract_x (id BIGINT, PRIMARY KEY (id)) PARTITION BY RANGE (id); +CREATE TABLE fract_x0 PARTITION OF fract_x FOR VALUES FROM ('0') TO ('1000'); +CREATE TABLE fract_x1 PARTITION OF fract_x FOR VALUES FROM ('1000') TO ('2000'); 
-- insert data INSERT INTO fract_t (id) (SELECT generate_series(0, 1999)); ANALYZE fract_t; +INSERT INTO fract_x (id) (SELECT generate_series(0, 1999)); +ANALYZE fract_x; --- verify plan; nested index only scans SET max_parallel_workers_per_gather = 0; SET enable_partitionwise_join = on; +-- verify partition pruning +SET max_parallel_workers_per_gather = 0; +SET enable_partitionwise_join = on; + +EXPLAIN (COSTS OFF) +SELECT x.id FROM fract_x x LEFT JOIN fract_t y USING (id) ORDER BY id ASC LIMIT 10; + +EXPLAIN (COSTS OFF) +SELECT x.id FROM fract_x x LEFT JOIN fract_t y USING (id) ORDER BY id ASC LIMIT 10; + +-- verify plan; nested index only scans EXPLAIN (COSTS OFF) -SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id) ORDER BY id ASC LIMIT 10; +SELECT x.id, y.id FROM fract_x x LEFT JOIN fract_t y USING (id) ORDER BY x.id ASC LIMIT 10; EXPLAIN (COSTS OFF) -SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id) ORDER BY id DESC LIMIT 10; +SELECT x.id, y.id FROM fract_x x LEFT JOIN fract_t y USING (id) ORDER BY x.id DESC LIMIT 10; -- cleanup DROP TABLE fract_t; +DROP TABLE fract_x; RESET max_parallel_workers_per_gather; RESET enable_partitionwise_join;