Hi,
I came up with a slightly more intuitive way to force a different outcome for the fractional test; it hardly changes anything. I'm not sure whether the distinction between fract_x and fract_t is worth it, since there shouldn't be any difference, but as mentioned before I added it for the sake of future clarity.

Thanks again for your feedback!

Regards
Arne

________________________________
From: Arne Roland
Sent: Wednesday, January 19, 2022 10:13:55 PM
To: Alvaro Herrera; Julien Rouhaud
Cc: pgsql-hackers
Subject: Re: missing indexes in indexlist with partitioned tables

Hi!

> From: Alvaro Herrera <alvhe...@alvh.no-ip.org>
> [...]
> Hmm, these plan changes look valid to me. A left self-join using the
> primary key column? That looks optimizable all right.
> [...]
> What I still don't know is whether this patch is actually desirable or
> not. If the only cases it affects is self-joins, is there much actual
> value?

This is not really about self-joins. That was just the simplest example, because otherwise we'd need a second table. Since the join key is unique, it isn't relevant whether the other side is the same table; in most cases it won't be. I am talking about join pruning (see the short sketch below, after this quoted message).

> I suspect that the author of partition-wise joins would want to change
> these queries so that whatever was being tested by these self-joins is
> tested by some other means (maybe just create an identical partitioned
> table via CREATE TABLE fract_t2 ... ; INSERT INTO fract_t2 SELECT FROM
> fract_t). But at the same time, the author of this patch should

Your suggestion doesn't work, because with my patch we optimize that case as well: we solve the general join pruning case. If we make the index non-unique, however, we should be able to test the fractional case the same way.

> b) add some test cases to verify that his desired
> behavior is tested somewhere, not just in a test case that's
> incidentally broken by his patch.

My patch already includes such a test; see the hunk at @@ -90,6 +90,13 @@ in src/test/regress/sql/partition_join.sql. Since the self-join part was confusing to you, it might be worthwhile to test that with two different tables. While I see no strict need to test it that way, I will adjust the patch accordingly, just to make it clearer for people looking at those tests in the future.

> a) make
> sure that the submitted patch updates these test results so that the
> test pass [...]

Just for the record: I did run the tests, but I missed that Tomas' fix for the fractional optimization is already committed on master. Please note that this is a very new test case, from a patch committed less than one week ago. I'm glad Julien Rouhaud pointed out that Tomas' patch is committed by now. That was very helpful to me, as I can now integrate the two tests.

@Álvaro Herrera: If you want to help me, please don't put forward an abstract list of responsibilities. If anything, I obviously need practical help with catching up on recent changes more quickly and without manual intervention. I don't have a modified buildfarm animal running that tries to apply my patch and run its regression tests on a daily basis. Is there a simple way for me to check for that?

I will probably integrate those two tests, since they can work off similar structures without the need to recreate the tables again and again. I have a clear understanding of how the new test works. I have to attend a few calls now, but I should be able to update the tests later.

Regards
Arne
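For illustration only (this is not part of the patch, and the table names order_t/event_t are made up): the general join pruning case I mean is a left join on a unique key between two different tables, where the join can be dropped entirely because no columns of the inner side are needed. A minimal sketch, assuming the patch is applied:

  CREATE TABLE order_t (id BIGINT, PRIMARY KEY (id)) PARTITION BY RANGE (id);
  CREATE TABLE order_t0 PARTITION OF order_t FOR VALUES FROM ('0') TO ('1000');
  CREATE TABLE order_t1 PARTITION OF order_t FOR VALUES FROM ('1000') TO ('2000');
  CREATE TABLE event_t (id BIGINT, order_id BIGINT);

  -- order_t.id is unique (primary key of the partitioned parent) and only
  -- columns of x are selected, so the left join should be removable; with
  -- the patch the plan should scan event_t alone.
  EXPLAIN (COSTS OFF)
  SELECT x.id FROM event_t x LEFT JOIN order_t y ON x.order_id = y.id;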
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 0ef70ad7f1..fe50919637 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3504,7 +3504,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 	Assert(list_length(exprlist) == list_length(oprlist));
 
 	/* Short-circuit if no indexes... */
-	if (rel->indexlist == NIL)
+	if (rel->indexlist == NIL && rel->partIndexlist == NIL)
 		return false;
 
 	/*
@@ -3549,7 +3549,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 		return false;
 
 	/* Examine each index of the relation ... */
-	foreach(ic, rel->indexlist)
+	foreach(ic, list_concat(rel->indexlist, rel->partIndexlist))
 	{
 		IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
 		int			c;
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 337f470d58..b04b3f88ad 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -23,6 +23,7 @@
 #include "postgres.h"
 
 #include "nodes/nodeFuncs.h"
+#include "nodes/nodes.h"
 #include "optimizer/clauses.h"
 #include "optimizer/joininfo.h"
 #include "optimizer/optimizer.h"
@@ -598,7 +599,7 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
 		 */
 		ListCell   *lc;
 
-		foreach(lc, rel->indexlist)
+		foreach(lc, list_concat(rel->indexlist, rel->partIndexlist))
 		{
 			IndexOptInfo *ind = (IndexOptInfo *) lfirst(lc);
 
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index a5002ad895..db6c3f2912 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -116,8 +116,8 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 {
 	Index		varno = rel->relid;
 	Relation	relation;
-	bool		hasindex;
 	List	   *indexinfos = NIL;
+	List	   *partIndexinfos = NIL;
 
 	/*
 	 * We need not lock the relation since it was already locked, either by
@@ -154,17 +154,8 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	/* Retrieve the parallel_workers reloption, or -1 if not set. */
 	rel->rel_parallel_workers = RelationGetParallelWorkers(relation, -1);
 
-	/*
-	 * Make list of indexes.  Ignore indexes on system catalogs if told to.
-	 * Don't bother with indexes for an inheritance parent, either.
-	 */
-	if (inhparent ||
-		(IgnoreSystemIndexes && IsSystemRelation(relation)))
-		hasindex = false;
-	else
-		hasindex = relation->rd_rel->relhasindex;
-
-	if (hasindex)
+	/* Make list of indexes.  Ignore indexes on system catalogs if told to. */
+	if (!(IgnoreSystemIndexes && IsSystemRelation(relation)) && relation->rd_rel->relhasindex)
 	{
 		List	   *indexoidlist;
 		LOCKMODE	lmode;
@@ -213,10 +204,12 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 			}
 
 			/*
-			 * Ignore partitioned indexes, since they are not usable for
-			 * queries.
+			 * Don't add partitioned indexes to the indexlist, since they are
+			 * not usable by the executor. If they are unique add them to the
+			 * partindexlist instead, to use for further pruning. If they
+			 * aren't that either, simply skip them.
 			 */
-			if (indexRelation->rd_rel->relkind == RELKIND_PARTITIONED_INDEX)
+			if (inhparent && (!index->indisunique || indexRelation->rd_rel->relkind != RELKIND_PARTITIONED_INDEX))
 			{
 				index_close(indexRelation, NoLock);
 				continue;
@@ -264,7 +257,40 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 					info->indexcollations[i] = indexRelation->rd_indcollation[i];
 			}
 
-			info->relam = indexRelation->rd_rel->relam;
+			/*
+			 * Fetch the index expressions and predicate, if any.  We must
+			 * modify the copies we obtain from the relcache to have the
+			 * correct varno for the parent relation, so that they match up
+			 * correctly against qual clauses.
+			 */
+			info->indexprs = RelationGetIndexExpressions(indexRelation);
+			info->indpred = RelationGetIndexPredicate(indexRelation);
+			if (info->indexprs && varno != 1)
+				ChangeVarNodes((Node *) info->indexprs, 1, varno, 0);
+			if (info->indpred && varno != 1)
+				ChangeVarNodes((Node *) info->indpred, 1, varno, 0);
+
+			info->unique = index->indisunique;
+			info->immediate = index->indimmediate;
+
+			/*
+			 * Don't add partitioned indexes to the indexlist, add them to the
+			 * partindexlist instead, since they are not usable by the
+			 * executor.
+			 */
+			if (indexRelation->rd_rel->relkind == RELKIND_PARTITIONED_INDEX)
+			{
+				index_close(indexRelation, NoLock);
+				partIndexinfos = lappend(partIndexinfos, info);
+				continue;
+			}
+
+			info->hypothetical = false;
+			info->indrestrictinfo = NIL;	/* set later, in indxpath.c */
+			info->predOK = false;	/* set later, in indxpath.c */
+
+			/* Build targetlist using the completed indexprs data */
+			info->indextlist = build_index_tlist(root, info, relation);
 
 			/* We copy just the fields we need, not all of rd_indam */
 			amroutine = indexRelation->rd_indam;
@@ -284,6 +310,8 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 			/* Fetch index opclass options */
 			info->opclassoptions = RelationGetIndexAttOptions(indexRelation, true);
 
+			info->relam = indexRelation->rd_rel->relam;
+
 			/*
 			 * Fetch the ordering information for the index, if any.
 			 */
@@ -370,28 +398,6 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 				info->nulls_first = NULL;
 			}
 
-			/*
-			 * Fetch the index expressions and predicate, if any.  We must
-			 * modify the copies we obtain from the relcache to have the
-			 * correct varno for the parent relation, so that they match up
-			 * correctly against qual clauses.
-			 */
-			info->indexprs = RelationGetIndexExpressions(indexRelation);
-			info->indpred = RelationGetIndexPredicate(indexRelation);
-			if (info->indexprs && varno != 1)
-				ChangeVarNodes((Node *) info->indexprs, 1, varno, 0);
-			if (info->indpred && varno != 1)
-				ChangeVarNodes((Node *) info->indpred, 1, varno, 0);
-
-			/* Build targetlist using the completed indexprs data */
-			info->indextlist = build_index_tlist(root, info, relation);
-
-			info->indrestrictinfo = NIL;	/* set later, in indxpath.c */
-			info->predOK = false;	/* set later, in indxpath.c */
-			info->unique = index->indisunique;
-			info->immediate = index->indimmediate;
-			info->hypothetical = false;
-
 			/*
 			 * Estimate the index size.  If it's not a partial index, we lock
 			 * the number-of-tuples estimate to equal the parent table; if it
@@ -441,6 +447,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	}
 
 	rel->indexlist = indexinfos;
+	rel->partIndexlist = partIndexinfos;
 
 	rel->statlist = get_relation_statistics(rel, relation);
 
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 1f3845b3fe..4d0ed938ac 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -334,11 +334,11 @@ struct PlannerInfo
 
 	MemoryContext planner_cxt;	/* context holding PlannerInfo */
 
-	Cardinality total_table_pages;	/* # of pages in all non-dummy tables of
+	Cardinality total_table_pages;	/* # of pages in all non-dummy tables of
 									 * query */
 
-	Selectivity tuple_fraction; /* tuple_fraction passed to query_planner */
-	Cardinality limit_tuples;	/* limit_tuples passed to query_planner */
+	Selectivity tuple_fraction; /* tuple_fraction passed to query_planner */
+	Cardinality limit_tuples;	/* limit_tuples passed to query_planner */
 
 	Index		qual_security_level;	/* minimum security_level for quals */
 	/* Note: qual_security_level is zero if there are no securityQuals */
@@ -681,7 +681,7 @@ typedef struct RelOptInfo
 	Relids		relids;			/* set of base relids (rangetable indexes) */
 
 	/* size estimates generated by planner */
-	Cardinality rows;			/* estimated number of result tuples */
+	Cardinality rows;			/* estimated number of result tuples */
 
 	/* per-relation planner control flags */
 	bool		consider_startup;	/* keep cheap-startup-cost paths? */
@@ -716,9 +716,10 @@ typedef struct RelOptInfo
 	List	   *lateral_vars;	/* LATERAL Vars and PHVs referenced by rel */
 	Relids		lateral_referencers;	/* rels that reference me laterally */
 	List	   *indexlist;		/* list of IndexOptInfo */
+	List	   *partIndexlist;	/* list of IndexOptInfo */
 	List	   *statlist;		/* list of StatisticExtInfo */
 	BlockNumber pages;			/* size estimates derived from pg_class */
-	Cardinality tuples;
+	Cardinality tuples;
 	double		allvisfrac;
 	Bitmapset  *eclass_indexes; /* Indexes in PlannerInfo's eq_classes list of
 								 * ECs that mention this rel */
@@ -841,7 +842,7 @@ struct IndexOptInfo
 
 	/* index-size statistics (from pg_class and elsewhere) */
 	BlockNumber pages;			/* number of disk pages in index */
-	Cardinality tuples;			/* number of index tuples in index */
+	Cardinality tuples;			/* number of index tuples in index */
 	int			tree_height;	/* index tree height, or -1 if unknown */
 
 	/* index descriptor information */
@@ -1140,7 +1141,7 @@ typedef struct ParamPathInfo
 	NodeTag		type;
 
 	Relids		ppi_req_outer;	/* rels supplying parameters used by path */
-	Cardinality ppi_rows;		/* estimated number of result tuples */
+	Cardinality ppi_rows;		/* estimated number of result tuples */
 	List	   *ppi_clauses;	/* join clauses available from outer rels */
 } ParamPathInfo;
 
@@ -1190,7 +1191,7 @@ typedef struct Path
 	int			parallel_workers;	/* desired # of workers; 0 = not parallel */
 
 	/* estimated size/costs for path (see costsize.c for more info) */
-	Cardinality rows;			/* estimated number of result tuples */
+	Cardinality rows;			/* estimated number of result tuples */
 	Cost		startup_cost;	/* cost expended before fetching any tuples */
 	Cost		total_cost;		/* total cost (assuming all tuples fetched) */
 
@@ -1453,7 +1454,7 @@ typedef struct AppendPath
 	List	   *subpaths;		/* list of component Paths */
 	/* Index of first partial path in subpaths; list_length(subpaths) if none */
 	int			first_partial_path;
-	Cardinality limit_tuples;	/* hard limit on output tuples, or -1 */
+	Cardinality limit_tuples;	/* hard limit on output tuples, or -1 */
 } AppendPath;
 
 #define IS_DUMMY_APPEND(p) \
@@ -1475,7 +1476,7 @@ typedef struct MergeAppendPath
 {
 	Path		path;
 	List	   *subpaths;		/* list of component Paths */
-	Cardinality limit_tuples;	/* hard limit on output tuples, or -1 */
+	Cardinality limit_tuples;	/* hard limit on output tuples, or -1 */
 } MergeAppendPath;
 
 /*
@@ -1773,7 +1774,7 @@ typedef struct AggPath
 	Path	   *subpath;		/* path representing input source */
 	AggStrategy aggstrategy;	/* basic strategy, see nodes.h */
 	AggSplit	aggsplit;		/* agg-splitting mode, see nodes.h */
-	Cardinality numGroups;		/* estimated number of groups in input */
+	Cardinality numGroups;		/* estimated number of groups in input */
 	uint64		transitionSpace;	/* for pass-by-ref transition data */
 	List	   *groupClause;	/* a list of SortGroupClause's */
 	List	   *qual;			/* quals (HAVING quals), if any */
@@ -1787,7 +1788,7 @@ typedef struct GroupingSetData
 {
 	NodeTag		type;
 	List	   *set;			/* grouping set as list of sortgrouprefs */
-	Cardinality numGroups;		/* est. number of result groups */
+	Cardinality numGroups;		/* est. number of result groups */
 } GroupingSetData;
 
 typedef struct RollupData
@@ -1796,7 +1797,7 @@ typedef struct RollupData
 	List	   *groupClause;	/* applicable subset of parse->groupClause */
 	List	   *gsets;			/* lists of integer indexes into groupClause */
 	List	   *gsets_data;		/* list of GroupingSetData */
-	Cardinality numGroups;		/* est. number of result groups */
+	Cardinality numGroups;		/* est. number of result groups */
 	bool		hashable;		/* can be hashed */
 	bool		is_hashed;		/* to be implemented as a hashagg */
 } RollupData;
@@ -1847,7 +1848,7 @@ typedef struct SetOpPath
 	List	   *distinctList;	/* SortGroupClauses identifying target cols */
 	AttrNumber	flagColIdx;		/* where is the flag column, if any */
 	int			firstFlag;		/* flag value for first input relation */
-	Cardinality numGroups;		/* estimated number of groups in input */
+	Cardinality numGroups;		/* estimated number of groups in input */
 } SetOpPath;
 
 /*
@@ -1860,7 +1861,7 @@ typedef struct RecursiveUnionPath
 	Path	   *rightpath;
 	List	   *distinctList;	/* SortGroupClauses identifying target cols */
 	int			wtParam;		/* ID of Param representing work table */
-	Cardinality numGroups;		/* estimated number of groups in input */
+	Cardinality numGroups;		/* estimated number of groups in input */
 } RecursiveUnionPath;
 
 /*
@@ -2616,7 +2617,7 @@ typedef struct
 typedef struct
 {
 	bool		limit_needed;
-	Cardinality limit_tuples;
+	Cardinality limit_tuples;
 	int64		count_est;
 	int64		offset_est;
 } FinalPathExtraData;
@@ -2647,15 +2648,15 @@ typedef struct JoinCostWorkspace
 	Cost		inner_rescan_run_cost;
 
 	/* private for cost_mergejoin code */
-	Cardinality outer_rows;
-	Cardinality inner_rows;
-	Cardinality outer_skip_rows;
-	Cardinality inner_skip_rows;
+	Cardinality outer_rows;
+	Cardinality inner_rows;
+	Cardinality outer_skip_rows;
+	Cardinality inner_skip_rows;
 
 	/* private for cost_hashjoin code */
 	int			numbuckets;
 	int			numbatches;
-	Cardinality inner_rows_total;
+	Cardinality inner_rows_total;
 } JoinCostWorkspace;
 
 /*
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index bb5b7c47a4..562bad6544 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -4866,14 +4866,42 @@ SELECT t1.*, t2.* FROM alpha t1 INNER JOIN beta t2 ON (t1.a = t2.a AND t1.b = t2
 CREATE TABLE fract_t (id BIGINT, PRIMARY KEY (id)) PARTITION BY RANGE (id);
 CREATE TABLE fract_t0 PARTITION OF fract_t FOR VALUES FROM ('0') TO ('1000');
 CREATE TABLE fract_t1 PARTITION OF fract_t FOR VALUES FROM ('1000') TO ('2000');
+CREATE TABLE fract_x (id BIGINT, PRIMARY KEY (id)) PARTITION BY RANGE (id);
+CREATE TABLE fract_x0 PARTITION OF fract_x FOR VALUES FROM ('0') TO ('1000');
+CREATE TABLE fract_x1 PARTITION OF fract_x FOR VALUES FROM ('1000') TO ('2000');
 -- insert data
 INSERT INTO fract_t (id) (SELECT generate_series(0, 1999));
 ANALYZE fract_t;
--- verify plan; nested index only scans
+INSERT INTO fract_x (id) (SELECT generate_series(0, 1999));
+ANALYZE fract_x;
+SET max_parallel_workers_per_gather = 0;
+SET enable_partitionwise_join = on;
+-- verify partition pruning
 SET max_parallel_workers_per_gather = 0;
 SET enable_partitionwise_join = on;
 EXPLAIN (COSTS OFF)
-SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id) ORDER BY id ASC LIMIT 10;
+SELECT x.id FROM fract_x x LEFT JOIN fract_t y USING (id) ORDER BY id ASC LIMIT 10;
+                            QUERY PLAN                             
+-----------------------------------------------------------------
+ Limit
+   ->  Append
+         ->  Index Only Scan using fract_x0_pkey on fract_x0 x_1
+         ->  Index Only Scan using fract_x1_pkey on fract_x1 x_2
+(4 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT x.id FROM fract_x x LEFT JOIN fract_t y USING (id) ORDER BY id ASC LIMIT 10;
+                            QUERY PLAN                             
+-----------------------------------------------------------------
+ Limit
+   ->  Append
+         ->  Index Only Scan using fract_x0_pkey on fract_x0 x_1
+         ->  Index Only Scan using fract_x1_pkey on fract_x1 x_2
+(4 rows)
+
+-- verify plan; nested index only scans
+EXPLAIN (COSTS OFF)
+SELECT x.id, y.id FROM fract_x x LEFT JOIN fract_t y USING (id) ORDER BY x.id ASC LIMIT 10;
                               QUERY PLAN                               
 -----------------------------------------------------------------------
  Limit
@@ -4881,32 +4909,33 @@ SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id) ORDER BY id ASC LIMIT 10;
          Sort Key: x.id
          ->  Merge Left Join
                Merge Cond: (x_1.id = y_1.id)
-               ->  Index Only Scan using fract_t0_pkey on fract_t0 x_1
+               ->  Index Only Scan using fract_x0_pkey on fract_x0 x_1
                ->  Index Only Scan using fract_t0_pkey on fract_t0 y_1
          ->  Merge Left Join
                Merge Cond: (x_2.id = y_2.id)
-               ->  Index Only Scan using fract_t1_pkey on fract_t1 x_2
+               ->  Index Only Scan using fract_x1_pkey on fract_x1 x_2
                ->  Index Only Scan using fract_t1_pkey on fract_t1 y_2
 (11 rows)
 
 EXPLAIN (COSTS OFF)
-SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id) ORDER BY id DESC LIMIT 10;
+SELECT x.id, y.id FROM fract_x x LEFT JOIN fract_t y USING (id) ORDER BY x.id DESC LIMIT 10;
                                    QUERY PLAN                                    
 --------------------------------------------------------------------------------
  Limit
    ->  Merge Append
         Sort Key: x.id DESC
         ->  Nested Loop Left Join
-              ->  Index Only Scan Backward using fract_t0_pkey on fract_t0 x_1
+              ->  Index Only Scan Backward using fract_x0_pkey on fract_x0 x_1
               ->  Index Only Scan using fract_t0_pkey on fract_t0 y_1
                     Index Cond: (id = x_1.id)
         ->  Nested Loop Left Join
-              ->  Index Only Scan Backward using fract_t1_pkey on fract_t1 x_2
+              ->  Index Only Scan Backward using fract_x1_pkey on fract_x1 x_2
               ->  Index Only Scan using fract_t1_pkey on fract_t1 y_2
                     Index Cond: (id = x_2.id)
 (11 rows)
 
 -- cleanup
 DROP TABLE fract_t;
+DROP TABLE fract_x;
 RESET max_parallel_workers_per_gather;
 RESET enable_partitionwise_join;
diff --git a/src/test/regress/sql/partition_join.sql b/src/test/regress/sql/partition_join.sql
index 67f506361f..39e23911d0 100644
--- a/src/test/regress/sql/partition_join.sql
+++ b/src/test/regress/sql/partition_join.sql
@@ -1148,22 +1148,38 @@ CREATE TABLE fract_t (id BIGINT, PRIMARY KEY (id)) PARTITION BY RANGE (id);
 CREATE TABLE fract_t0 PARTITION OF fract_t FOR VALUES FROM ('0') TO ('1000');
 CREATE TABLE fract_t1 PARTITION OF fract_t FOR VALUES FROM ('1000') TO ('2000');
+CREATE TABLE fract_x (id BIGINT, PRIMARY KEY (id)) PARTITION BY RANGE (id);
+CREATE TABLE fract_x0 PARTITION OF fract_x FOR VALUES FROM ('0') TO ('1000');
+CREATE TABLE fract_x1 PARTITION OF fract_x FOR VALUES FROM ('1000') TO ('2000');
 
 -- insert data
 INSERT INTO fract_t (id) (SELECT generate_series(0, 1999));
 ANALYZE fract_t;
+INSERT INTO fract_x (id) (SELECT generate_series(0, 1999));
+ANALYZE fract_x;
 
--- verify plan; nested index only scans
 SET max_parallel_workers_per_gather = 0;
 SET enable_partitionwise_join = on;
 
+-- verify partition pruning
+SET max_parallel_workers_per_gather = 0;
+SET enable_partitionwise_join = on;
+
+EXPLAIN (COSTS OFF)
+SELECT x.id FROM fract_x x LEFT JOIN fract_t y USING (id) ORDER BY id ASC LIMIT 10;
+
+EXPLAIN (COSTS OFF)
+SELECT x.id FROM fract_x x LEFT JOIN fract_t y USING (id) ORDER BY id ASC LIMIT 10;
+
+-- verify plan; nested index only scans
 EXPLAIN (COSTS OFF)
-SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id) ORDER BY id ASC LIMIT 10;
+SELECT x.id, y.id FROM fract_x x LEFT JOIN fract_t y USING (id) ORDER BY x.id ASC LIMIT 10;
 
 EXPLAIN (COSTS OFF)
-SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id) ORDER BY id DESC LIMIT 10;
+SELECT x.id, y.id FROM fract_x x LEFT JOIN fract_t y USING (id) ORDER BY x.id DESC LIMIT 10;
 
 -- cleanup
 DROP TABLE fract_t;
+DROP TABLE fract_x;
 RESET max_parallel_workers_per_gather;
 RESET enable_partitionwise_join;