Alexander Kuzmenkov <a.kuzmen...@postgrespro.ru> writes: > Here is a rebased version with some bugfixes.
I noticed this had bit-rotted again. I've not really reviewed it, but I rebased it up to HEAD, and fixed a couple small things: * My compiler was bitching about misplaced declarations, so I moved some variable declarations accordingly. I couldn't help noticing that many of those wouldn't have been a problem in the first place if you were following project style for loops around list_delete_cell calls, which usually look more like this: prev = NULL; for (cell = list_head(root->rowMarks); cell; cell = next) { PlanRowMark *rc = (PlanRowMark *) lfirst(cell); next = lnext(cell); if (rt_fetch(rc->rti, root->parse->rtable)->rtekind == RTE_RESULT) root->rowMarks = list_delete_cell(root->rowMarks, cell, prev); else prev = cell; } * I saw you had a problem with an existing test in join.sql that was being optimized away because it used an ill-advised self-join. I've pushed a fix for that, so it's not a problem as of HEAD. I notice though that there's one unexplained plan change remaining in join.out: @@ -4365,11 +4365,13 @@ explain (costs off) select p.* from (parent p left join child c on (p.k = c.k)) join parent x on p.k = x.k where p.k = 1 and p.k = 2; - QUERY PLAN --------------------------- + QUERY PLAN +------------------------------------------------ Result One-Time Filter: false -(2 rows) + -> Index Scan using parent_pkey on parent x + Index Cond: (k = 1) +(4 rows) -- bug 5255: this is not optimizable by join removal begin; That sure looks like a bug. I don't have time to look for the cause right now. I also noticed that the test results show that when a table is successfully optimized away, the remaining reference seems to have the alias of the second reference not the first one. That seems a little ... weird. It's just cosmetic of course, but why is that? Also, I did notice that you'd stuck a declaration for "struct UniqueIndexInfo" into paths.h, which then compelled you to include that header in planmain.h. This seems like poor style; I'd have been inclined to put the struct in pathnodes.h instead. That's assuming you need it at all -- in those two usages, seems like it'd be just about as easy to return two separate Lists. On the other hand, given + * unique_for_rels - list of (Relids, UniqueIndexInfo*) lists, where Relids + * is a set of other rels for which this one has been proven + * unique, and UniqueIndexInfo* stores information about the + * index that makes it unique, if any. I wonder why you didn't include the Relids into UniqueIndexInfo as well ... and maybe make it a proper Node so that unique_for_rels could be printed by outfuncs.c. So any way I slice it, it seems like this data structure could use more careful contemplation. Anyway, updated patch attached. regards, tom lane
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c index 3434219..86c9453 100644 --- a/src/backend/optimizer/path/indxpath.c +++ b/src/backend/optimizer/path/indxpath.c @@ -3583,7 +3583,8 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel, * relation_has_unique_index_for * Determine whether the relation provably has at most one row satisfying * a set of equality conditions, because the conditions constrain all - * columns of some unique index. + * columns of some unique index. If index_info is not null, it is set to + * point to a new UniqueIndexInfo containing the index and conditions. * * The conditions can be represented in either or both of two ways: * 1. A list of RestrictInfo nodes, where the caller has already determined @@ -3604,7 +3605,8 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel, bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel, List *restrictlist, - List *exprlist, List *oprlist) + List *exprlist, List *oprlist, + UniqueIndexInfo **index_info) { ListCell *ic; @@ -3660,6 +3662,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel, { IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic); int c; + List *matched_restrictlist = NIL; /* * If the index is not unique, or not immediately enforced, or if it's @@ -3708,6 +3711,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel, if (match_index_to_operand(rexpr, c, ind)) { matched = true; /* column is unique */ + matched_restrictlist = lappend(matched_restrictlist, rinfo); break; } } @@ -3750,7 +3754,22 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel, /* Matched all key columns of this index? */ if (c == ind->nkeycolumns) + { + if (index_info != NULL) + { + /* This may be called in GEQO memory context. */ + MemoryContext oldContext = MemoryContextSwitchTo(root->planner_cxt); + *index_info = palloc(sizeof(UniqueIndexInfo)); + (*index_info)->index = ind; + (*index_info)->clauses = list_copy(matched_restrictlist); + MemoryContextSwitchTo(oldContext); + } + if (matched_restrictlist) + list_free(matched_restrictlist); return true; + } + if (matched_restrictlist) + list_free(matched_restrictlist); } return false; diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c index d8ff4bf..89cd236 100644 --- a/src/backend/optimizer/path/joinpath.c +++ b/src/backend/optimizer/path/joinpath.c @@ -176,7 +176,8 @@ add_paths_to_joinrel(PlannerInfo *root, innerrel, JOIN_INNER, restrictlist, - false); + false, + NULL /*index_info*/); break; default: extra.inner_unique = innerrel_is_unique(root, @@ -185,7 +186,8 @@ add_paths_to_joinrel(PlannerInfo *root, innerrel, jointype, restrictlist, - false); + false, + NULL /*index_info*/); break; } diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c index a4efa69..19e5139 100644 --- a/src/backend/optimizer/plan/analyzejoins.c +++ b/src/backend/optimizer/plan/analyzejoins.c @@ -22,6 +22,7 @@ */ #include "postgres.h" +#include "catalog/pg_class.h" #include "nodes/nodeFuncs.h" #include "optimizer/clauses.h" #include "optimizer/joininfo.h" @@ -39,14 +40,15 @@ static void remove_rel_from_query(PlannerInfo *root, int relid, static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved); static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel); static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, - List *clause_list); + List *clause_list, UniqueIndexInfo **info); static Oid distinct_col_search(int colno, List *colnos, List *opids); static bool is_innerrel_unique_for(PlannerInfo *root, Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel, JoinType jointype, - List *restrictlist); + List *restrictlist, + UniqueIndexInfo **info); /* @@ -58,7 +60,7 @@ static bool is_innerrel_unique_for(PlannerInfo *root, * data structures that have to be updated are accessible via "root". */ List * -remove_useless_joins(PlannerInfo *root, List *joinlist) +remove_useless_left_joins(PlannerInfo *root, List *joinlist) { ListCell *lc; @@ -162,7 +164,6 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo) int innerrelid; RelOptInfo *innerrel; Relids joinrelids; - List *clause_list = NIL; ListCell *l; int attroff; @@ -238,67 +239,24 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo) } /* - * Search for mergejoinable clauses that constrain the inner rel against - * either the outer rel or a pseudoconstant. If an operator is - * mergejoinable then it behaves like equality for some btree opclass, so - * it's what we want. The mergejoinability test also eliminates clauses - * containing volatile functions, which we couldn't depend on. + * Check for pushed-down clauses referencing the inner rel. If there is + * such a clause then join removal has to be disallowed. We have to + * check this despite the previous attr_needed checks because of the + * possibility of pushed-down clauses referencing the rel. */ foreach(l, innerrel->joininfo) { RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(l); - - /* - * If it's not a join clause for this outer join, we can't use it. - * Note that if the clause is pushed-down, then it is logically from - * above the outer join, even if it references no other rels (it might - * be from WHERE, for example). - */ - if (RINFO_IS_PUSHED_DOWN(restrictinfo, joinrelids)) - { - /* - * If such a clause actually references the inner rel then join - * removal has to be disallowed. We have to check this despite - * the previous attr_needed checks because of the possibility of - * pushed-down clauses referencing the rel. - */ - if (bms_is_member(innerrelid, restrictinfo->clause_relids)) + if (RINFO_IS_PUSHED_DOWN(restrictinfo, joinrelids) + && bms_is_member(innerrel->relid, restrictinfo->clause_relids)) return false; - continue; /* else, ignore; not useful here */ - } - - /* Ignore if it's not a mergejoinable clause */ - if (!restrictinfo->can_join || - restrictinfo->mergeopfamilies == NIL) - continue; /* not mergejoinable */ - - /* - * Check if clause has the form "outer op inner" or "inner op outer", - * and if so mark which side is inner. - */ - if (!clause_sides_match_join(restrictinfo, sjinfo->min_lefthand, - innerrel->relids)) - continue; /* no good for these input relations */ - - /* OK, add to list */ - clause_list = lappend(clause_list, restrictinfo); } - /* - * Now that we have the relevant equality join clauses, try to prove the - * innerrel distinct. - */ - if (rel_is_distinct_for(root, innerrel, clause_list)) - return true; - - /* - * Some day it would be nice to check for other methods of establishing - * distinctness. - */ - return false; + return is_innerrel_unique_for(root, joinrelids, sjinfo->min_lefthand, + innerrel, sjinfo->jointype, innerrel->joininfo, + NULL /*unique_index*/); } - /* * Remove the target relid from the planner's data structures, having * determined that there is no need to include it in the query. @@ -568,7 +526,7 @@ reduce_unique_semijoins(PlannerInfo *root) /* Test whether the innerrel is unique for those clauses. */ if (!innerrel_is_unique(root, joinrelids, sjinfo->min_lefthand, innerrel, - JOIN_SEMI, restrictlist, true)) + JOIN_SEMI, restrictlist, true, NULL /*index_info*/)) continue; /* OK, remove the SpecialJoinInfo from the list. */ @@ -643,9 +601,13 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel) * Note that the passed-in clause_list may be destructively modified! This * is OK for current uses, because the clause_list is built by the caller for * the sole purpose of passing to this function. + * + * If unique_index is not null, it is set to point to the index that guarantees + * uniqueness for a base relation. */ static bool -rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list) +rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list, + UniqueIndexInfo **index_info) { /* * We could skip a couple of tests here if we assume all callers checked @@ -661,8 +623,8 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list) * relation_has_unique_index_for automatically adds any usable * restriction clauses for the rel, so we needn't do that here. */ - if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL)) - return true; + return relation_has_unique_index_for(root, rel, clause_list, NIL, NIL, + index_info); } else if (rel->rtekind == RTE_SUBQUERY) { @@ -966,6 +928,10 @@ distinct_col_search(int colno, List *colnos, List *opids) * heuristic about whether to cache negative answers; it should be "true" * if making an inquiry that is not part of the normal bottom-up join search * sequence. + * + * If index_info_out is not null, it is set to point to a new UniqueIndexInfo + * allocated in root memory context, that describes the index that guarantees + * uniqueness. */ bool innerrel_is_unique(PlannerInfo *root, @@ -974,12 +940,23 @@ innerrel_is_unique(PlannerInfo *root, RelOptInfo *innerrel, JoinType jointype, List *restrictlist, - bool force_cache) + bool force_cache, + UniqueIndexInfo **index_info_out) { MemoryContext old_context; ListCell *lc; + UniqueIndexInfo *index_info; + + if (index_info_out) + *index_info_out = NULL; - /* Certainly can't prove uniqueness when there are no joinclauses */ + /* + * It is possible to prove uniqueness even in the absence of joinclauses, + * just from baserestrictinfos alone. However, in these cases the inner + * relation returns one row at most, so join removal won't give much + * benefit. It seems better to save some planning time by ignoring these + * cases. + */ if (restrictlist == NIL) return false; @@ -999,10 +976,14 @@ innerrel_is_unique(PlannerInfo *root, */ foreach(lc, innerrel->unique_for_rels) { - Relids unique_for_rels = (Relids) lfirst(lc); + Relids unique_for_rels = (Relids) linitial(lfirst(lc)); if (bms_is_subset(unique_for_rels, outerrelids)) + { + if (index_info_out) + *index_info_out = lsecond(lfirst(lc)); return true; /* Success! */ + } } /* @@ -1019,7 +1000,7 @@ innerrel_is_unique(PlannerInfo *root, /* No cached information, so try to make the proof. */ if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel, - jointype, restrictlist)) + jointype, restrictlist, &index_info)) { /* * Cache the positive result for future probes, being sure to keep it @@ -1033,9 +1014,12 @@ innerrel_is_unique(PlannerInfo *root, */ old_context = MemoryContextSwitchTo(root->planner_cxt); innerrel->unique_for_rels = lappend(innerrel->unique_for_rels, - bms_copy(outerrelids)); + list_make2(bms_copy(outerrelids), index_info)); MemoryContextSwitchTo(old_context); + if (index_info_out) + *index_info_out = index_info; + return true; /* Success! */ } else @@ -1081,7 +1065,8 @@ is_innerrel_unique_for(PlannerInfo *root, Relids outerrelids, RelOptInfo *innerrel, JoinType jointype, - List *restrictlist) + List *restrictlist, + UniqueIndexInfo **index_info) { List *clause_list = NIL; ListCell *lc; @@ -1123,5 +1108,830 @@ is_innerrel_unique_for(PlannerInfo *root, } /* Let rel_is_distinct_for() do the hard work */ - return rel_is_distinct_for(root, innerrel, clause_list); + return rel_is_distinct_for(root, innerrel, clause_list, index_info); +} + +typedef struct +{ + Index oldRelid; + Index newRelid; +} ChangeVarnoContext; + +static bool +change_varno_walker(Node *node, ChangeVarnoContext *context) +{ + if (node == NULL) + return false; + + if (IsA(node, Var) && ((Var *) node)->varno == context->oldRelid) + { + ((Var *) node)->varno = context->newRelid; + ((Var *) node)->varnoold = context->newRelid; + return false; + } + + return expression_tree_walker(node, change_varno_walker, context); +} + +/* + * For all Vars in the expression that have varno = oldRelid, set + * varno = newRelid. + */ +static void +change_varno(Expr *expr, Index oldRelid, Index newRelid) +{ + ChangeVarnoContext context; + context.oldRelid = oldRelid; + context.newRelid = newRelid; + change_varno_walker((Node *) expr, &context); +} + +/* + * Substitute newId for oldId in relids. + */ +static void +change_relid(Relids *relids, Index oldId, Index newId) +{ + if (bms_is_member(oldId, *relids)) + *relids = bms_add_member(bms_del_member(*relids, oldId), newId); +} + +/* + * Update EC members to point to the remaining relation instead of the removed + * one, removing duplicates. + */ +static void +update_ec_members(EquivalenceClass *ec, Index toRemove, Index toKeep) +{ + ListCell *prev = NULL; + ListCell *cell = NULL; + ListCell *next = list_head(ec->ec_members); + + while (next) + { + EquivalenceMember *em; + ListCell *otherCell; + + prev = cell; + cell = next; + next = lnext(next); + em = lfirst(cell); + + if (!bms_is_member(toRemove, em->em_relids)) + continue; + + change_relid(&em->em_relids, toRemove, toKeep); + /* We only process inner joins */ + Assert(em->em_nullable_relids == NULL); + change_varno(em->em_expr, toRemove, toKeep); + + /* + * After we switched the equivalence member to the remaining relation, + * check that it is not the same as the existing member, and if it + * is, delete it. + */ + foreach (otherCell, ec->ec_members) + { + EquivalenceMember *other; + + if (otherCell == cell) + continue; + + other = castNode(EquivalenceMember, lfirst(otherCell)); + if (equal(other->em_expr, em->em_expr)) + { + ec->ec_members = list_delete_cell(ec->ec_members, cell, prev); + cell = prev; + break; + } + } + } +} + +/* + * Update EC sources to point to the remaining relation instead of the + * removed one. + */ +static void +update_ec_sources(List **sources, Index toRemove, Index toKeep) +{ + ListCell *prev = NULL; + ListCell *cell = NULL; + ListCell *next = list_head(*sources); + + while (next) + { + RestrictInfo *rinfo; + ListCell *otherCell; + + prev = cell; + cell = next; + next = lnext(next); + rinfo = castNode(RestrictInfo, lfirst(cell)); + + if (!bms_is_member(toRemove, rinfo->required_relids)) + continue; + + change_varno(rinfo->clause, toRemove, toKeep); + + /* + * After switching the clause to the remaining relation, check it + * for redundancy with existing ones. We don't have to check for + * redundancy with derived clauses, because we've just deleted them. + */ + foreach (otherCell, *sources) + { + RestrictInfo *other; + + if (otherCell == cell) + continue; + + other = castNode(RestrictInfo, lfirst(otherCell)); + if (equal(rinfo->clause, other->clause)) + { + *sources = list_delete_cell(*sources, cell, prev); + cell = prev; + break; + } + } + + if (otherCell == NULL) + { + /* We will keep this RestrictInfo, correct its relids. */ + change_relid(&rinfo->required_relids, toRemove, toKeep); + change_relid(&rinfo->left_relids, toRemove, toKeep); + change_relid(&rinfo->right_relids, toRemove, toKeep); + change_relid(&rinfo->clause_relids, toRemove, toKeep); + } + } +} + +/* + * Scratch space for the unique self join removal code. + */ +typedef struct +{ + PlannerInfo *root; + + /* Temporary array for relation ids. */ + Index *relids; + + /* + * Array of Relids, one for each relation, indexed by relation id. + * Each element is a set of relation ids with which this relation + * has a special join. + */ + Relids *special_join_rels; + + /* Array of row marks indexed by relid. */ + PlanRowMark **row_marks; + + /* Bitmapset for join relids that is used to avoid reallocation. */ + Relids joinrelids; + + /* + * Top-level targetlist of the query. We have to update any references + * it has to the relations we remove. + */ + List *targetlist; +} UsjScratch; + +/* + * Remove a relation after we have proven that it participates only in an + * unneeded unique self join. + * + * The joinclauses list is destructively changed. + */ +static void +remove_self_join_rel(UsjScratch *scratch, Relids joinrelids, List *joinclauses, + RelOptInfo *toKeep, RelOptInfo *toRemove) +{ + PlannerInfo *root = scratch->root; + ListCell *cell; + List *toAppend; + + /* + * Transfer join and restriction clauses from the removed relation to the + * remaining one. We change the Vars of the clause to point to the remaining + * relation instead of the removed one. The clauses that require a subset of + * joinrelids become restriction clauses of the remaining relation, and + * others remain join clauses. We append them to baserestrictinfo and + * joininfo respectively, trying not to introduce duplicates. + * + * We also have to process the 'joinclauses' list here, because it contains + * EC-derived join clauses which must become filter clauses. It is not enough + * to just correct the ECs, because the EC-derived restrictions are generated + * before join removal (see generate_base_implied_equalities). + */ + toAppend = list_concat(joinclauses, toRemove->baserestrictinfo); + toAppend = list_concat(toAppend, toRemove->joininfo); + + foreach(cell, toAppend) + { + RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell); + bool is_join_clause = !bms_is_subset(rinfo->required_relids, joinrelids); + List **target = is_join_clause ? &toKeep->joininfo : &toKeep->baserestrictinfo; + ListCell *otherCell; + + /* We can't have an EC-derived clause that joins to some third relation */ + Assert( !(is_join_clause && rinfo->parent_ec != NULL) ); + + /* + * Replace the references to the removed relation with references to + * the remaining one. We won't necessarily add this clause, because + * it may be already present in the joininfo or baserestrictinfo. + * Still, we have to switch it to point to the remaining relation. + * This is important for join clauses that reference both relations, + * because they are included in both joininfos. + */ + change_varno(rinfo->clause, toRemove->relid, toKeep->relid); + change_relid(&rinfo->required_relids, toRemove->relid, toKeep->relid); + change_relid(&rinfo->left_relids, toRemove->relid, toKeep->relid); + change_relid(&rinfo->right_relids, toRemove->relid, toKeep->relid); + change_relid(&rinfo->clause_relids, toRemove->relid, toKeep->relid); + + /* + * Don't add the clause if it is already present in the list, or + * derived from the same equivalence class, or is the same as another + * clause. + */ + foreach (otherCell, *target) + { + RestrictInfo *other = lfirst(otherCell); + if (other == rinfo + || (rinfo->parent_ec != NULL + && other->parent_ec == rinfo->parent_ec) + || equal(rinfo->clause, other->clause)) + { + break; + } + } + if (otherCell != NULL) + continue; + + /* + * If the clause has the form of "X=X", replace it with null test. + */ + if (rinfo->mergeopfamilies) + { + Expr *leftOp = (Expr *) get_leftop(rinfo->clause); + Expr *rightOp = (Expr *) get_rightop(rinfo->clause); + + if (leftOp != NULL && equal(leftOp, rightOp)) + { + NullTest *test = makeNode(NullTest); + test->arg = leftOp; + test->nulltesttype = IS_NOT_NULL; + test->argisrow = false; + test->location = -1; + rinfo->clause = (Expr *) test; + } + } + + *target = lappend(*target, rinfo); + } + + /* + * Transfer the targetlist and attr_needed flags. + */ + Assert(toRemove->reltarget->sortgrouprefs == 0); + + foreach (cell, toRemove->reltarget->exprs) + { + Expr *node = lfirst(cell); + change_varno(node, toRemove->relid, toKeep->relid); + if (!list_member(toKeep->reltarget->exprs, node)) + toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, + node); + } + + for (int i = toKeep->min_attr; i <= toKeep->max_attr; i++) + { + int attno = i - toKeep->min_attr; + toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno], + toRemove->attr_needed[attno]); + } + + /* + * If the removed relation has a row mark, transfer it to the remaining one. + * + * If both rels have row marks, just keep the one corresponding to the + * remaining relation, because we verified earlier that they have the same + * strength. + * + * Also make sure that the scratch->row_marks cache is up to date, because + * we are going to use it for further join removals. + */ + if (scratch->row_marks[toRemove->relid]) + { + PlanRowMark **markToRemove = &scratch->row_marks[toRemove->relid]; + PlanRowMark **markToKeep = &scratch->row_marks[toKeep->relid]; + if (*markToKeep) + { + Assert((*markToKeep)->markType == (*markToRemove)->markType); + + root->rowMarks = list_delete_ptr(root->rowMarks, *markToKeep); + *markToKeep = NULL; + } + else + { + *markToKeep = *markToRemove; + *markToRemove = NULL; + + /* Shouldn't have inheritance children here. */ + Assert((*markToKeep)->rti == (*markToKeep)->prti); + + (*markToKeep)->rti = toKeep->relid; + (*markToKeep)->prti = toKeep->relid; + } + } + + /* + * Likewise replace references in SpecialJoinInfo data structures. + * + * This is relevant in case the join we're deleting is nested inside + * some special joins: the upper joins' relid sets have to be adjusted. + */ + foreach (cell, root->join_info_list) + { + SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(cell); + + change_relid(&sjinfo->min_lefthand, toRemove->relid, toKeep->relid); + change_relid(&sjinfo->min_righthand, toRemove->relid, toKeep->relid); + change_relid(&sjinfo->syn_lefthand, toRemove->relid, toKeep->relid); + change_relid(&sjinfo->syn_righthand, toRemove->relid, toKeep->relid); + } + + /* + * Likewise update references in PlaceHolderVar data structures. + */ + foreach(cell, root->placeholder_list) + { + PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(cell); + + /* + * We are at an inner join of two base relations. A placeholder + * can't be needed here or evaluated here. + */ + Assert(!bms_is_subset(phinfo->ph_eval_at, joinrelids)); + Assert(!bms_is_subset(phinfo->ph_needed, joinrelids)); + + change_relid(&phinfo->ph_eval_at, toRemove->relid, toKeep->relid); + change_relid(&phinfo->ph_needed, toRemove->relid, toKeep->relid); + change_relid(&phinfo->ph_lateral, toRemove->relid, toKeep->relid); + change_relid(&phinfo->ph_var->phrels, toRemove->relid, toKeep->relid); + } + + /* + * Update the equivalence classes that reference the removed relations. + */ + foreach(cell, root->eq_classes) + { + EquivalenceClass *ec = lfirst(cell); + + if (!bms_is_member(toRemove->relid, ec->ec_relids)) + { + /* + * This EC doesn't reference the removed relation, + * nothing to be done for it. + */ + continue; + } + + /* + * Update the EC members to reference the remaining relation + * instead of the removed one. + */ + update_ec_members(ec, toRemove->relid, toKeep->relid); + change_relid(&ec->ec_relids, toRemove->relid, toKeep->relid); + + /* + * We will now update source and derived clauses of the EC. + * + * Restriction clauses for base relations are already distributed + * to the respective baserestrictinfo lists (see + * generate_implied_equalities). The above code has already + * processed this list, and updated these clauses to reference + * the remaining relation, so we can skip them here based on their + * relids. + * + * Likewise, we have already processed the join clauses that join + * the removed relation to the remaining one. + * + * Finally, there are join clauses that join the removed relation + * to some third relation. We could delete just delete them and + * generate on demand, but sometimes we can't do this because there + * is no suitable equality operator (see the handling of ec_broken). + * In such cases we are going to use the source clauses, so we have + * to correct them too. + * + * Derived clauses can be generated again, so it is simpler to just + * delete them. + */ + list_free(ec->ec_derives); + ec->ec_derives = NULL; + update_ec_sources(&ec->ec_sources, toRemove->relid, toKeep->relid); + } + + /* + * Mark the rel as "dead" to show it is no longer part of the join tree. + * (Removing it from the baserel array altogether seems too risky.) + */ + toRemove->reloptkind = RELOPT_DEADREL; + + /* + * Update references to the removed relation from other baserels. + */ + for (int i = 1; i < root->simple_rel_array_size; i++) + { + RelOptInfo *otherrel = root->simple_rel_array[i]; + int attroff; + + /* no point in processing target rel itself */ + if (i == toRemove->relid) + continue; + + /* there may be empty slots corresponding to non-baserel RTEs */ + if (otherrel == NULL) + continue; + + Assert(otherrel->relid == i); /* sanity check on array */ + + /* Update attr_needed arrays. */ + for (attroff = otherrel->max_attr - otherrel->min_attr; + attroff >= 0; attroff--) + { + change_relid(&otherrel->attr_needed[attroff], toRemove->relid, + toKeep->relid); + } + + /* Update lateral references. */ + foreach (cell, otherrel->lateral_vars) + change_varno(lfirst(cell), toRemove->relid, toKeep->relid); + } +} + +/* + * Test whether the relations are joined on the same unique column. + */ +static bool +is_unique_self_join(PlannerInfo *root, Relids joinrelids, RelOptInfo *outer, + RelOptInfo *inner, List *restrictlist) +{ + UniqueIndexInfo *outeridx = NULL; + UniqueIndexInfo *inneridx = NULL; + ListCell *outerCell, *innerCell; + + innerrel_is_unique(root, joinrelids, inner->relids, + outer, JOIN_INNER, restrictlist, true, &outeridx); + if (!outeridx) + return false; + + innerrel_is_unique(root, joinrelids, outer->relids, + inner, JOIN_INNER, restrictlist, true, &inneridx); + if (!inneridx) + return false; + + /* We must have the same unique index for both relations. */ + if (outeridx->index->indexoid != inneridx->index->indexoid) + return false; + + /* + * The clauses that make a relation unique must be the same for both + * relations, or else we won't match the same row on each side of join. + * + * The lists of matching clauses are ordered as the index columns, so we + * just compare the list elements one by one. The varnos are different, + * so we copy the clauses and replace all mentions of outer varno with the + * inner, so that we can use equal(). + */ + forboth(innerCell, inneridx->clauses, outerCell, outeridx->clauses) + { + Expr *innerExpr = copyObject(castNode(RestrictInfo, lfirst(innerCell))->clause); + Expr *outerExpr = copyObject(castNode(RestrictInfo, lfirst(outerCell))->clause); + change_varno(outerExpr, outer->relid, inner->relid); + change_varno(innerExpr, outer->relid, inner->relid); + if (!equal(outerExpr, innerExpr)) + { + pfree(outerExpr); + pfree(innerExpr); + return false; + } + pfree(outerExpr); + pfree(innerExpr); + } + + return true; +} + +/* + * Find and remove unique self joins in a group of base relations that have + * the same Oid. + * + * Returns a set of relids that were removed. + */ +static Relids +remove_self_joins_one_group(UsjScratch *scratch, Index *relids, int n) +{ + PlannerInfo *root = scratch->root; + Relids joinrelids = scratch->joinrelids; + Relids result = NULL; + int i, o; + ListCell *lc; + + if (n < 2) + return NULL; + + for (o = 0; o < n; o++) + { + RelOptInfo *outer = root->simple_rel_array[relids[o]]; + + for (i = o + 1; i < n; i++) + { + RelOptInfo *inner = root->simple_rel_array[relids[i]]; + List *restrictlist; + + /* A sanity check: the relations have the same Oid. */ + Assert(root->simple_rte_array[relids[i]]->relid + == root->simple_rte_array[relids[o]]->relid); + + /* + * This optimization applies to inner joins only, so skip any relations + * that form a special join. + */ + if (bms_is_member(relids[i], scratch->special_join_rels[relids[o]])) + continue; + + /* Reuse joinrelids bitset to avoid reallocation. */ + joinrelids = bms_del_members(joinrelids, joinrelids); + + /* + * We only deal with base rels here, so their relids bitset + * contains only one member -- their relid. + */ + joinrelids = bms_add_member(joinrelids, relids[o]); + joinrelids = bms_add_member(joinrelids, relids[i]); + + /* Is it a unique self join? */ + restrictlist = build_joinrel_restrictlist(root, joinrelids, outer, + inner); + if (!is_unique_self_join(root, joinrelids, outer, inner, + restrictlist)) + continue; + + /* + * We can't remove the join if the relations have row marks of + * different strength (e.g. one is locked FOR UPDATE and another + * just has ROW_MARK_REFERENCE for EvalPlanQual rechecking). + */ + if (scratch->row_marks[relids[i]] && scratch->row_marks[relids[o]] + && scratch->row_marks[relids[i]]->markType + != scratch->row_marks[relids[o]]->markType) + { + continue; + } + + /* + * We can remove either relation, so remove the outer one, + * to simplify this loop. + */ + remove_self_join_rel(scratch, joinrelids, restrictlist, inner, outer); + result = bms_add_member(result, relids[o]); + + /* + * Replace varno in root targetlist. + */ + foreach(lc, scratch->targetlist) + change_varno(lfirst(lc), relids[o], relids[i]); + + /* We removed the outer relation, try the next one. */ + break; + } + } + + scratch->joinrelids = joinrelids; + return result; +} + +/* + * A qsort comparator to sort the relids by the relation Oid. + */ +static int +compare_rte(const Index *left, const Index *right, PlannerInfo *root) +{ + Oid l = root->simple_rte_array[*left]->relid; + Oid r = root->simple_rte_array[*right]->relid; + return l < r ? 1 : ( l == r ? 0 : -1 ); +} + +/* + * Find and remove unique self joins on a particular level of the join tree. + * + * We sort the relations by Oid and then examine each group with the same Oid. + * If we removed any relation, remove it from joinlist as well. + */ +static void +remove_self_joins_one_level(UsjScratch *scratch, List **joinlist) +{ + ListCell *prev, *cell, *next; + Relids relidsToRemove = NULL; + Oid groupOid; + int groupStart; + int i; + int n = 0; + Index *relid_ascending = scratch->relids; + PlannerInfo *root = scratch->root; + + /* + * Collect the ids of base relations at this level of the join tree. + */ + foreach (cell, *joinlist) + { + RangeTblEntry *rte; + RelOptInfo *rel; + RangeTblRef *ref = (RangeTblRef *) lfirst(cell); + if (!IsA(ref, RangeTblRef)) + continue; + + rte = root->simple_rte_array[ref->rtindex]; + rel = root->simple_rel_array[ref->rtindex]; + + /* We only care about base relations from which we select something. */ + if (rte->rtekind != RTE_RELATION || rte->relkind != RELKIND_RELATION + || rel == NULL) + { + continue; + } + + /* This optimization won't work for tables that have inheritance children. */ + if (rte->inh) + continue; + + relid_ascending[n++] = ref->rtindex; + + /* Limit the number of joins we process to control the quadratic behavior. */ + if (n > join_collapse_limit) + break; + } + + if (n < 2) + return; + + /* + * Find and process the groups of relations that have same Oid. + */ + qsort_arg(relid_ascending, n, sizeof(*relid_ascending), + (qsort_arg_comparator) compare_rte, root); + groupOid = root->simple_rte_array[relid_ascending[0]]->relid; + groupStart = 0; + for (i = 1; i < n; i++) + { + RangeTblEntry *rte = root->simple_rte_array[relid_ascending[i]]; + Assert(rte->relid != InvalidOid); + if (rte->relid != groupOid) + { + relidsToRemove = bms_add_members(relidsToRemove, + remove_self_joins_one_group(scratch, &relid_ascending[groupStart], + i - groupStart)); + groupOid = rte->relid; + groupStart = i; + } + } + Assert(groupOid != InvalidOid); + Assert(groupStart < n); + relidsToRemove = bms_add_members(relidsToRemove, + remove_self_joins_one_group(scratch, &relid_ascending[groupStart], + n - groupStart)); + + /* Delete the removed relations from joinlist. */ + cell = NULL; + next = list_head(*joinlist); + while (next) + { + Node *node; + + prev = cell; + cell = next; + next = lnext(next); + node = lfirst(cell); + + if (IsA(node, RangeTblRef) + && bms_is_member(((RangeTblRef*) node)->rtindex, relidsToRemove)) + { + *joinlist = list_delete_cell(*joinlist, cell, prev); + cell = prev; + } + } +} + +/* + * Find and remove unique self joins on a single level of a join tree, and + * recurse to handle deeper levels. + */ +static void +remove_self_joins_recurse(UsjScratch *scratch, List **joinlist) +{ + ListCell *lc; + foreach (lc, *joinlist) + { + switch (((Node*) lfirst(lc))->type) + { + case T_List: + remove_self_joins_recurse(scratch, (List **) &lfirst(lc)); + break; + case T_RangeTblRef: + break; + default: + Assert(false); + } + } + remove_self_joins_one_level(scratch, joinlist); +} + +/* + * Find and remove useless self joins. + * + * We search for joins where the same relation is joined to itself on all + * columns of some unique index. If this condition holds, then, for + * each outer row, only one inner row matches, and it is the same row + * of the same relation. This allows us to remove the join and replace + * it with a scan that combines WHERE clauses from both sides. The join + * clauses themselves assume the form of X = X and can be replaced with + * NOT NULL clauses. + * + * For the sake of simplicity, we don't apply this optimization to special + * joins. Here is a list of what we could do in some particular cases: + * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins, + * and then removed normally. + * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND + * (IS NULL on join columns OR NOT inner quals)'. + * 'a a1 left join a a2': could simplify to a scan like inner, but without + * NOT NULL condtions on join columns. + * 'a a1 left join (a a2 join b)': can't simplify this, because join to b + * can both remove rows and introduce duplicates. + * + * To search for removable joins, we order all the relations on their Oid, + * go over each set with the same Oid, and consider each pair of relations + * in this set. We check that both relation are made unique by the same + * unique index with the same clauses. + * + * To remove the join, we mark one of the participating relation as + * dead, and rewrite all references to it to point to the remaining + * relation. This includes modifying RestrictInfos, EquivalenceClasses and + * EquivalenceMembers. We also have to modify the row marks. The join clauses + * of the removed relation become either restriction or join clauses, based on + * whether they reference any relations not participating in the removed join. + * + * 'targetlist' is the top-level targetlist of query. If it has any references + * to the removed relations, we update them to point to the remaining ones. + */ +void +remove_useless_self_joins(PlannerInfo *root, List **joinlist, List *targetlist) +{ + ListCell *lc; + UsjScratch scratch; + + scratch.root = root; + scratch.relids = palloc(root->simple_rel_array_size * sizeof(Index)); + scratch.special_join_rels = palloc0(root->simple_rel_array_size * sizeof(Relids)); + scratch.row_marks = palloc0(root->simple_rel_array_size * sizeof(PlanRowMark *)); + scratch.joinrelids = NULL; + scratch.targetlist = targetlist; + + /* Find out which relations have special joins to which. */ + foreach(lc, root->join_info_list) + { + SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc); + int bit = -1; + while ((bit = bms_next_member(info->min_lefthand, bit)) >= 0) + { + RelOptInfo *rel = find_base_rel(root, bit); + scratch.special_join_rels[rel->relid] = + bms_add_members(scratch.special_join_rels[rel->relid], + info->min_righthand); + } + + bit = -1; + while ((bit = bms_next_member(info->min_righthand, bit)) >= 0) + { + RelOptInfo *rel = find_base_rel(root, bit); + scratch.special_join_rels[rel->relid] = + bms_add_members(scratch.special_join_rels[rel->relid], + info->min_lefthand); + } + } + + /* Collect row marks. */ + foreach (lc, root->rowMarks) + { + PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc); + + /* Can't have more than one row mark for a relation. */ + Assert(scratch.row_marks[rowMark->rti] == NULL); + + scratch.row_marks[rowMark->rti] = rowMark; + } + + /* Finally, remove the joins. */ + remove_self_joins_recurse(&scratch, joinlist); } diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c index 3cedd01..8d6036c 100644 --- a/src/backend/optimizer/plan/planmain.c +++ b/src/backend/optimizer/plan/planmain.c @@ -224,7 +224,7 @@ query_planner(PlannerInfo *root, List *tlist, * jointree preprocessing, but the necessary information isn't available * until we've built baserel data structures and classified qual clauses. */ - joinlist = remove_useless_joins(root, joinlist); + joinlist = remove_useless_left_joins(root, joinlist); /* * Also, reduce any semijoins with unique inner rels to plain inner joins. @@ -233,6 +233,11 @@ query_planner(PlannerInfo *root, List *tlist, reduce_unique_semijoins(root); /* + * Remove self joins on a unique column. + */ + remove_useless_self_joins(root, &joinlist, tlist); + + /* * Now distribute "placeholders" to base rels as needed. This has to be * done after join removal because removal could change whether a * placeholder is evaluable at a base rel. diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c index 169e51e..493f425 100644 --- a/src/backend/optimizer/util/pathnode.c +++ b/src/backend/optimizer/util/pathnode.c @@ -1578,7 +1578,8 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath, if (rel->rtekind == RTE_RELATION && sjinfo->semi_can_btree && relation_has_unique_index_for(root, rel, NIL, sjinfo->semi_rhs_exprs, - sjinfo->semi_operators)) + sjinfo->semi_operators, + NULL /*index_info*/)) { pathnode->umethod = UNIQUE_PATH_NOOP; pathnode->path.rows = rel->rows; diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c index 4130514..fde4118 100644 --- a/src/backend/optimizer/util/relnode.c +++ b/src/backend/optimizer/util/relnode.c @@ -39,14 +39,10 @@ typedef struct JoinHashEntry static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel, RelOptInfo *input_rel); -static List *build_joinrel_restrictlist(PlannerInfo *root, - RelOptInfo *joinrel, - RelOptInfo *outer_rel, - RelOptInfo *inner_rel); static void build_joinrel_joinlist(RelOptInfo *joinrel, RelOptInfo *outer_rel, RelOptInfo *inner_rel); -static List *subbuild_joinrel_restrictlist(RelOptInfo *joinrel, +static List *subbuild_joinrel_restrictlist(Relids joinrelids, List *joininfo_list, List *new_restrictlist); static List *subbuild_joinrel_joinlist(RelOptInfo *joinrel, @@ -556,7 +552,7 @@ build_join_rel(PlannerInfo *root, */ if (restrictlist_ptr) *restrictlist_ptr = build_joinrel_restrictlist(root, - joinrel, + joinrel->relids, outer_rel, inner_rel); return joinrel; @@ -659,7 +655,7 @@ build_join_rel(PlannerInfo *root, * caller might or might not need the restrictlist, but I need it anyway * for set_joinrel_size_estimates().) */ - restrictlist = build_joinrel_restrictlist(root, joinrel, + restrictlist = build_joinrel_restrictlist(root, joinrel->relids, outer_rel, inner_rel); if (restrictlist_ptr) *restrictlist_ptr = restrictlist; @@ -981,7 +977,7 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel, * the various joinlist entries ultimately refer to RestrictInfos * pushed into them by distribute_restrictinfo_to_rels(). * - * 'joinrel' is a join relation node + * 'joinrelids' is a join relation id set * 'outer_rel' and 'inner_rel' are a pair of relations that can be joined * to form joinrel. * @@ -994,9 +990,9 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel, * RestrictInfo nodes are no longer context-dependent. Instead, just include * the original nodes in the lists made for the join relation. */ -static List * +List * build_joinrel_restrictlist(PlannerInfo *root, - RelOptInfo *joinrel, + Relids joinrelids, RelOptInfo *outer_rel, RelOptInfo *inner_rel) { @@ -1007,8 +1003,8 @@ build_joinrel_restrictlist(PlannerInfo *root, * eliminating any duplicates (important since we will see many of the * same clauses arriving from both input relations). */ - result = subbuild_joinrel_restrictlist(joinrel, outer_rel->joininfo, NIL); - result = subbuild_joinrel_restrictlist(joinrel, inner_rel->joininfo, result); + result = subbuild_joinrel_restrictlist(joinrelids, outer_rel->joininfo, NIL); + result = subbuild_joinrel_restrictlist(joinrelids, inner_rel->joininfo, result); /* * Add on any clauses derived from EquivalenceClasses. These cannot be @@ -1017,7 +1013,7 @@ build_joinrel_restrictlist(PlannerInfo *root, */ result = list_concat(result, generate_join_implied_equalities(root, - joinrel->relids, + joinrelids, outer_rel->relids, inner_rel)); @@ -1043,7 +1039,7 @@ build_joinrel_joinlist(RelOptInfo *joinrel, } static List * -subbuild_joinrel_restrictlist(RelOptInfo *joinrel, +subbuild_joinrel_restrictlist(Relids joinrelids, List *joininfo_list, List *new_restrictlist) { @@ -1053,7 +1049,7 @@ subbuild_joinrel_restrictlist(RelOptInfo *joinrel, { RestrictInfo *rinfo = (RestrictInfo *) lfirst(l); - if (bms_is_subset(rinfo->required_relids, joinrel->relids)) + if (bms_is_subset(rinfo->required_relids, joinrelids)) { /* * This clause becomes a restriction clause for the joinrel, since diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h index a008ae0..4d1e9ac 100644 --- a/src/include/nodes/pathnodes.h +++ b/src/include/nodes/pathnodes.h @@ -520,8 +520,10 @@ typedef struct PartitionSchemeData *PartitionScheme; * populate these fields, for base rels; but someday they might be used for * join rels too: * - * unique_for_rels - list of Relid sets, each one being a set of other - * rels for which this one has been proven unique + * unique_for_rels - list of (Relids, UniqueIndexInfo*) lists, where Relids + * is a set of other rels for which this one has been proven + * unique, and UniqueIndexInfo* stores information about the + * index that makes it unique, if any. * non_unique_for_rels - list of Relid sets, each one being a set of * other rels for which we have tried and failed to prove * this one unique diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h index 574bb85..7eb59c1 100644 --- a/src/include/optimizer/pathnode.h +++ b/src/include/optimizer/pathnode.h @@ -287,6 +287,11 @@ extern RelOptInfo *build_join_rel(PlannerInfo *root, RelOptInfo *inner_rel, SpecialJoinInfo *sjinfo, List **restrictlist_ptr); + +extern List *build_joinrel_restrictlist(PlannerInfo *root, + Relids joinrelids, + RelOptInfo *outer_rel, + RelOptInfo *inner_rel); extern Relids min_join_parameterization(PlannerInfo *root, Relids joinrelids, RelOptInfo *outer_rel, diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h index 040335a..725d2c2 100644 --- a/src/include/optimizer/paths.h +++ b/src/include/optimizer/paths.h @@ -71,9 +71,20 @@ extern void debug_print_rel(PlannerInfo *root, RelOptInfo *rel); * routines to generate index paths */ extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel); +/* + * UniqueIndexInfo describes a unique index and its corresponding clauses + * that guarantee the uniqueness of a relation. + */ +typedef struct UniqueIndexInfo +{ + IndexOptInfo *index; + List *clauses; +} UniqueIndexInfo; + extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel, List *restrictlist, - List *exprlist, List *oprlist); + List *exprlist, List *oprlist, + UniqueIndexInfo **info); extern bool indexcol_is_bool_constant_for_query(IndexOptInfo *index, int indexcol); extern bool match_index_to_operand(Node *operand, int indexcol, diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h index 3bbdb5e..e697ff6 100644 --- a/src/include/optimizer/planmain.h +++ b/src/include/optimizer/planmain.h @@ -16,6 +16,7 @@ #include "nodes/pathnodes.h" #include "nodes/plannodes.h" +#include "optimizer/paths.h" /* GUC parameters */ #define DEFAULT_CURSOR_TUPLE_FRACTION 0.1 @@ -95,13 +96,18 @@ extern void match_foreign_keys_to_quals(PlannerInfo *root); /* * prototypes for plan/analyzejoins.c */ -extern List *remove_useless_joins(PlannerInfo *root, List *joinlist); +extern List *remove_useless_left_joins(PlannerInfo *root, List *joinlist); extern void reduce_unique_semijoins(PlannerInfo *root); extern bool query_supports_distinctness(Query *query); extern bool query_is_distinct_for(Query *query, List *colnos, List *opids); + extern bool innerrel_is_unique(PlannerInfo *root, Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel, - JoinType jointype, List *restrictlist, bool force_cache); + JoinType jointype, List *restrictlist, bool force_cache, + UniqueIndexInfo **index_info); + +extern void remove_useless_self_joins(PlannerInfo *root, List **jointree, + List *tlist); /* * prototypes for plan/setrefs.c diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out index c448d85..a57905f 100644 --- a/src/test/regress/expected/equivclass.out +++ b/src/test/regress/expected/equivclass.out @@ -439,3 +439,35 @@ explain (costs off) Filter: ((unique1 = unique1) OR (unique2 = unique2)) (2 rows) +-- Test that broken ECs are processed correctly during self join removal. +-- Disable merge joins so that we don't get an error about missing commutator. +-- Test both orientations of the join clause, because only one of them breaks +-- the EC. +set enable_mergejoin to off; +explain (costs off) + select * from ec0 m join ec0 n on m.ff = n.ff + join ec1 p on m.ff + n.ff = p.f1; + QUERY PLAN +---------------------------------------- + Nested Loop + Join Filter: ((n.ff + n.ff) = p.f1) + -> Seq Scan on ec1 p + -> Materialize + -> Seq Scan on ec0 n + Filter: (ff IS NOT NULL) +(6 rows) + +explain (costs off) + select * from ec0 m join ec0 n on m.ff = n.ff + join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1; + QUERY PLAN +--------------------------------------------------------------- + Nested Loop + Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1) + -> Seq Scan on ec1 p + -> Materialize + -> Seq Scan on ec0 n + Filter: (ff IS NOT NULL) +(6 rows) + +reset enable_mergejoin; diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out index 593aec2..230c19c 100644 --- a/src/test/regress/expected/join.out +++ b/src/test/regress/expected/join.out @@ -4365,11 +4365,13 @@ explain (costs off) select p.* from (parent p left join child c on (p.k = c.k)) join parent x on p.k = x.k where p.k = 1 and p.k = 2; - QUERY PLAN --------------------------- + QUERY PLAN +------------------------------------------------ Result One-Time Filter: false -(2 rows) + -> Index Scan using parent_pkey on parent x + Index Cond: (k = 1) +(4 rows) -- bug 5255: this is not optimizable by join removal begin; @@ -4486,6 +4488,205 @@ select * from (0 rows) -- +-- test that semi- or inner self-joins on a unique column are removed +-- +-- enable only nestloop to get more predictable plans +set enable_hashjoin to off; +set enable_mergejoin to off; +create table sj (a int unique, b int); +insert into sj values (1, null), (null, 2), (2, 1); +analyze sj; +select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1; + a | b +---+--- + 2 | 1 +(1 row) + +explain (costs off) +select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1; + QUERY PLAN +----------------------------------------------- + Seq Scan on sj q + Filter: ((a IS NOT NULL) AND (b = (a - 1))) +(2 rows) + +explain (costs off) +select * from sj p +where exists (select * from sj q + where q.a = p.a and q.b < 10); + QUERY PLAN +------------------------------------------ + Seq Scan on sj q + Filter: ((a IS NOT NULL) AND (b < 10)) +(2 rows) + +-- subselect that references the removed relation +explain (costs off) +select t1.a, (select a from sj where a = t2.a and a = t1.a) +from sj t1, sj t2 +where t1.a = t2.a; + QUERY PLAN +------------------------------------------ + Seq Scan on sj t2 + Filter: (a IS NOT NULL) + SubPlan 1 + -> Result + One-Time Filter: (t2.a = t2.a) + -> Seq Scan on sj + Filter: (a = t2.a) +(7 rows) + +-- self-join under outer join +explain (costs off) +select * from sj x join sj y on x.a = y.a +left join int8_tbl z on x.a = z.q1; + QUERY PLAN +------------------------------------ + Nested Loop Left Join + Join Filter: (y.a = z.q1) + -> Seq Scan on sj y + Filter: (a IS NOT NULL) + -> Materialize + -> Seq Scan on int8_tbl z +(6 rows) + +explain (costs off) +select * from sj x join sj y on x.a = y.a +left join int8_tbl z on y.a = z.q1; + QUERY PLAN +------------------------------------ + Nested Loop Left Join + Join Filter: (y.a = z.q1) + -> Seq Scan on sj y + Filter: (a IS NOT NULL) + -> Materialize + -> Seq Scan on int8_tbl z +(6 rows) + +-- Test that placeholders are updated correctly after join removal +explain (costs off) +select * from (values (1)) x +left join (select coalesce(y.q1, 1) from int8_tbl y + right join sj j1 inner join sj j2 on j1.a = j2.a + on true) z +on true; + QUERY PLAN +------------------------------------------ + Nested Loop Left Join + -> Result + -> Nested Loop Left Join + -> Seq Scan on sj j2 + Filter: (a IS NOT NULL) + -> Materialize + -> Seq Scan on int8_tbl y +(7 rows) + +-- update lateral references +explain (costs off) +select 1 from (select x.* from sj x, sj y where x.a = y.a) q, + lateral generate_series(1, q.a) gs(i); + QUERY PLAN +------------------------------------------- + Nested Loop + -> Seq Scan on sj y + Filter: (a IS NOT NULL) + -> Function Scan on generate_series gs +(4 rows) + +explain (costs off) +select 1 from (select y.* from sj x, sj y where x.a = y.a) q, + lateral generate_series(1, q.a) gs(i); + QUERY PLAN +------------------------------------------- + Nested Loop + -> Seq Scan on sj y + Filter: (a IS NOT NULL) + -> Function Scan on generate_series gs +(4 rows) + +-- Test that a non-EC-derived join clause is processed correctly. Use an +-- outer join so that we can't form an EC. +explain (costs off) select * from sj p join sj q on p.a = q.a + left join sj r on p.a + q.a = r.a; + QUERY PLAN +------------------------------------ + Nested Loop Left Join + Join Filter: ((q.a + q.a) = r.a) + -> Seq Scan on sj q + Filter: (a IS NOT NULL) + -> Materialize + -> Seq Scan on sj r +(6 rows) + +-- Check that attr_needed is updated correctly after self-join removal. In +-- this test, k1.b is required at either j1 or j2. If this info is lost, +-- join targetlist for (k1, k2) will not contain k1.b. Use index scan for +-- k1 so that we don't get 'b' from physical tlist used for seqscan. Also +-- disable reordering of joins because this test depends on a particular +-- join tree. +create table sk (a int, b int); +create index sk_a_idx on sk(a); +set join_collapse_limit to 1; +set enable_seqscan to off; +explain (costs off) select 1 from + (sk k1 join sk k2 on k1.a = k2.a) + join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b; + QUERY PLAN +----------------------------------------------------- + Nested Loop + Join Filter: (k1.b = j2.b) + -> Nested Loop + -> Index Scan using sk_a_idx on sk k1 + -> Index Only Scan using sk_a_idx on sk k2 + Index Cond: (a = k1.a) + -> Materialize + -> Index Scan using sj_a_key on sj j2 + Index Cond: (a IS NOT NULL) +(9 rows) + +explain (costs off) select 1 from + (sk k1 join sk k2 on k1.a = k2.a) + join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b; + QUERY PLAN +----------------------------------------------------- + Nested Loop + Join Filter: (k1.b = j2.b) + -> Nested Loop + -> Index Scan using sk_a_idx on sk k1 + -> Index Only Scan using sk_a_idx on sk k2 + Index Cond: (a = k1.a) + -> Materialize + -> Index Scan using sj_a_key on sj j2 + Index Cond: (a IS NOT NULL) +(9 rows) + +reset join_collapse_limit; +reset enable_seqscan; +-- If index conditions are different for each side, we won't select the same +-- row on both sides, so the join can't be removed. +create table sl(a int, b int); +create unique index sl_ab on sl(a, b); +explain (costs off) +select * from sl t1, sl t2 +where t1.a = t2.a and t1.b = 1 and t2.b = 2; + QUERY PLAN +---------------------------------------------- + Nested Loop + Join Filter: (t1.a = t2.a) + -> Bitmap Heap Scan on sl t1 + Recheck Cond: (b = 1) + -> Bitmap Index Scan on sl_ab + Index Cond: (b = 1) + -> Materialize + -> Bitmap Heap Scan on sl t2 + Recheck Cond: (b = 2) + -> Bitmap Index Scan on sl_ab + Index Cond: (b = 2) +(11 rows) + +reset enable_hashjoin; +reset enable_mergejoin; +-- -- Test hints given on incorrect column references are useful -- select t1.uunique1 from diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql index 85aa65d..b1e2483 100644 --- a/src/test/regress/sql/equivclass.sql +++ b/src/test/regress/sql/equivclass.sql @@ -262,3 +262,20 @@ explain (costs off) -- this could be converted, but isn't at present explain (costs off) select * from tenk1 where unique1 = unique1 or unique2 = unique2; + + +-- Test that broken ECs are processed correctly during self join removal. +-- Disable merge joins so that we don't get an error about missing commutator. +-- Test both orientations of the join clause, because only one of them breaks +-- the EC. +set enable_mergejoin to off; + +explain (costs off) + select * from ec0 m join ec0 n on m.ff = n.ff + join ec1 p on m.ff + n.ff = p.f1; + +explain (costs off) + select * from ec0 m join ec0 n on m.ff = n.ff + join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1; + +reset enable_mergejoin; diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql index 34d21d0..4b0ea7b 100644 --- a/src/test/regress/sql/join.sql +++ b/src/test/regress/sql/join.sql @@ -1570,6 +1570,96 @@ select * from int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok -- +-- test that semi- or inner self-joins on a unique column are removed +-- + +-- enable only nestloop to get more predictable plans +set enable_hashjoin to off; +set enable_mergejoin to off; + +create table sj (a int unique, b int); +insert into sj values (1, null), (null, 2), (2, 1); +analyze sj; + +select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1; + +explain (costs off) +select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1; + +explain (costs off) +select * from sj p +where exists (select * from sj q + where q.a = p.a and q.b < 10); + +-- subselect that references the removed relation +explain (costs off) +select t1.a, (select a from sj where a = t2.a and a = t1.a) +from sj t1, sj t2 +where t1.a = t2.a; + +-- self-join under outer join +explain (costs off) +select * from sj x join sj y on x.a = y.a +left join int8_tbl z on x.a = z.q1; + +explain (costs off) +select * from sj x join sj y on x.a = y.a +left join int8_tbl z on y.a = z.q1; + +-- Test that placeholders are updated correctly after join removal +explain (costs off) +select * from (values (1)) x +left join (select coalesce(y.q1, 1) from int8_tbl y + right join sj j1 inner join sj j2 on j1.a = j2.a + on true) z +on true; + +-- update lateral references +explain (costs off) +select 1 from (select x.* from sj x, sj y where x.a = y.a) q, + lateral generate_series(1, q.a) gs(i); + +explain (costs off) +select 1 from (select y.* from sj x, sj y where x.a = y.a) q, + lateral generate_series(1, q.a) gs(i); + +-- Test that a non-EC-derived join clause is processed correctly. Use an +-- outer join so that we can't form an EC. +explain (costs off) select * from sj p join sj q on p.a = q.a + left join sj r on p.a + q.a = r.a; + +-- Check that attr_needed is updated correctly after self-join removal. In +-- this test, k1.b is required at either j1 or j2. If this info is lost, +-- join targetlist for (k1, k2) will not contain k1.b. Use index scan for +-- k1 so that we don't get 'b' from physical tlist used for seqscan. Also +-- disable reordering of joins because this test depends on a particular +-- join tree. +create table sk (a int, b int); +create index sk_a_idx on sk(a); +set join_collapse_limit to 1; +set enable_seqscan to off; +explain (costs off) select 1 from + (sk k1 join sk k2 on k1.a = k2.a) + join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b; +explain (costs off) select 1 from + (sk k1 join sk k2 on k1.a = k2.a) + join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b; +reset join_collapse_limit; +reset enable_seqscan; + +-- If index conditions are different for each side, we won't select the same +-- row on both sides, so the join can't be removed. +create table sl(a int, b int); +create unique index sl_ab on sl(a, b); + +explain (costs off) +select * from sl t1, sl t2 +where t1.a = t2.a and t1.b = 1 and t2.b = 2; + +reset enable_hashjoin; +reset enable_mergejoin; + +-- -- Test hints given on incorrect column references are useful --