Thanks Robert for your comments. Please see my replies inlined. Updated patch is attached.
On Fri, Nov 6, 2015 at 10:02 PM, Robert Haas <robertmh...@gmail.com> wrote: > > I think this approach is generally reasonable, but I suggested parts > of it, so may be biased. I would be interested in hearing the > opinions of others. > > Random notes: > > "possibily" is a typo. > Fixed. > > usable_pklist is confusing because it seems like it might be talking > about primary keys rather than pathkeys. Also, I realize now, looking > at this again, that you're saying "usable" when what I really think > you mean is "useful". Lots of pathkeys are usable, but only a few of > those are useful. I suggest renaming usable_pathkeys to > query_pathkeys Done. > and usable_pklist to useful_pathkeys. pathkeys is being used to mean a list of pathkey nodes. What we want here is list of pathkeys so I have renamed it as useful_pathkeys_list, somewhat long but clear. > Similarly, let's > rename generate_pathkeys_for_relation() to > get_useful_pathkeys_for_relation(). > Done. > > Although I'm usually on the side of marking things as extern whenever > we find it convenient, I'm nervous about doing that to > make_canonical_pathkey(), because it has side effects. Searching the > list of canonical pathkeys for the one we need is reasonable, but is > it really right to ever think that we might create a new one at this > stage? Maybe it is, but if so I'd like to hear a good explanation as > to why. > For a foreign table no pathkeys are created before creating paths for individual foreign table since there are no indexes. For a regular table, the pathkeys get created for useful indexes. Exception to this is if the expression appears in the ORDER BY clause, the pathkey for the same is created while handling ORDER BY clause. So, we will always need to create pathkey for a foreign table unless the corresponding expression does not appear in the ORDER BY clause. This can be verified by breaking in make_canonical_pathkey() while executing "explain verbose select * from ft1 join lt using (val);" ft1(val int, val2 int) is foreign table and lt (val int, val2 int) is local table. You will hit the first breakpoint with stack trace Breakpoint 1, make_canonical_pathkey (root=0x231d858, eclass=0x231f030, opfamily=1976, strategy=1, nulls_first=0 '\000') at pathkeys.c:60 (gdb) where #0 make_canonical_pathkey (root=0x231d858, eclass=0x231f030, opfamily=1976, strategy=1, nulls_first=0 '\000') at pathkeys.c:60 #1 0x00007f6740077e39 in get_useful_pathkeys_for_relation (root=0x231d858, rel=0x231e390) at postgres_fdw.c:663 #2 0x00007f6740077f34 in postgresGetForeignPaths (root=0x231d858, baserel=0x231e390, foreigntableid=16393) at postgres_fdw.c:705 #3 0x00000000007079cf in set_foreign_pathlist (root=0x231d858, rel=0x231e390, rte=0x231c488) at allpaths.c:600 #4 0x0000000000707653 in set_rel_pathlist (root=0x231d858, rel=0x231e390, rti=1, rte=0x231c488) at allpaths.c:395 #5 0x000000000070735f in set_base_rel_pathlists (root=0x231d858) at allpaths.c:277 at this point (gdb) p root->canon_pathkeys $1 = (List *) 0x0 so get_useful_pathkeys_for_relation->make_canonical_pathkey() will create the first pathkey. I have left the corresponding TODO untouched, if anybody else wants to review that part of the code. I will remove it in the final version of the patch. > > Is the comment "Equivalence classes covering relations other than the > current one are of interest here" missing a "not"? > The sentence is correct. We need equivalence class covering relations other than the current one, because only such equivalence classes indicate joins between two relations. If an equivalence class covers only a single baserel (has only a single member in ec_relids), it indicates equivalence between columns/expressions of the same table, which can not result in a join. I have changed to comment to be more elaborate. > > I don't find this comment illuminating: > > + * In case of child relation, we need to check that the > + * equivalence class indicates a join to a relation other than > + * parents, other children and itself (something similar to > above). > + * Otherwise we will end up creating useless paths. The code > below is > + * similar to generate_implied_equalities_for_column(), which > might > + * give a hint. > > That basically just says that we have to do it this way because the > other way would be wrong. But it doesn't say WHY the other way would > be wrong. There's no "other way" which is wrong. What's the other way you are talking about? Anway, I have updated the comment to be more readable. > Then a few lines later, you have another comment which says > the same thing again: > > + /* > + * Ignore equivalence members which correspond to children > + * or same relation or to parent relations > + */ > > Updated this too. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index 866a09b..fa071d6 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -3467,22 +3467,22 @@ select tableoid::regclass, * from bar order by 1,2; bar2 | 4 | 144 bar2 | 7 | 77 (6 rows) -- Check UPDATE with inherited target and an appendrel subquery explain (verbose, costs off) update bar set f2 = f2 + 100 from ( select f1 from foo union all select f1+3 from foo ) ss where bar.f1 = ss.f1; - QUERY PLAN --------------------------------------------------------------------------------------- + QUERY PLAN +------------------------------------------------------------------------------------------------ Update on public.bar Update on public.bar Foreign Update on public.bar2 Remote SQL: UPDATE public.loct2 SET f2 = $2 WHERE ctid = $1 -> Hash Join Output: bar.f1, (bar.f2 + 100), bar.ctid, (ROW(foo.f1)) Hash Cond: (foo.f1 = bar.f1) -> Append -> Seq Scan on public.foo Output: ROW(foo.f1), foo.f1 @@ -3494,41 +3494,44 @@ where bar.f1 = ss.f1; -> Foreign Scan on public.foo2 foo2_1 Output: ROW((foo2_1.f1 + 3)), (foo2_1.f1 + 3) Remote SQL: SELECT f1 FROM public.loct1 -> Hash Output: bar.f1, bar.f2, bar.ctid -> Seq Scan on public.bar Output: bar.f1, bar.f2, bar.ctid -> Merge Join Output: bar2.f1, (bar2.f2 + 100), bar2.f3, bar2.ctid, (ROW(foo.f1)) Merge Cond: (bar2.f1 = foo.f1) - -> Sort + -> Foreign Scan on public.bar2 Output: bar2.f1, bar2.f2, bar2.f3, bar2.ctid - Sort Key: bar2.f1 - -> Foreign Scan on public.bar2 - Output: bar2.f1, bar2.f2, bar2.f3, bar2.ctid - Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct2 FOR UPDATE - -> Sort + Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct2 ORDER BY f1 ASC FOR UPDATE + -> Materialize Output: (ROW(foo.f1)), foo.f1 - Sort Key: foo.f1 - -> Append - -> Seq Scan on public.foo - Output: ROW(foo.f1), foo.f1 + -> Merge Append + Sort Key: foo.f1 + -> Sort + Output: (ROW(foo.f1)), foo.f1 + Sort Key: foo.f1 + -> Seq Scan on public.foo + Output: ROW(foo.f1), foo.f1 -> Foreign Scan on public.foo2 Output: ROW(foo2.f1), foo2.f1 - Remote SQL: SELECT f1 FROM public.loct1 - -> Seq Scan on public.foo foo_1 - Output: ROW((foo_1.f1 + 3)), (foo_1.f1 + 3) + Remote SQL: SELECT f1 FROM public.loct1 ORDER BY f1 ASC + -> Sort + Output: (ROW((foo_1.f1 + 3))), ((foo_1.f1 + 3)) + Sort Key: ((foo_1.f1 + 3)) + -> Seq Scan on public.foo foo_1 + Output: ROW((foo_1.f1 + 3)), (foo_1.f1 + 3) -> Foreign Scan on public.foo2 foo2_1 Output: ROW((foo2_1.f1 + 3)), (foo2_1.f1 + 3) - Remote SQL: SELECT f1 FROM public.loct1 -(45 rows) + Remote SQL: SELECT f1 FROM public.loct1 ORDER BY (f1 + 3) ASC +(48 rows) update bar set f2 = f2 + 100 from ( select f1 from foo union all select f1+3 from foo ) ss where bar.f1 = ss.f1; select tableoid::regclass, * from bar order by 1,2; tableoid | f1 | f2 ----------+----+----- bar | 1 | 211 bar | 2 | 222 diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c index cd4ed0c..0bc6375 100644 --- a/contrib/postgres_fdw/postgres_fdw.c +++ b/contrib/postgres_fdw/postgres_fdw.c @@ -251,20 +251,21 @@ static void postgresExplainForeignScan(ForeignScanState *node, static void postgresExplainForeignModify(ModifyTableState *mtstate, ResultRelInfo *rinfo, List *fdw_private, int subplan_index, ExplainState *es); static bool postgresAnalyzeForeignTable(Relation relation, AcquireSampleRowsFunc *func, BlockNumber *totalpages); static List *postgresImportForeignSchema(ImportForeignSchemaStmt *stmt, Oid serverOid); +static List *get_useful_pathkeys_for_relation(PlannerInfo *root, RelOptInfo *rel); /* * Helper functions */ static void estimate_path_cost_size(PlannerInfo *root, RelOptInfo *baserel, List *join_conds, List *pathkeys, double *p_rows, int *p_width, Cost *p_startup_cost, Cost *p_total_cost); @@ -501,100 +502,256 @@ postgresGetForeignRelSize(PlannerInfo *root, set_baserel_size_estimates(root, baserel); /* Fill in basically-bogus cost estimates for use later. */ estimate_path_cost_size(root, baserel, NIL, NIL, &fpinfo->rows, &fpinfo->width, &fpinfo->startup_cost, &fpinfo->total_cost); } } /* + * get_useful_pathkeys_for_relation + * Function collects useful pathkeys for given relation. The caller is + * possibly going to create paths with these pathkeys so as to get the data + * sorted according to the pathkeys. + * Following pathkeys are of interest here. + * 1. query pathkeys + * 2. pathkeys arising out of the equivalence classes + * Comments in the function explain these two in details. + * + * TODO: This function requires make_canonical_pathkey() to be "extern"alized. + * The function is required to get pathkey corresponding to a given equivalence + * class if there exists one already or create one otherwise. There are two ways + * we can avoid "extern"alization + * 1. Create a wrapper extern function returning pathkey with inputs root and + * equivalence class. Adding the wrapper function in pathkeys.c would avoid + * "extern"alization of the make_canonical_pathkey(). + * 2. Move get_useful_pathkeys_for_relation() to pathkeys.c. Considering that this + * function is heuristically finding the "interesting" pathkeys, it may not + * be very useful in general, and thus can not be part of the core. + */ +static List * +get_useful_pathkeys_for_relation(PlannerInfo *root, RelOptInfo *rel) +{ + List *useful_pathkeys_list = NIL; /* List of pathkeys to be returned */ + bool useful_query_pathkeys; + ListCell *lc; + bool is_child_rel = (rel->reloptkind == RELOPT_OTHER_MEMBER_REL); + + /* + * Determine whether we can potentially push query pathkeys to the remote + * side, avoiding a local sort. + * By default assume that root->query_pathkeys is pushable. + */ + useful_query_pathkeys = true; + foreach(lc, root->query_pathkeys) + { + PathKey *pathkey = (PathKey *) lfirst(lc); + EquivalenceClass *pathkey_ec = pathkey->pk_eclass; + Expr *em_expr; + + /* + * is_foreign_expr would detect volatile expressions as well, but + * ec_has_volatile saves some cycles. + */ + if (pathkey_ec->ec_has_volatile || + !(em_expr = find_em_expr_for_rel(pathkey_ec, rel)) || + !is_foreign_expr(root, rel, em_expr)) + { + /* + * The planner and executor don't have any clever strategy for + * taking data sorted by a prefix of the query's pathkeys and + * getting it to be sorted by all of those pathekeys. We'll just + * end up resorting the entire data set. So, unless we can push + * down all of the query pathkeys, forget it. + */ + useful_query_pathkeys = false; + } + } + + if (useful_query_pathkeys) + useful_pathkeys_list = list_make1(list_copy(root->query_pathkeys)); + + /* + * Check equivalence classes where this relation appears. Getting data + * ordered on corresponding expressions might help merge join. A join + * between two relations might have multiple merge joinable clauses, thus + * fetching foreign data sorted on all the expressions involved optimized + * merge join locally. But we do not have knowledge about the indexes + * present on the foreign server. The cost of sorting data on multiple + * expressions can vary greatly based on the kind of expressions and their + * presence in an index. Hence for now, we create paths with single pathkey. + * Nothing to do if the relation is not part of any equivalence class. + */ + if (!rel->has_eclass_joins) + return useful_pathkeys_list; + + foreach(lc, root->eq_classes) + { + EquivalenceClass *cur_ec = (EquivalenceClass *) lfirst(lc); + Expr *em_expr; + + /* + * Won't generate joinclauses if const, single-member or has + * volatile expression. + */ + if (cur_ec->ec_has_const || cur_ec->ec_has_volatile || + list_length(cur_ec->ec_members) <= 1) + continue; + + /* + * If an equivalence class covers other relations along with the given + * relation, it indicates a merge joinable clause between the given + * relation and other relations. Such equivalence class would make a + * useful pathkey for merge join. ec_relids contains only the parent + * relation and not child relations, hence do not use this test if + * given relation is a child relation. + */ + if (!is_child_rel && + bms_subset_compare(rel->relids, cur_ec->ec_relids) != BMS_SUBSET1) + continue; + + /* + * In case of child relation, we examine individual equivalence class + * members. If there exists a member which does not include the given relation, + * its parent or other siblings, the equivalence class indicates a merge + * joinable clause involving given relation with some relation other + * than itself, its parent or siblings. + */ + if (is_child_rel) + { + ListCell *lc_em; + Relids parent_relids = find_childrel_parents(root, rel); + bool has_other_rel_joins = false; + foreach(lc_em, cur_ec->ec_members) + { + EquivalenceMember *other_em = lfirst(lc_em); + + /* + * It's ok to ignore all the children (not just siblings) as + * equivalence members belonging to their parents will appear + * in the same list and will not be ignored. + */ + if (other_em->em_is_child || + bms_overlap(rel->relids, other_em->em_relids) || + bms_overlap(parent_relids, other_em->em_relids)) + continue; + + /* Found one "other" relation */ + has_other_rel_joins = true; + break; + } + + if (!has_other_rel_joins) + continue; + } + + /* + * If there exists an equivalence member which entirely belongs to + * the current relation and corresponding expression is pushable to + * the foreign server, create single element pathkeys for the same. + * This pathkey might be part of pathkeys derived from + * root->query_pathkeys, but the respective paths might have different + * costs. add_path() would reject any paths which have comparable costs + * but inferior pathkeys. + */ + if ((em_expr = find_em_expr_for_rel(cur_ec, rel)) && + is_foreign_expr(root, rel, em_expr)) + { + Pathkey *pathkey; + List *pathkeys; + + pathkey = make_canonical_pathkey(root, cur_ec, + linitial_oid(cur_ec->ec_opfamilies), + BTLessStrategyNumber, + false); + pathkeys = list_make1(pathkey); + + /* + * If this single member pathkeys is same as the + * root->query_pathkeys, do not include it again (note: in such case + * root->query_pathkeys must have been found useful above.). Since + * root->eq_classes is list of distinct equivalence classes, + * pathkeys generated from this list should all be distinct. Hence + * we need to compare this single member pathkeys with only + * root->query_pathkeys and not others that are created in this + * loop. + */ + if (compare_pathkeys(pathkeys, root->query_pathkeys) != PATHKEYS_EQUAL) + useful_pathkeys_list = lappend(useful_pathkeys_list, pathkeys); + else + { + /* + * For the convenience that compare_pathkeys provides, we have + * to create a list above and free it here if found duplicate. + * There is no function which can compare a single pathkey with + * list of pathkeys. + */ + list_free(pathkeys); + } + } + } + + return useful_pathkeys_list; +} + +/* * postgresGetForeignPaths * Create possible scan paths for a scan on the foreign table */ static void postgresGetForeignPaths(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid) { PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) baserel->fdw_private; ForeignPath *path; List *ppi_list; ListCell *lc; - List *usable_pathkeys = NIL; + List *useful_pathkeys_list = NIL; /* List of all pathkeys */ /* * Create simplest ForeignScan path node and add it to baserel. This path * corresponds to SeqScan path of regular tables (though depending on what * baserestrict conditions we were able to send to remote, there might * actually be an indexscan happening there). We already did all the work * to estimate cost and size of this path. */ path = create_foreignscan_path(root, baserel, fpinfo->rows, fpinfo->startup_cost, fpinfo->total_cost, NIL, /* no pathkeys */ NULL, /* no outer rel either */ NIL); /* no fdw_private list */ add_path(baserel, (Path *) path); - /* - * Determine whether we can potentially push query pathkeys to the remote - * side, avoiding a local sort. - */ - foreach(lc, root->query_pathkeys) - { - PathKey *pathkey = (PathKey *) lfirst(lc); - EquivalenceClass *pathkey_ec = pathkey->pk_eclass; - Expr *em_expr; - - /* - * is_foreign_expr would detect volatile expressions as well, but - * ec_has_volatile saves some cycles. - */ - if (!pathkey_ec->ec_has_volatile && - (em_expr = find_em_expr_for_rel(pathkey_ec, baserel)) && - is_foreign_expr(root, baserel, em_expr)) - usable_pathkeys = lappend(usable_pathkeys, pathkey); - else - { - /* - * The planner and executor don't have any clever strategy for - * taking data sorted by a prefix of the query's pathkeys and - * getting it to be sorted by all of those pathekeys. We'll just - * end up resorting the entire data set. So, unless we can push - * down all of the query pathkeys, forget it. - */ - list_free(usable_pathkeys); - usable_pathkeys = NIL; - break; - } - } + useful_pathkeys_list = get_useful_pathkeys_for_relation(root, baserel); - /* Create a path with useful pathkeys, if we found one. */ - if (usable_pathkeys != NULL) + /* Create one path for each set of pathkeys we found above. */ + foreach (lc, useful_pathkeys_list) { double rows; int width; Cost startup_cost; Cost total_cost; + List *useful_pathkeys = lfirst(lc); - estimate_path_cost_size(root, baserel, NIL, usable_pathkeys, + estimate_path_cost_size(root, baserel, NIL, useful_pathkeys, &rows, &width, &startup_cost, &total_cost); add_path(baserel, (Path *) create_foreignscan_path(root, baserel, rows, startup_cost, total_cost, - usable_pathkeys, + useful_pathkeys, NULL, NIL)); } /* * If we're not using remote estimates, stop here. We have no way to * estimate whether any join clauses would be worth sending across, so * don't bother building parameterized paths. */ if (!fpinfo->use_remote_estimate) diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h index f243de8..d13a244 100644 --- a/contrib/postgres_fdw/postgres_fdw.h +++ b/contrib/postgres_fdw/postgres_fdw.h @@ -106,17 +106,17 @@ extern void deparseUpdateSql(StringInfo buf, PlannerInfo *root, extern void deparseDeleteSql(StringInfo buf, PlannerInfo *root, Index rtindex, Relation rel, List *returningList, List **retrieved_attrs); extern void deparseAnalyzeSizeSql(StringInfo buf, Relation rel); extern void deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs); extern void deparseStringLiteral(StringInfo buf, const char *val); extern Expr *find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel); extern void appendOrderByClause(StringInfo buf, PlannerInfo *root, - RelOptInfo *baserel, List *pathkeys); + RelOptInfo *baserel, List *pathkeys); /* in shippable.c */ extern bool is_builtin(Oid objectId); extern bool is_shippable(Oid objectId, Oid classId, PgFdwRelationInfo *fpinfo); #endif /* POSTGRES_FDW_H */ diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c index c6b5d78..b81cc49 100644 --- a/src/backend/optimizer/path/pathkeys.c +++ b/src/backend/optimizer/path/pathkeys.c @@ -21,43 +21,40 @@ #include "nodes/makefuncs.h" #include "nodes/nodeFuncs.h" #include "nodes/plannodes.h" #include "optimizer/clauses.h" #include "optimizer/pathnode.h" #include "optimizer/paths.h" #include "optimizer/tlist.h" #include "utils/lsyscache.h" -static PathKey *make_canonical_pathkey(PlannerInfo *root, - EquivalenceClass *eclass, Oid opfamily, - int strategy, bool nulls_first); static bool pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys); static bool right_merge_direction(PlannerInfo *root, PathKey *pathkey); /**************************************************************************** * PATHKEY CONSTRUCTION AND REDUNDANCY TESTING ****************************************************************************/ /* * make_canonical_pathkey * Given the parameters for a PathKey, find any pre-existing matching * pathkey in the query's list of "canonical" pathkeys. Make a new * entry if there's not one already. * * Note that this function must not be used until after we have completed * merging EquivalenceClasses. (We don't try to enforce that here; instead, * equivclass.c will complain if a merge occurs after root->canon_pathkeys * has become nonempty.) */ -static PathKey * +PathKey * make_canonical_pathkey(PlannerInfo *root, EquivalenceClass *eclass, Oid opfamily, int strategy, bool nulls_first) { PathKey *pk; ListCell *lc; MemoryContext oldcontext; /* The passed eclass might be non-canonical, so chase up to the top */ while (eclass->ec_merged) diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h index 87123a5..5ef6e1a 100644 --- a/src/include/optimizer/paths.h +++ b/src/include/optimizer/paths.h @@ -197,12 +197,15 @@ extern List *find_mergeclauses_for_pathkeys(PlannerInfo *root, extern List *select_outer_pathkeys_for_merge(PlannerInfo *root, List *mergeclauses, RelOptInfo *joinrel); extern List *make_inner_pathkeys_for_merge(PlannerInfo *root, List *mergeclauses, List *outer_pathkeys); extern List *truncate_useless_pathkeys(PlannerInfo *root, RelOptInfo *rel, List *pathkeys); extern bool has_useful_pathkeys(PlannerInfo *root, RelOptInfo *rel); +extern PathKey *make_canonical_pathkey(PlannerInfo *root, + EquivalenceClass *eclass, Oid opfamily, + int strategy, bool nulls_first); #endif /* PATHS_H */
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers