On Fri, Apr 24, 2015 at 3:08 PM, Shigeru HANADA <shigeru.han...@gmail.com> wrote:
> Hi Ashutosh, > > Thanks for the review. > > 2015/04/22 19:28、Ashutosh Bapat <ashutosh.ba...@enterprisedb.com> のメール: > > Tests > > ------- > > 1.The postgres_fdw test is re/setting enable_mergejoin at various > places. The goal of these tests seems to be to test the sanity of foreign > plans generated. So, it might be better to reset enable_mergejoin (and may > be all of enable_hashjoin, enable_nestloop_join etc.) to false at the > beginning of the testcase and set them again at the end. That way, we will > also make sure that foreign plans are chosen irrespective of future planner > changes. > > I have different, rather opposite opinion about it. I disabled other join > types as least as the tests pass, because I worry oversights come from > planner changes. I hope to eliminate enable_foo from the test script, by > improving costing model smarter. > Ok, if you can do that, that will be excellent. > > > 2. In the patch, I see that the inheritance testcases have been deleted > from postgres_fdw.sql, is that intentional? I do not see those being > replaced anywhere else. > > It’s accidental removal, I restored the tests about inheritance feature. > Thanks. > > > 3. We need one test for each join type (or at least for INNER and LEFT > OUTER) where there are unsafe to push conditions in ON clause along-with > safe-to-push conditions. For INNER join, the join should get pushed down > with the safe conditions and for OUTER join it shouldn't be. Same goes for > WHERE clause, in which case the join will be pushed down but the > unsafe-to-push conditions will be applied locally. > > Currently INNER JOINs with unsafe join conditions are not pushed down, so > such test is not in the suit. As you say, in theory, INNER JOINs can be > pushed down even they have push-down-unsafe join conditions, because such > conditions can be evaluated no local side against rows retrieved without > those conditions. > > > 4. All the tests have ORDER BY, LIMIT in them, so the setref code is > being exercised. But, something like aggregates would test the setref code > better. So, we should add at-least one test like select avg(ft1.c1 + > ft2.c2) from ft1 join ft2 on (ft1.c1 = ft2.c1). > > Added an aggregate case, and also added an UNION case for Append. > Thanks. > > > 5. It will be good to add some test which contain join between few > foreign and few local tables to see whether we are able to push down the > largest possible foreign join tree to the foreign server. > > > > Are you planning to do anything on this point? > > > > Code > > ------- > > In classifyConditions(), the code is now appending RestrictInfo::clause > rather than RestrictInfo itself. But the callers of classifyConditions() > have not changed. Is this change intentional? > > Yes, the purpose of the change is to make appendConditions (former name is > appendWhereClause) can handle JOIN ON clause, list of Expr. > > > The functions which consume the lists produced by this function handle > expressions as well RestrictInfo, so you may not have noticed it. Because > of this change, we might be missing some optimizations e.g. in function > postgresGetForeignPlan() > > 793 if (list_member_ptr(fpinfo->remote_conds, rinfo)) > > 794 remote_conds = lappend(remote_conds, rinfo->clause); > > 795 else if (list_member_ptr(fpinfo->local_conds, rinfo)) > > 796 local_exprs = lappend(local_exprs, rinfo->clause); > > 797 else if (is_foreign_expr(root, baserel, rinfo->clause)) > > 798 remote_conds = lappend(remote_conds, rinfo->clause); > > 799 else > > 800 local_exprs = lappend(local_exprs, rinfo->clause); > > Finding a RestrictInfo in remote_conds avoids another call to > is_foreign_expr(). So with this change, I think we are doing an extra call > to is_foreign_expr(). > > > > Hm, it seems better to revert my change and make appendConditions downcast > given information into RestrictInfo or Expr according to the node tag. > Thanks. > > > The function get_jointype_name() returns an empty string for unsupported > join types. Instead of that it should throw an error, if some code path > accidentally calls the function with unsupported join type e.g. SEMI_JOIN. > > Agreed, fixed. > Thanks. > > > While deparsing the SQL with rowmarks, the placement of FOR UPDATE/SHARE > clause in the original query is not being honored, which means that we will > end up locking the rows which are not part of the join result even when the > join is pushed to the foreign server. E.g take the following query (it uses > the tables created in postgres_fdw.sql tests) > > contrib_regression=# explain verbose select * from ft1 join ft2 on > (ft1.c1 = ft2.c1) for update of ft1; > > > > > > QUERY PLAN > > > > > > > ------------------------------------------------------------------------------------------------------------------------------------------------------- > > > ------------------------------------------------------------------------------------------------------------------------------------------------------- > > > ------------------------------------------------------------------------------------------------------------------------------------------------------- > > > ------------------------------------------------------------------------------------------------------------------------------------------------------- > > > ---------------------------------------------------------------------------------------------------- > > LockRows (cost=100.00..124.66 rows=822 width=426) > > Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, > ft1.c8, ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, > ft1.*, ft2.* > > -> Foreign Scan (cost=100.00..116.44 rows=822 width=426) > > Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, > ft1.c8, ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, > ft1.*, > > ft2.* > > Relations: (public.ft1) INNER JOIN (public.ft2) > > Remote SQL: SELECT l.a1, l.a2, l.a3, l.a4, l.a5, l.a6, l.a7, > l.a8, l.a9, r.a1, r.a2, r.a3, r.a4, r.a5, r.a6, r.a7, r.a8, r.a9 FROM > (SELECT l.a > > 10, l.a11, l.a12, l.a13, l.a14, l.a15, l.a16, l.a17, ROW(l.a10, l.a11, > l.a12, l.a13, l.a14, l.a15, l.a16, l.a17) FROM (SELECT "C 1" a10, c2 a11, > c3 a12 > > , c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1" FOR UPDATE) l) > l (a1, a2, a3, a4, a5, a6, a7, a8, a9) INNER JOIN (SELECT r.a9, r.a10, > r.a12, > > r.a13, r.a14, r.a15, r.a16, r.a17, ROW(r.a9, r.a10, r.a12, r.a13, r.a14, > r.a15, r.a16, r.a17) FROM (SELECT "C 1" a9, c2 a10, c3 a12, c4 a13, c5 a14, > c6 > > a15, c7 a16, c8 a17 FROM "S 1"."T 1") r) r (a1, a2, a3, a4, a5, a6, a7, > a8, a9) ON ((l.a1 = r.a1)) > > (6 rows) > > It's expected that only the rows which are part of join result will be > locked by FOR UPDATE clause. The query sent to the foreign server has > attached the FOR UPDATE clause to the sub-query for table ft1 ("S 1"."T 1" > on foreign server). As per the postgresql documentation, "When a locking > clause appears in a sub-SELECT, the rows locked are those returned to the > outer query by the sub-query.". So it's going to lock all rows from "S > 1"."T 1", rather than only the rows which are part of join. This is going > to increase probability of deadlocks, if the join is between a big table > and small table where big table is being used in many queries and the join > is going to have only a single row in the result. > > > > Are you planning to do anything about this point? > > > > Since there is no is_first argument to appendConditions(), we should > remove corresponding line from the function prologue. > > > > Oops, replaced with the description of prefix. > > > > The name TO_RELATIVE() doesn't convey the full meaning of the macro. May > be GET_RELATIVE_ATTNO() or something like that. > > Fixed. > Thanks. > > > > > In postgresGetForeignJoinPaths(), while separating the conditions into > join quals and other quals, > > 3014 if (IS_OUTER_JOIN(jointype)) > > 3015 { > > 3016 extract_actual_join_clauses(joinclauses, &joinclauses, > &otherclauses); > > 3017 } > > 3018 else > > 3019 { > > 3020 joinclauses = extract_actual_clauses(joinclauses, false); > > 3021 otherclauses = NIL; > > 3022 } > > we shouldn't differentiate between outer and inner join. For inner join > the join quals can be treated as other clauses and they will be returned as > other clauses, which is fine. Also, the following condition > > 3050 /* > > 3051 * Other condition for the join must be safe to push down. > > 3052 */ > > 3053 foreach(lc, otherclauses) > > 3054 { > > 3055 Expr *expr = (Expr *) lfirst(lc); > > 3056 > > 3057 if (!is_foreign_expr(root, joinrel, expr)) > > 3058 { > > 3059 ereport(DEBUG3, (errmsg("filter contains unsafe > conditions"))); > > 3060 return; > > 3061 } > > 3062 } > > is unnecessary. I there are filter conditions which are unsafe to push > down, they can be applied locally after obtaining the join result from the > foreign server. The join quals are all needed to be safe to push down, > since they decide which rows will contain NULL inner side in an OUTER join. > > I’m not sure that we *shouldn’t* differentiate, but I agree that we *don’t > need* to differentiate if we are talking about only the result of filtering. > > IMO we *should* differentiate inner and outer (or differentiate join > conditions and filter conditions) because all conditions of typical INNER > JOINs go into otherclauses because their is_pushed_down flag is on, so such > joins look like CROSS JOIN + WHERE filter. In the latest patch EXPLAIN > shows the join combinations of a foreign join scan node with join type, but > your suggestion makes it looks like this: > > fdw=# explain (verbose) select * from pgbench_branches b join > pgbench_tellers t on t.bid = b.bid; > > QUERY PLAN > > > ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > > ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > ---------------- > Foreign Scan (cost=100.00..101.00 rows=50 width=716) > Output: b.bid, b.bbalance, b.filler, t.tid, t.bid, t.tbalance, t.filler > Relations: (public.pgbench_branches b) CROSS JOIN > (public.pgbench_tellers t) > Remote SQL: SELECT l.a1, l.a2, l.a3, r.a1, r.a2, r.a3, r.a4 FROM > (SELECT l.a9, l.a10, l.a11 FROM (SELECT bid a9, bbalance a10, filler a11 > FROM public.pgbench_branches) l) > l (a1, a2, a3) CROSS JOIN (SELECT r.a9, r.a10, r.a11, r.a12 FROM (SELECT > tid a9, bid a10, tbalance a11, filler a12 FROM public.pgbench_tellers) r) r > (a1, a2, a3, a4) WHERE > ((l.a1 = r.a2)) > (4 rows) > > Thoughts? > It does hamper readability a bit. But it explicitly shows, how do we want to treat the join. We can leave this to the committers though. > > Regards, > -- > Shigeru HANADA > shigeru.han...@gmail.com > > > > > -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company