> -----Original Message----- > From: Shigeru Hanada [mailto:shigeru.han...@gmail.com] > Sent: Monday, March 16, 2015 9:59 PM > To: Robert Haas > Cc: Tom Lane; Thom Brown; Kaigai Kouhei(海外 浩平); pgsql-hackers@postgreSQL.org > Subject: ##freemail## Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] > Custom > Plan API) > > 2015-03-14 7:18 GMT+09:00 Robert Haas <robertmh...@gmail.com>: > > I think the foreign data wrapper join pushdown case, which also aims > > to substitute a scan for a join, is interesting to think about, even > > though it's likely to be handled by a new FDW method instead of via > > the hook. Where should the FDW method get called from? Currently, > > the FDW method in KaiGai's patch is GetForeignJoinPaths, and that gets > > called from add_paths_to_joinrel(). The patch at > > > http://www.postgresql.org/message-id/CAEZqfEfy7p=uRpwN-Q-NNgzb8kwHbfqF82YSb9 > ztfzg7zn6...@mail.gmail.com > > uses that to implement join pushdown in postgres_fdw; if you have A > > JOIN B JOIN C all on server X, we'll notice that the join with A and B > > can be turned into a foreign scan on A JOIN B, and similarly for A-C > > and B-C. Then, if it turns out that the cheapest path for A-B is the > > foreign join, and the cheapest path for C is a foreign scan, we'll > > arrive at the idea of a foreign scan on A-B-C, and we'll realize the > > same thing in each of the other combinations as well. So, eventually > > the foreign join gets pushed down. > > From the viewpoint of postgres_fdw, incremental approach seemed > natural way, although postgres_fdw should consider paths in pathlist > in additon to cheapest one as you mentioned in another thread. This > approarch allows FDW to use SQL statement generated for underlying > scans as parts of FROM clause, as postgres_fdw does in the join > push-down patch. > > > But there's another possible approach: suppose that > > join_search_one_level, after considering left-sided and right-sided > > joins and after considering bushy joins, checks whether every relation > > it's got is from the same foreign server, and if so, asks that foreign > > server whether it would like to contribute any paths. Would that be > > better or worse? A disadvantage is that if you've got something like > > A LEFT JOIN B LEFT JOIN C LEFT JOIN D LEFT JOIN E LEFT JOIN F LEFT > > JOIN G LEFT JOIN H LEFT JOIN I but none of the joins can be pushed > > down (say, each join clause calls a non-pushdown-safe function) you'll > > end up examining a pile of joinrels - at every level of the join tree > > - and individually rejecting each one. With the > > build-it-up-incrementally approach, you'll figure that all out at > > level 2, and then after that there's nothing to do but give up > > quickly. On the other hand, I'm afraid the incremental approach might > > miss a trick: consider small LEFT JOIN (big INNER JOIN huge ON big.x = > > huge.x) ON small.y = big.y AND small.z = huge.z, where all three are > > foreign tables on the same server. If the output of the big/huge join > > is big, none of those paths are going to survive at level 2, but the > > overall join size might be very small, so we surely want a chance to > > recover at level 3. (We discussed test cases of this form quite a bit > > in the context of e2fa76d80ba571d4de8992de6386536867250474.) > > Interesting, I overlooked that pattern. As you pointed out, join > between big foregin tables might be dominated, perhaps by a MergeJoin > path. Leaving dominated ForeignPath in pathlist for more optimization > in the future (in higher join level) is an idea, but it would make > planning time longer (and use more cycle and memory). > > Tom's idea sounds good for saving the path b), but I worry that > whether FDW can get enough information at that timing, just before > set_cheapest. It would not be good I/F if each FDW needs to copy many > code form joinrel.c... > I had a call to discuss this topic with Hanada-san. Even though he expected FDW driver needs to check and extract relations involved in a particular join, it also means we have less problem as long as core backend can handle these common portion for all FDW/CSP drivers. Thus, we need care about two hook locations. The first one is add_paths_to_joinrel() as current patch doing, for custom-scan that adds an alternative join logic and takes underlying child nodes as input. The other one is standard_join_search() as Tom pointed out, for foreign-scan of remote join, or for custom-scan that replaces an entire join subtree.
One positive aspect of this approach is, postgres_fdw can handle whole-row-reference much simpler than bottom-up approach, according to Hanada-san. Remaining issue is, how to implement the core portion that extracts relations in a particular join, and to identify join type to be applied on a particular relations. One rough idea is, we pull relids bitmap from the target joinrel, then references the SpecialJoinInfo with identical union bitmap of left/righthand. It allows to inform FDW driver which relations and which another relations shall be joined in this level. For example, if relids=0x007 and relids=0x0018 are left joined, PlannerInfo shall have a SpecialJoinInfo that fits the requirement. Also, both of left/right side is not singleton, FDW driver will takes recursive process to construct remote join query on relids=0x007 and relids=0x0018. If all of them are inner-join, we don't need to take care about this. All FDW driver needs to do is, just putting the involved relation names in FROM-clause. It is my rough idea, thus, here may be better idea to extract relations involved in a particular join on a certain level. Please tell me, if you have some other ideas. Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei <kai...@ak.jp.nec.com> -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers