Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

Kouhei Kaigai Mon, 16 Mar 2015 23:26:34 -0700

> -----Original Message-----
> From: Shigeru Hanada [mailto:[email protected]]
> Sent: Monday, March 16, 2015 9:59 PM
> To: Robert Haas
> Cc: Tom Lane; Thom Brown; Kaigai Kouhei(海外 浩平); [email protected]
> Subject: ##freemail## Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] 
> Custom
> Plan API)
> 
> 2015-03-14 7:18 GMT+09:00 Robert Haas <[email protected]>:
> > I think the foreign data wrapper join pushdown case, which also aims
> > to substitute a scan for a join, is interesting to think about, even
> > though it's likely to be handled by a new FDW method instead of via
> > the hook.  Where should the FDW method get called from?  Currently,
> > the FDW method in KaiGai's patch is GetForeignJoinPaths, and that gets
> > called from add_paths_to_joinrel().  The patch at
> >
> http://www.postgresql.org/message-id/CAEZqfEfy7p=uRpwN-Q-NNgzb8kwHbfqF82YSb9
> [email protected]
> > uses that to implement join pushdown in postgres_fdw; if you have A
> > JOIN B JOIN C all on server X, we'll notice that the join with A and B
> > can be turned into a foreign scan on A JOIN B, and similarly for A-C
> > and B-C.  Then, if it turns out that the cheapest path for A-B is the
> > foreign join, and the cheapest path for C is a foreign scan, we'll
> > arrive at the idea of a foreign scan on A-B-C, and we'll realize the
> > same thing in each of the other combinations as well.  So, eventually
> > the foreign join gets pushed down.
> 
> From the viewpoint of postgres_fdw, incremental approach seemed
> natural way, although postgres_fdw should consider paths in pathlist
> in additon to cheapest one as you mentioned in another thread.  This
> approarch allows FDW to use SQL statement generated for underlying
> scans as parts of FROM clause, as postgres_fdw does in the join
> push-down patch.
> 
> > But there's another possible approach: suppose that
> > join_search_one_level, after considering left-sided and right-sided
> > joins and after considering bushy joins, checks whether every relation
> > it's got is from the same foreign server, and if so, asks that foreign
> > server whether it would like to contribute any paths. Would that be
> > better or worse?  A disadvantage is that if you've got something like
> > A LEFT JOIN B LEFT JOIN C LEFT JOIN D LEFT JOIN E LEFT JOIN F LEFT
> > JOIN G LEFT JOIN H LEFT JOIN I but none of the joins can be pushed
> > down (say, each join clause calls a non-pushdown-safe function) you'll
> > end up examining a pile of joinrels - at every level of the join tree
> > - and individually rejecting each one.  With the
> > build-it-up-incrementally approach, you'll figure that all out at
> > level 2, and then after that there's nothing to do but give up
> > quickly.  On the other hand, I'm afraid the incremental approach might
> > miss a trick: consider small LEFT JOIN (big INNER JOIN huge ON big.x =
> > huge.x) ON small.y = big.y AND small.z = huge.z, where all three are
> > foreign tables on the same server.  If the output of the big/huge join
> > is big, none of those paths are going to survive at level 2, but the
> > overall join size might be very small, so we surely want a chance to
> > recover at level 3.  (We discussed test cases of this form quite a bit
> > in the context of e2fa76d80ba571d4de8992de6386536867250474.)
> 
> Interesting, I overlooked that pattern.  As you pointed out, join
> between big foregin tables might be dominated, perhaps by a MergeJoin
> path.  Leaving dominated ForeignPath in pathlist for more optimization
> in the future (in higher join level) is an idea, but it would make
> planning time longer (and use more cycle and memory).
> 
> Tom's idea sounds good for saving the path b), but I worry that
> whether FDW can get enough information at that timing, just before
> set_cheapest.  It would not be good I/F if each FDW needs to copy many
> code form joinrel.c...
>
I had a call to discuss this topic with Hanada-san. Even though he
expected FDW driver needs to check and extract relations involved
in a particular join, it also means we have less problem as long as
core backend can handle these common portion for all FDW/CSP drivers.
Thus, we need care about two hook locations. The first one is 
add_paths_to_joinrel() as current patch doing, for custom-scan that
adds an alternative join logic and takes underlying child nodes as
input. The other one is standard_join_search() as Tom pointed out,
for foreign-scan of remote join, or for custom-scan that replaces
an entire join subtree.


One positive aspect of this approach is, postgres_fdw can handle
whole-row-reference much simpler than bottom-up approach, according
to Hanada-san.

Remaining issue is, how to implement the core portion that extracts
relations in a particular join, and to identify join type to be
applied on a particular relations.
One rough idea is, we pull relids bitmap from the target joinrel,
then references the SpecialJoinInfo with identical union bitmap
of left/righthand. It allows to inform FDW driver which relations
and which another relations shall be joined in this level.
For example, if relids=0x007 and relids=0x0018 are left joined,
PlannerInfo shall have a SpecialJoinInfo that fits the requirement.
Also, both of left/right side is not singleton, FDW driver will
takes recursive process to construct remote join query on relids=0x007
and relids=0x0018. If all of them are inner-join, we don't need to
take care about this. All FDW driver needs to do is, just putting
the involved relation names in FROM-clause.

It is my rough idea, thus, here may be better idea to extract
relations involved in a particular join on a certain level.
Please tell me, if you have some other ideas.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <[email protected]>


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

Reply via email to