Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

Kouhei Kaigai Sun, 15 Mar 2015 23:52:45 -0700

The attached patch changed invocation order of GetForeignJoinPaths and
set_join_pathlist_hook, and adjusted documentation part on custom-scan.sgml.


Other portions are kept as previous version.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <[email protected]>


> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Kouhei Kaigai
> Sent: Sunday, March 15, 2015 11:38 AM
> To: Robert Haas; Tom Lane
> Cc: Thom Brown; Shigeru Hanada; [email protected]
> Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
> 
> > On Fri, Mar 13, 2015 at 2:31 PM, Tom Lane <[email protected]> wrote:
> > > Robert Haas <[email protected]> writes:
> > >> Another bit of this that I think we could commit without fretting
> > >> about it too much is the code adding set_join_pathlist_hook.  This is
> > >> - I think - analogous to set_rel_pathlist_hook, and like that hook,
> > >> could be used for other purposes than custom plan generation - e.g. to
> > >> delete paths we do not want to use.  I've extracted this portion of
> > >> the patch and adjusted the comments; if there are no objections, I
> > >> will commit this bit also.
> > >
> > > I don't object to the concept, but I think that is a pretty bad place
> > > to put the hook call: add_paths_to_joinrel is typically called multiple
> > > (perhaps *many*) times per joinrel and thus this placement would force
> > > any user of the hook to do a lot of repetitive work.
> >
> > Interesting point.  I guess the question is whether a some or all
> > callers are going to actually *want* a separate call for each
> > invocation of add_paths_to_joinrel(), or whether they'll be happy to
> > operate on the otherwise-complete path list.  It's true that if your
> > goal is to delete paths, it's probably best to be called just once
> > after the path list is complete, and there might be a use case for
> > that, but I guess it's less useful than for baserels.  For a baserel,
> > as long as you don't nuke the sequential-scan path, there is always
> > going to be a way to complete the plan; so this would be a fine way to
> > implement a disable-an-index extension.  But for joinrels, it's not so
> > easy to rule out, say, a hash-join here.  Neither hook placement is
> > much good for that; the path you want to get rid of may have already
> > dominated paths you want to keep.
> >
> From the standpoint of extension development, I'm uncertain whether we
> can easily reproduce information needed to compute alternative paths on
> the hook at standard_join_search(), like a hook at add_paths_to_joinrel().
> 
> (Please correct me, if I misunderstood.)
> For example, it is not obvious which path is inner/outer of the joinrel
> on which custom-scan provider tries to add an alternative scan path.
> Probably, extension needs to find out the path of source relations from
> the join_rel_level[] array.
> Also, how do we pull SpecialJoinInfo? It contains needed information to
> identify required join-type (like JOIN_LEFT), however, extension needs
> to search join_info_list by relids again, if hook is located at
> standard_join_search().
> Even if number of hook invocation is larger if it is located on
> add_paths_to_joinrel(), it allows to design extensions simpler,
> I think.
> 
> > Suppose you want to add paths - e.g. you have an extension that goes
> > and looks for a materialized view that matches this subtree of the
> > query, and if it finds one, it substitutes a scan of the materialized
> > view for a scan of the baserel.  Or, as in KaiGai's case, you have an
> > extension that can perform the whole join in GPU-land and produce the
> > same results we would have gotten via normal execution.  Either way,
> > you want - and this is the central point of the whole patch here - to
> > inject a scan path into a joinrel.  It is not altogether obvious to me
> > what the best placement for this is.  In the materialized view case,
> > you probably need a perfect match between the baserels in the view and
> > the baserels in the joinrel to do anything.  There's no point in
> > re-checking that for every innerrels/outerrels combination.  I don't
> > know enough about the GPU case to reason about it intelligently; maybe
> > KaiGai can comment.
> >
> In case of GPU, extension will add alternative paths based on hash-join
> and nested-loop algorithm with individual cost estimation as long as
> device can execute join condition. It expects planner (set_cheapest)
> will choose the best path in the built-in/additional ones.
> So, it is more reasonable for me, if extension can utilize a common
> infrastructure as built-in logic (hash-join/merge-join/nested-loop)
> is using to compute its cost estimation.
> 
> > But there's another possible approach: suppose that
> > join_search_one_level, after considering left-sided and right-sided
> > joins and after considering bushy joins, checks whether every relation
> > it's got is from the same foreign server, and if so, asks that foreign
> > server whether it would like to contribute any paths. Would that be
> > better or worse?  A disadvantage is that if you've got something like
> > A LEFT JOIN B LEFT JOIN C LEFT JOIN D LEFT JOIN E LEFT JOIN F LEFT
> > JOIN G LEFT JOIN H LEFT JOIN I but none of the joins can be pushed
> > down (say, each join clause calls a non-pushdown-safe function) you'll
> > end up examining a pile of joinrels - at every level of the join tree
> > - and individually rejecting each one.  With the
> > build-it-up-incrementally approach, you'll figure that all out at
> > level 2, and then after that there's nothing to do but give up
> > quickly.  On the other hand, I'm afraid the incremental approach might
> > miss a trick: consider small LEFT JOIN (big INNER JOIN huge ON big.x =
> > huge.x) ON small.y = big.y AND small.z = huge.z, where all three are
> > foreign tables on the same server.  If the output of the big/huge join
> > is big, none of those paths are going to survive at level 2, but the
> > overall join size might be very small, so we surely want a chance to
> > recover at level 3.  (We discussed test cases of this form quite a bit
> > in the context of e2fa76d80ba571d4de8992de6386536867250474.)
> >
> > Thoughts?
> >
> Do we need to pay attention on relids of joinrel, instead of innerpath
> and outerpath? Yep, we might assume a path with join pushed-down has
> cheaper cost than combination of two foreign-scan and a local join,
> however, foreign-scan with join pushed-down may partially have
> expensive cost.
> In this case, either of hook location may be reasonable, because FDW
> driver can check whether all the relids are foreign-scan path managed
> by same foreign-server, or not, regardless of innerpath/outerpath.
> Of course, it is a significant factor for extensions (including FDW
> driver) whether hook allows to utilize a common infrastructure (like
> SpecialJoinInfo or join restrictlist, ...).
> 
> Thanks,
> --
> NEC OSS Promotion Center / PG-Strom Project
> KaiGai Kohei <[email protected]>
> 
> --
> Sent via pgsql-hackers mailing list ([email protected])
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

pgsql-v9.5-custom-join.v9.patch
Description: pgsql-v9.5-custom-join.v9.patch

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

Reply via email to