The attached patch changed invocation order of GetForeignJoinPaths and set_join_pathlist_hook, and adjusted documentation part on custom-scan.sgml.
Other portions are kept as previous version. Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei <kai...@ak.jp.nec.com> > -----Original Message----- > From: pgsql-hackers-ow...@postgresql.org > [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kouhei Kaigai > Sent: Sunday, March 15, 2015 11:38 AM > To: Robert Haas; Tom Lane > Cc: Thom Brown; Shigeru Hanada; pgsql-hackers@postgreSQL.org > Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API) > > > On Fri, Mar 13, 2015 at 2:31 PM, Tom Lane <t...@sss.pgh.pa.us> wrote: > > > Robert Haas <robertmh...@gmail.com> writes: > > >> Another bit of this that I think we could commit without fretting > > >> about it too much is the code adding set_join_pathlist_hook. This is > > >> - I think - analogous to set_rel_pathlist_hook, and like that hook, > > >> could be used for other purposes than custom plan generation - e.g. to > > >> delete paths we do not want to use. I've extracted this portion of > > >> the patch and adjusted the comments; if there are no objections, I > > >> will commit this bit also. > > > > > > I don't object to the concept, but I think that is a pretty bad place > > > to put the hook call: add_paths_to_joinrel is typically called multiple > > > (perhaps *many*) times per joinrel and thus this placement would force > > > any user of the hook to do a lot of repetitive work. > > > > Interesting point. I guess the question is whether a some or all > > callers are going to actually *want* a separate call for each > > invocation of add_paths_to_joinrel(), or whether they'll be happy to > > operate on the otherwise-complete path list. It's true that if your > > goal is to delete paths, it's probably best to be called just once > > after the path list is complete, and there might be a use case for > > that, but I guess it's less useful than for baserels. For a baserel, > > as long as you don't nuke the sequential-scan path, there is always > > going to be a way to complete the plan; so this would be a fine way to > > implement a disable-an-index extension. But for joinrels, it's not so > > easy to rule out, say, a hash-join here. Neither hook placement is > > much good for that; the path you want to get rid of may have already > > dominated paths you want to keep. > > > From the standpoint of extension development, I'm uncertain whether we > can easily reproduce information needed to compute alternative paths on > the hook at standard_join_search(), like a hook at add_paths_to_joinrel(). > > (Please correct me, if I misunderstood.) > For example, it is not obvious which path is inner/outer of the joinrel > on which custom-scan provider tries to add an alternative scan path. > Probably, extension needs to find out the path of source relations from > the join_rel_level[] array. > Also, how do we pull SpecialJoinInfo? It contains needed information to > identify required join-type (like JOIN_LEFT), however, extension needs > to search join_info_list by relids again, if hook is located at > standard_join_search(). > Even if number of hook invocation is larger if it is located on > add_paths_to_joinrel(), it allows to design extensions simpler, > I think. > > > Suppose you want to add paths - e.g. you have an extension that goes > > and looks for a materialized view that matches this subtree of the > > query, and if it finds one, it substitutes a scan of the materialized > > view for a scan of the baserel. Or, as in KaiGai's case, you have an > > extension that can perform the whole join in GPU-land and produce the > > same results we would have gotten via normal execution. Either way, > > you want - and this is the central point of the whole patch here - to > > inject a scan path into a joinrel. It is not altogether obvious to me > > what the best placement for this is. In the materialized view case, > > you probably need a perfect match between the baserels in the view and > > the baserels in the joinrel to do anything. There's no point in > > re-checking that for every innerrels/outerrels combination. I don't > > know enough about the GPU case to reason about it intelligently; maybe > > KaiGai can comment. > > > In case of GPU, extension will add alternative paths based on hash-join > and nested-loop algorithm with individual cost estimation as long as > device can execute join condition. It expects planner (set_cheapest) > will choose the best path in the built-in/additional ones. > So, it is more reasonable for me, if extension can utilize a common > infrastructure as built-in logic (hash-join/merge-join/nested-loop) > is using to compute its cost estimation. > > > But there's another possible approach: suppose that > > join_search_one_level, after considering left-sided and right-sided > > joins and after considering bushy joins, checks whether every relation > > it's got is from the same foreign server, and if so, asks that foreign > > server whether it would like to contribute any paths. Would that be > > better or worse? A disadvantage is that if you've got something like > > A LEFT JOIN B LEFT JOIN C LEFT JOIN D LEFT JOIN E LEFT JOIN F LEFT > > JOIN G LEFT JOIN H LEFT JOIN I but none of the joins can be pushed > > down (say, each join clause calls a non-pushdown-safe function) you'll > > end up examining a pile of joinrels - at every level of the join tree > > - and individually rejecting each one. With the > > build-it-up-incrementally approach, you'll figure that all out at > > level 2, and then after that there's nothing to do but give up > > quickly. On the other hand, I'm afraid the incremental approach might > > miss a trick: consider small LEFT JOIN (big INNER JOIN huge ON big.x = > > huge.x) ON small.y = big.y AND small.z = huge.z, where all three are > > foreign tables on the same server. If the output of the big/huge join > > is big, none of those paths are going to survive at level 2, but the > > overall join size might be very small, so we surely want a chance to > > recover at level 3. (We discussed test cases of this form quite a bit > > in the context of e2fa76d80ba571d4de8992de6386536867250474.) > > > > Thoughts? > > > Do we need to pay attention on relids of joinrel, instead of innerpath > and outerpath? Yep, we might assume a path with join pushed-down has > cheaper cost than combination of two foreign-scan and a local join, > however, foreign-scan with join pushed-down may partially have > expensive cost. > In this case, either of hook location may be reasonable, because FDW > driver can check whether all the relids are foreign-scan path managed > by same foreign-server, or not, regardless of innerpath/outerpath. > Of course, it is a significant factor for extensions (including FDW > driver) whether hook allows to utilize a common infrastructure (like > SpecialJoinInfo or join restrictlist, ...). > > Thanks, > -- > NEC OSS Promotion Center / PG-Strom Project > KaiGai Kohei <kai...@ak.jp.nec.com> > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers
pgsql-v9.5-custom-join.v9.patch
Description: pgsql-v9.5-custom-join.v9.patch
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers