Hanada-san,

> I reviewed the Custom/Foreign join API patch again after writing a patch of 
> join
> push-down support for postgres_fdw.
>
Thanks for your dedicated jobs, my comments are inline below.

> Here, please let me summarize the changes in the patch as the result of my 
> review.
> 
> * Add set_join_pathlist_hook_type in add_paths_to_joinrel
> This hook is intended to provide a chance to add one or more CustomPaths for 
> an
> actual join combination.  If the join is reversible, the hook is called for 
> both
> A * B and B * A.  This is different from FDW API but it seems fine because 
> FDWs
> should have chances to process the join in more abstract level than CSPs.
> 
> Parameters are same as hash_inner_and_outer, so they would be enough for 
> hash-like
> or nestloop-like methods.  I’m not sure whether mergeclause_list is necessary
> as a parameter or not.  It’s information for merge join which is generated 
> when
> enable_mergejoin is on and the join is not FULL OUTER.  Does some CSP need it
> for processing a join in its own way?  Then it must be in parameter list 
> because
> select_mergejoin_clauses is static so it’s not accessible from external 
> modules.
>
I think, a preferable way is to reproduce the mergeclause_list by extension 
itself,
rather than pass it as a hook argument, because it is uncertain whether CSP 
should
follow "enable_mergejoin" parameter even if it implements a logic like 
merge-join.
Of course, it needs to expose select_mergejoin_clauses. It seems to me a 
straight-
forward way.

> The timing of the hooking, after considering all built-in path types, seems 
> fine
> because some of CSPs might want to use built-in paths as a template or a 
> source.
> 
> One concern is in the document of the hook function.  "Implementing Custom 
> Paths”
> says:
> 
> > A custom scan provider will be also able to add paths by setting the 
> > following
> hook, to replace built-in join paths by custom-scan that performs as if a scan
> on preliminary joined relations, which us called after the core code has 
> generated
> what it believes to be the complete and correct set of access paths for the 
> join.
> 
> I think “replace” would mis-lead readers that CSP can remove or edit existing
> built-in paths listed in RelOptInfo#pathlist or linked from cheapest_foo.  
> IIUC
> CSP can just add paths for the join relation, and planner choose it if it’s 
> the
> cheapest.
>
I adjusted the documentation stuff as follows:

   A custom scan provider will be also able to add paths by setting the
   following hook, to add <literal>CustomPath</> nodes that perform as
   if built-in join logic doing. It is typically expected to take two
   input relations then generate a joined output stream, or just scans
   preliminaty joined relations like materialized-view. This hook is
   called next to the consideration of core join logics, then planner
   will choose the best path to run the relations join in the built-in
   and custom ones.

Probably, it can introduce what this hook works correctly.
v12 patch updated only this portion.

> * Add new FDW API GetForeignJoinPaths in make_join_rel
> This FDW API is intended to provide a chance to add ForeignPaths for a join 
> relation.
> This is called only once for a join relation, so FDW should consider reversed
> combination if it’s meaningful in their own mechanisms.
> 
> Note that this is called only when the join relation was *NOT* found in the
> PlannerInfo, to avoid redundant calls.
>
Yep, it is designed according to the discussion upthreads.
It can produce N-way remote join paths even if intermediate join relation is
more expensive than local join + two foreign scan.

> Parameters seems enough for postgres_fdw to process N-way join on remote side
> with pushing down join conditions and remote filters.
>
You ensured it clearly.

> * Treat scanrelid == 0 as pseudo scan
> A foreign/custom join is represented by a scan against a pseudo relation, i.e.
> result of a join.  Usually Scan has valid scanrelid, oid of a relation being
> scanned, and many functions assume that it’s always valid.  The patch adds 
> another
> code paths for scanrelid == 0 as custom/foreign join scans.
>
Right,

> * Pseudo scan target list support
> CustomScan and ForeignScan have csp_ps_tlist and fdw_ps_tlist respectively, 
> for
> column reference tracking.  A scan generated for custom/foreign join would 
> have
> column from multiple relations in its target list, i.e. output columns.  
> Ordinary
> scans have all valid columns of the relation as output, so references to them
> can be resolved easily, but we need an additional mechanism to determine where
> a reference in a target list of custom/foreign scan come from.  This is very
> similar to what IndexOnlyScan does, so we reuse INDEX_VAR as mark of an 
> indirect
> reference to another relation’s var.
>
Right, FDW/CSP driver is responsible to set *_ps_tlist to inform the core 
planner
which columns of relations are referenced, and which attribute represents what
columns/relations. It is an interface contract when foreign/custom-scan is 
chosen
instead of the built-in join logic.

> For this mechanism, set_plan_refs is changed to fix Vars in ps_tlist of 
> CustomScan
> and ForeignScan.  For this change, new BitmapSet function bms_shift_members is
> added.
> 
> set_deparse_planstate is also changed to pass ps_tlist as namespace for
> deparsing.
>
Yep, it is same as IndexOnlyScan.

> These chanes seems reasonable, so I mark this patch as “ready for committers”
> to hear committers' thoughts.
>
Thanks!
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kai...@ak.jp.nec.com>

Attachment: pgsql-v9.5-custom-join.v12.patch
Description: pgsql-v9.5-custom-join.v12.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to