Hanada-san, > I reviewed the Custom/Foreign join API patch again after writing a patch of > join > push-down support for postgres_fdw. > Thanks for your dedicated jobs, my comments are inline below.
> Here, please let me summarize the changes in the patch as the result of my > review. > > * Add set_join_pathlist_hook_type in add_paths_to_joinrel > This hook is intended to provide a chance to add one or more CustomPaths for > an > actual join combination. If the join is reversible, the hook is called for > both > A * B and B * A. This is different from FDW API but it seems fine because > FDWs > should have chances to process the join in more abstract level than CSPs. > > Parameters are same as hash_inner_and_outer, so they would be enough for > hash-like > or nestloop-like methods. I’m not sure whether mergeclause_list is necessary > as a parameter or not. It’s information for merge join which is generated > when > enable_mergejoin is on and the join is not FULL OUTER. Does some CSP need it > for processing a join in its own way? Then it must be in parameter list > because > select_mergejoin_clauses is static so it’s not accessible from external > modules. > I think, a preferable way is to reproduce the mergeclause_list by extension itself, rather than pass it as a hook argument, because it is uncertain whether CSP should follow "enable_mergejoin" parameter even if it implements a logic like merge-join. Of course, it needs to expose select_mergejoin_clauses. It seems to me a straight- forward way. > The timing of the hooking, after considering all built-in path types, seems > fine > because some of CSPs might want to use built-in paths as a template or a > source. > > One concern is in the document of the hook function. "Implementing Custom > Paths” > says: > > > A custom scan provider will be also able to add paths by setting the > > following > hook, to replace built-in join paths by custom-scan that performs as if a scan > on preliminary joined relations, which us called after the core code has > generated > what it believes to be the complete and correct set of access paths for the > join. > > I think “replace” would mis-lead readers that CSP can remove or edit existing > built-in paths listed in RelOptInfo#pathlist or linked from cheapest_foo. > IIUC > CSP can just add paths for the join relation, and planner choose it if it’s > the > cheapest. > I adjusted the documentation stuff as follows: A custom scan provider will be also able to add paths by setting the following hook, to add <literal>CustomPath</> nodes that perform as if built-in join logic doing. It is typically expected to take two input relations then generate a joined output stream, or just scans preliminaty joined relations like materialized-view. This hook is called next to the consideration of core join logics, then planner will choose the best path to run the relations join in the built-in and custom ones. Probably, it can introduce what this hook works correctly. v12 patch updated only this portion. > * Add new FDW API GetForeignJoinPaths in make_join_rel > This FDW API is intended to provide a chance to add ForeignPaths for a join > relation. > This is called only once for a join relation, so FDW should consider reversed > combination if it’s meaningful in their own mechanisms. > > Note that this is called only when the join relation was *NOT* found in the > PlannerInfo, to avoid redundant calls. > Yep, it is designed according to the discussion upthreads. It can produce N-way remote join paths even if intermediate join relation is more expensive than local join + two foreign scan. > Parameters seems enough for postgres_fdw to process N-way join on remote side > with pushing down join conditions and remote filters. > You ensured it clearly. > * Treat scanrelid == 0 as pseudo scan > A foreign/custom join is represented by a scan against a pseudo relation, i.e. > result of a join. Usually Scan has valid scanrelid, oid of a relation being > scanned, and many functions assume that it’s always valid. The patch adds > another > code paths for scanrelid == 0 as custom/foreign join scans. > Right, > * Pseudo scan target list support > CustomScan and ForeignScan have csp_ps_tlist and fdw_ps_tlist respectively, > for > column reference tracking. A scan generated for custom/foreign join would > have > column from multiple relations in its target list, i.e. output columns. > Ordinary > scans have all valid columns of the relation as output, so references to them > can be resolved easily, but we need an additional mechanism to determine where > a reference in a target list of custom/foreign scan come from. This is very > similar to what IndexOnlyScan does, so we reuse INDEX_VAR as mark of an > indirect > reference to another relation’s var. > Right, FDW/CSP driver is responsible to set *_ps_tlist to inform the core planner which columns of relations are referenced, and which attribute represents what columns/relations. It is an interface contract when foreign/custom-scan is chosen instead of the built-in join logic. > For this mechanism, set_plan_refs is changed to fix Vars in ps_tlist of > CustomScan > and ForeignScan. For this change, new BitmapSet function bms_shift_members is > added. > > set_deparse_planstate is also changed to pass ps_tlist as namespace for > deparsing. > Yep, it is same as IndexOnlyScan. > These chanes seems reasonable, so I mark this patch as “ready for committers” > to hear committers' thoughts. > Thanks! -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kai...@ak.jp.nec.com>
pgsql-v9.5-custom-join.v12.patch
Description: pgsql-v9.5-custom-join.v12.patch
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers