On Tue, Nov 19, 2013 at 9:41 AM, Kohei KaiGai <kai...@kaigai.gr.jp> wrote:

> Thanks for your review.
>
> 2013/11/19 Jim Mlodgenski <jimm...@gmail.com>:
> > My initial review on this feature:
> > - The patches apply and build, but it produces a warning:
> > ctidscan.c: In function ‘CTidInitCustomScanPlan’:
> > ctidscan.c:362:9: warning: unused variable ‘scan_relid’ [-Wunused-variable]
> >
> This variable was only used in Assert() macro, so it causes a warning if you
> don't put --enable-cassert on the configure script.
> Anyway, I adjusted the code to check relid of RelOptInfo directly.
> The warning is now gone.
>
> > I'd recommend that you split the part1 patch containing the ctidscan contrib
> > into its own patch. It is more than half of the patch and its certainly
> > stands on its own. IMO, I think ctidscan fits a very specific use case and
> > would be better off being an extension instead of in contrib.
> >
> OK, I split them off. The part-1 is custom-scan API itself, the part-2 is
> ctidscan portion, and the part-3 is remote join on postgres_fdw.

Attached is a patch for the documentation. I think the documentation still
needs a little more work, but it is pretty close. I can add some more detail
to it once I finish adapting the hadoop_fdw to use the custom scan API and
have a better understanding of all of the calls.

> Thanks,
> --
> KaiGai Kohei <kai...@kaigai.gr.jp>
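For anyone following the warning discussed above, here is a minimal sketch of why a variable that is only read inside Assert() goes unused when assertions are compiled out, and of the "check the field directly" shape of the fix. Everything here is an assumption made for illustration: the Assert() macro is a simplified stand-in for the PostgreSQL one, and MockRelOptInfo and mock_scan_relid() are hypothetical names, not code from ctidscan.c.

```c
#include <assert.h>

/*
 * Simplified stand-in for PostgreSQL's Assert(): it only does something
 * when USE_ASSERT_CHECKING is defined, which is what --enable-cassert
 * turns on. This is an assumption for illustration, not the real macro.
 */
#ifdef USE_ASSERT_CHECKING
#define Assert(cond) assert(cond)
#else
#define Assert(cond) ((void) 0)
#endif

/* Hypothetical, trimmed-down stand-in for RelOptInfo. */
typedef struct MockRelOptInfo
{
    unsigned int relid;         /* range-table index of the relation */
} MockRelOptInfo;

/*
 * Warning-prone shape, kept as a comment so this file compiles cleanly:
 *
 *     unsigned int scan_relid = rel->relid;
 *     Assert(scan_relid > 0);
 *
 * Without assertions the macro expands to nothing, so scan_relid is
 * initialized but never read, and -Wunused-variable fires.
 */

/* Fixed shape: the assertion reads the struct field directly. */
unsigned int
mock_scan_relid(const MockRelOptInfo *rel)
{
    Assert(rel->relid > 0);     /* no assert-only local variable left */
    return rel->relid;
}
```

The fixed function behaves the same with or without USE_ASSERT_CHECKING defined; only the presence of the run-time check differs.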
*** a/doc/src/sgml/custom-scan.sgml 2013-11-18 17:50:02.652039003 -0500
--- b/doc/src/sgml/custom-scan.sgml 2013-11-22 09:09:13.624254649 -0500
***************
*** 8,47 ****
  <secondary>handler for</secondary>
  </indexterm>
  <para>
! Custom-scan API enables extension to provide alternative ways to scan or
! join relations, being fully integrated with cost based optimizer,
! in addition to the built-in implementation.
! It consists of a set of callbacks, with a unique name, to be invoked during
! query planning and execution. Custom-scan provider should implement these
! callback functions according to the expectation of API.
  </para>
  <para>
! Overall, here is four major jobs that custom-scan provider should implement.
! The first one is registration of custom-scan provider itself. Usually, it
! shall be done once at <literal>_PG_init()</literal> entrypoint on module
! loading.
! The other three jobs shall be done for each query planning and execution.
! The second one is submission of candidate paths to scan or join relations,
! with an adequate cost, for the core planner.
! Then, planner shall chooses a cheapest path from all the candidates.
! If custom path survived, the planner kicks the third job; construction of
! <literal>CustomScan</literal> plan node, being located within query plan
! tree instead of the built-in plan node.
! The last one is execution of its implementation in answer to invocations
! by the core executor.
  </para>
  <para>
! Some of contrib module utilize the custom-scan API. It may be able to
! provide a good example for new development.
  <variablelist>
  <varlistentry>
  <term><xref linkend="ctidscan"></term>
  <listitem>
  <para>
! Its logic enables to skip earlier pages or terminate scan prior to
! end of the relation, if inequality operator on <literal>ctid</literal>
! system column can narrow down the scope to be scanned, instead of
! the sequential scan that reads a relation from the head to the end.
  </para>
  </listitem>
  </varlistentry>
--- 8,46 ----
  <secondary>handler for</secondary>
  </indexterm>
  <para>
! The custom-scan API enables an extension to provide alternative ways to scan
! or join relations leveraging the cost based optimizer. The API consists of a
! set of callbacks, with a unique names, to be invoked during query planning
! and execution. A custom-scan provider should implement these callback
! functions according to the expectation of the API.
  </para>
  <para>
! Overall, there are four major tasks that a custom-scan provider should
! implement. The first task is the registration of custom-scan provider itself.
! Usually, this needs to be done once at the <literal>_PG_init()</literal>
! entrypoint when the module is loading. The remaing three tasks are all done
! when a query is planning and executing. The second task is the submission of
! candidate paths to either scan or join relations with an adequate cost for
! the core planner. Then, the planner will choose the cheapest path from all of
! the candidates. If the custom path survived, the planner starts the third
! task; construction of a <literal>CustomScan</literal> plan node, located
! within the query plan tree instead of the built-in plan node. The last task
! is the execution of its implementation in answer to invocations by the core
! executor.
  </para>
  <para>
! Some of contrib modules utilize the custom-scan API. They may provide a good
! example for new development.
  <variablelist>
  <varlistentry>
  <term><xref linkend="ctidscan"></term>
  <listitem>
  <para>
! This custom scan in this module enables a scan to skip earlier pages or
! terminate prior to end of the relation, if the inequality operator on the
! <literal>ctid</literal> system column can narrow down the scope to be
! scanned, instead of a sequential scan which reads a relation from the
! head to the end.
  </para>
  </listitem>
  </varlistentry>
***************
*** 49,70 ****
  <term><xref linkend="postgres-fdw"></term>
  <listitem>
  <para>
! Its logic replaces a local join of foreign tables managed by
! <literal>postgres_fdw</literal> with a custom scan that fetches
! remotely joined relations.
! It shows the way to implement a custom scan node that performs
! instead join nodes.
  </para>
  </listitem>
  </varlistentry>
  </variablelist>
  </para>
  <para>
! Right now, only scan and join are supported to have fully integrated cost
! based query optimization performing on custom scan API.
! You might be able to implement other stuff, like sort or aggregation, with
! manipulation of the planned tree, however, extension has to be responsible
! to handle this replacement correctly. Here is no support by the core.
  </para>
  <sect1 id="custom-scan-spec">
--- 48,68 ----
  <term><xref linkend="postgres-fdw"></term>
  <listitem>
  <para>
! This custom scan in this module replaces a local join of foreign tables
! managed by <literal>postgres_fdw</literal> with a scan that fetches
! remotely joined relations. It demostrates the way to implement a custom
! scan node that performs join nodes.
  </para>
  </listitem>
  </varlistentry>
  </variablelist>
  </para>
  <para>
! Currently, only scan and join are fully supported with integrated cost
! based query optimization using the custom scan API. You might be able to
! implement other stuff, like sort or aggregation, with manipulation of the
! planned tree, however, the extension has to be responsible to handle this
! replacement correctly. There is no support in the core.
  </para>
  <sect1 id="custom-scan-spec">
***************
*** 72,80 ****
  <sect2 id="custom-scan-register">
  <title>Registration of custom scan provider</title>
  <para>
! The first job for custom scan provider is registration of a set of
! callbacks with a unique name. Usually, it shall be done once on
! <literal>_PG_init()</literal> entrypoint of module loading.
  <programlisting>
  void
  register_custom_provider(const CustomProvider *provider);
--- 70,78 ----
  <sect2 id="custom-scan-register">
  <title>Registration of custom scan provider</title>
  <para>
! The first task for a custom scan provider is the registration of a set of
! callbacks with a unique names. Usually, this is done once upon module
! loading in the <literal>_PG_init()</literal> entrypoint.
  <programlisting>
  void
  register_custom_provider(const CustomProvider *provider);
***************
*** 90,105 ****
  <sect2 id="custom-scan-path">
  <title>Submission of custom paths</title>
  <para>
! The query planner finds out the best way to scan or join relations from
! the various potential paths; combination of a scan algorithm and target
! relations.
! Prior to this selection, we list up all the potential paths towards
! a target relation (if base relation) or a pair of relations (if join).
! The <literal>add_scan_path_hook</> and <literal>add_join_path_hook</>
! allows extensions to add alternative scan paths in addition to built-in
! ones.
  If custom-scan provider can submit a potential scan path towards the
! supplied relation, it shall construct <literal>CustomPath</> object with
  appropriate parameters.
  <programlisting>
  typedef struct CustomPath
--- 88,102 ----
  <sect2 id="custom-scan-path">
  <title>Submission of custom paths</title>
  <para>
! The query planner finds the best way to scan or join relations from various
! potential paths using a combination of scan algorithms and target
! relations. Prior to this selection, we list all of the potential paths
! towards a target relation (if it is a base relation) or a pair of relations
! (if it is a join). The <literal>add_scan_path_hook</> and
! <literal>add_join_path_hook</> allow extensions to add alternative scan
! paths in addition to built-in paths.
  If custom-scan provider can submit a potential scan path towards the
! supplied relation, it shall construct a <literal>CustomPath</> object with
  appropriate parameters.
  <programlisting>
  typedef struct CustomPath
***************
*** 110,118 ****
  List *custom_private; /* can be used for private data */
  } CustomPath;
  </programlisting>
! Its <literal>path</> is common field for all the path nodes to store
! cost estimation. In addition, <literal>custom_name</> is the name of
! registered custom scan provider, <literal>custom_flags</> is a set of
  flags below, and <literal>custom_private</> can be used to store
  private data of the custom scan provider.
  </para>
--- 107,115 ----
  List *custom_private; /* can be used for private data */
  } CustomPath;
  </programlisting>
! Its <literal>path</> is a common field for all the path nodes to store
! a cost estimation. In addition, <literal>custom_name</> is the name of
! the registered custom scan provider, <literal>custom_flags</> is a set of
  flags below, and <literal>custom_private</> can be used to store
  private data of the custom scan provider.
  </para>
***************
*** 125,132 ****
  It informs the query planner this custom scan node supports
  <literal>ExecMarkPosCustomScan</> and
  <literal>ExecRestorePosCustomScan</> methods.
! Also, custom scan provider has to be responsible to mark and restore
! a particular position.
  </para>
  </listitem>
  </varlistentry>
--- 122,129 ----
  It informs the query planner this custom scan node supports
  <literal>ExecMarkPosCustomScan</> and
  <literal>ExecRestorePosCustomScan</> methods.
! Also, the custom scan provider has to be responsible to mark and
! restore a particular position.
  </para>
  </listitem>
  </varlistentry>
***************
*** 135,141 ****
  <listitem>
  <para>
  It informs the query planner this custom scan node supports
! backward scan. Also, custom scan provider has to be responsible to
  scan with backward direction.
  </para>
--- 132,138 ----
  <listitem>
  <para>
  It informs the query planner this custom scan node supports
! backward scans. Also, custom scan provider has to be responsible to
  scan with backward direction.
  </para>
***************
*** 148,157 ****
  <sect2 id="custom-scan-plan">
  <title>Construction of custom plan node</title>
  <para>
! Once <literal>CustomPath</literal> got choosen by query planner,
! it calls back its associated custom scan provider to complete setting
! up <literal>CustomScan</literal> plan node according to the path
! information.
  <programlisting>
  void
  InitCustomScanPlan(PlannerInfo *root,
--- 145,154 ----
  <sect2 id="custom-scan-plan">
  <title>Construction of custom plan node</title>
  <para>
! Once <literal>CustomPath</literal> was choosen by the query planner,
! it calls back to its associated to the custom scan provider to complete
! setting up the <literal>CustomScan</literal> plan node according to the
! path information.
  <programlisting>
  void
  InitCustomScanPlan(PlannerInfo *root,
***************
*** 160,180 ****
  List *tlist,
  List *scan_clauses);
  </programlisting>
! Query planner does basic initialization on the <literal>cscan_plan</>
! being allocated, then custom scan provider can apply final initialization.
! <literal>cscan_path</> is the path node that was constructed on the
! previous stage then got choosen.
  <literal>tlist</> is a list of <literal>TargetEntry</> to be assigned
  on the <literal>Plan</> portion in the <literal>cscan_plan</>.
  Also, <literal>scan_clauses</> is a list of <literal>RestrictInfo</> to
! be checked during relation scan. Its expression portion shall be also
  assigned on the <literal>Plan</> portion, but can be eliminated from
  this list if custom scan provider can handle these checks by itself.
  </para>
  <para>
  It often needs to adjust <literal>varno</> of <literal>Var</> node that
! references a particular scan node, after conscruction of plan node.
! For example, Var node in the target list of join node originally
  references a particular relation underlying a join, however, it has to
  be adjusted to either inner or outer reference.
  <programlisting>
--- 157,177 ----
  List *tlist,
  List *scan_clauses);
  </programlisting>
! The query planner does basic initialization on the <literal>cscan_plan</>
! being allocated, then the custom scan provider can apply final
! initialization. <literal>cscan_path</> is the path node that was
! constructed on the previous stage then was choosen.
  <literal>tlist</> is a list of <literal>TargetEntry</> to be assigned
  on the <literal>Plan</> portion in the <literal>cscan_plan</>.
  Also, <literal>scan_clauses</> is a list of <literal>RestrictInfo</> to
! be checked during a relation scan. Its expression portion will also be
  assigned on the <literal>Plan</> portion, but can be eliminated from
  this list if custom scan provider can handle these checks by itself.
  </para>
  <para>
  It often needs to adjust <literal>varno</> of <literal>Var</> node that
! references a particular scan node, after construction of the plan node.
! For example, Var node in the target list of the join node originally
  references a particular relation underlying a join, however, it has to
  be adjusted to either inner or outer reference.
  <programlisting>
***************
*** 183,191 ****
  CustomScan *cscan_plan,
  int rtoffset);
  </programlisting>
! This callback is optional if custom scan node is a vanilla relation
! scan because here is nothing special to do. Elsewhere, it needs to
! be handled by custom scan provider in case when a custom scan replaced
  a join with two or more relations for example.
  </para>
  </sect2>
--- 180,188 ----
  CustomScan *cscan_plan,
  int rtoffset);
  </programlisting>
! This callback is optional if the custom scan node is a vanilla relation
! scan because there is nothing special to do. Elsewhere, it needs to
! be handled by the custom scan provider in case when a custom scan replaced
  a join with two or more relations for example.
  </para>
  </sect2>
***************
*** 193,200 ****
  <sect2 id="custom-scan-exec">
  <title>Execution of custom scan node</title>
  <para>
! Query execuror also launches associated callbacks to begin, execute and
! end custom scan according to the executor's manner.
  </para>
  <para>
  <programlisting>
--- 190,197 ----
  <sect2 id="custom-scan-exec">
  <title>Execution of custom scan node</title>
  <para>
! The query executor also launches the associated callbacks to begin, execute
! and end the custom scan according to the executor's manner.
  </para>
  <para>
  <programlisting>
***************
*** 202,217 ****
  BeginCustomScan(CustomScanState *csstate, int eflags);
  </programlisting>
  It begins execution of the custom scan on starting up executor.
! It allows custom scan provider to do any initialization job around this
! plan, however, it is not a good idea to launch actual scanning jobs.
  (It shall be done on the first invocation of <literal>ExecCustomScan</>
  instead.)
  The <literal>custom_state</> field of <literal>CustomScanState</> is
! intended to save the private state being managed by custom scan provider.
! Also, <literal>eflags</> has flag bits of the executor's operating mode
! for this plan node.
! Note that custom scan provider should not perform anything visible
! externally if <literal>EXEC_FLAG_EXPLAIN_ONLY</> would be given,
  </para>
  <para>
--- 199,214 ----
  BeginCustomScan(CustomScanState *csstate, int eflags);
  </programlisting>
  It begins execution of the custom scan on starting up executor.
! It allows the custom scan provider to do any initialization job around this
! plan, however, it is not a good idea to launch the actual scanning jobs.
  (It shall be done on the first invocation of <literal>ExecCustomScan</>
  instead.)
  The <literal>custom_state</> field of <literal>CustomScanState</> is
! intended to save the private state being managed by the custom scan
! provider. Also, <literal>eflags</> has flag bits of the executor's
! operating mode for this plan node. Note that the custom scan provider
! should not perform anything visible externally if
! <literal>EXEC_FLAG_EXPLAIN_ONLY</> would be given,
  </para>
  <para>
***************
*** 219,229 ****
  TupleTableSlot *
  ExecCustomScan(CustomScanState *csstate);
  </programlisting>
! It fetches one tuple from the underlying relation or relations if join
  according to the custom logic. Unlike <literal>IterateForeignScan</>
! method in foreign table, it is also responsible to check whether next
  tuple matches the qualifier of this scan, or not.
! A usual way to implement this method is the callback performs just an
  entrypoint of <literal>ExecQual</> with its own access method.
  </para>
--- 216,226 ----
  TupleTableSlot *
  ExecCustomScan(CustomScanState *csstate);
  </programlisting>
! It fetches one tuple from the underlying relation or relations, if joining,
  according to the custom logic. Unlike <literal>IterateForeignScan</>
! method in foreign table, it is also responsible to check whether the next
  tuple matches the qualifier of this scan, or not.
! The usual way to implement this method is the callback performs just an
  entrypoint of <literal>ExecQual</> with its own access method.
  </para>
***************
*** 232,240 ****
  Node *
  MultiExecCustomScan(CustomScanState *csstate);
  </programlisting>
! It fetches multiple tuples from the underlying relation or relations if
! join according to the custom logic. Pay attention the data format (and
! the way to return also) depends on the type of upper node.
  </para>
  <para>
--- 229,237 ----
  Node *
  MultiExecCustomScan(CustomScanState *csstate);
  </programlisting>
! It fetches multiple tuples from the underlying relation or relations, if
! joining, according to the custom logic. Pay attention the data format (and
! the way to return also) since it depends on the type of upper node.
  </para>
  <para>
***************
*** 242,248 ****
  void
  EndCustomScan(CustomScanState *csstate);
  </programlisting>
! It ends the scan and release resources privately allocated. It is usually
  not important to release memory in per-execution memory context.
  So, all this callback should be responsible is its own resources
  regardless from the framework.
--- 239,245 ----
  void
  EndCustomScan(CustomScanState *csstate);
  </programlisting>
! It ends the scan and releases resources privately allocated. It is usually
  not important to release memory in per-execution memory context.
  So, all this callback should be responsible is its own resources
  regardless from the framework.
***************
*** 257,263 ****
  ReScanCustomScan(CustomScanState *csstate);
  </programlisting>
  It restarts the current scan from the beginning.
! Note that parameters of the scan depends on might change values,
  so rewinded scan does not need to return exactly identical tuples.
  </para>
  <para>
--- 254,260 ----
  ReScanCustomScan(CustomScanState *csstate);
  </programlisting>
  It restarts the current scan from the beginning.
! Note that parameters of the scan depends on may change values,
  so rewinded scan does not need to return exactly identical tuples.
  </para>
  <para>
***************
*** 276,282 ****
  RestorePosCustom(CustomScanState *csstate);
  </programlisting>
  It rewinds the current position of the custom scan to the position
! where <literal>MarkPosCustomScan</> saved before.
  Note that it is optional to implement, only when
  <literal>CUSTOM__SUPPORT_MARK_RESTORE</> is set.
  </para>
--- 273,279 ----
  RestorePosCustom(CustomScanState *csstate);
  </programlisting>
  It rewinds the current position of the custom scan to the position
! where <literal>MarkPosCustomScan</> was saved before.
  Note that it is optional to implement, only when
  <literal>CUSTOM__SUPPORT_MARK_RESTORE</> is set.
  </para>
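To make the callback flow described in the patch easier to picture, here is a rough, self-contained sketch of the register / begin / exec / end sequence. It does not use PostgreSQL headers, and none of it is the proposed API: MockProvider, MockScanState, MOCK_EXPLAIN_ONLY, mock_register_provider() and mock_run_scan() are all hypothetical stand-ins. Rejecting a duplicate provider name and skipping setup entirely under the explain-only flag are assumptions of this mock, not behaviour stated in the documentation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* All names below are hypothetical mocks, not PostgreSQL symbols. */
#define MOCK_EXPLAIN_ONLY 0x0001    /* stand-in for EXEC_FLAG_EXPLAIN_ONLY */

typedef struct MockScanState
{
    void *custom_state;             /* private state owned by the provider */
} MockScanState;

typedef struct MockProvider
{
    const char *name;                           /* unique provider name */
    void        (*begin)(MockScanState *, int); /* set up, do not scan yet */
    const int  *(*exec)(MockScanState *);       /* next matching row or NULL */
    void        (*end)(MockScanState *);        /* release private resources */
} MockProvider;

#define MOCK_MAX_PROVIDERS 8
static const MockProvider *mock_providers[MOCK_MAX_PROVIDERS];
static int mock_nproviders = 0;

/* Register a provider: 0 on success, -1 on duplicate name or full table. */
int
mock_register_provider(const MockProvider *p)
{
    for (int i = 0; i < mock_nproviders; i++)
        if (strcmp(mock_providers[i]->name, p->name) == 0)
            return -1;
    if (mock_nproviders == MOCK_MAX_PROVIDERS)
        return -1;
    mock_providers[mock_nproviders++] = p;
    return 0;
}

/* Toy "relation" and the provider's private scan position. */
static const int mock_rows[] = {4, 11, 7, 15, 2};
typedef struct MockPos { size_t next; } MockPos;

static void
big_begin(MockScanState *css, int eflags)
{
    css->custom_state = NULL;
    if (eflags & MOCK_EXPLAIN_ONLY)
        return;                     /* plan is only explained: no setup */
    css->custom_state = calloc(1, sizeof(MockPos));
}

static const int *
big_exec(MockScanState *css)
{
    MockPos *pos = css->custom_state;

    while (pos != NULL && pos->next < sizeof(mock_rows) / sizeof(mock_rows[0]))
    {
        const int *row = &mock_rows[pos->next++];

        if (*row > 5)               /* the callback applies the qualifier */
            return row;
    }
    return NULL;                    /* end of scan */
}

static void
big_end(MockScanState *css)
{
    free(css->custom_state);
    css->custom_state = NULL;
}

const MockProvider mock_big_provider =
    {"mock_big_rows", big_begin, big_exec, big_end};

/* Drive begin/exec/end like an executor loop; returns the sum of rows seen. */
int
mock_run_scan(const MockProvider *p, int eflags)
{
    MockScanState css;
    const int *row;
    int sum = 0;

    p->begin(&css, eflags);
    while ((row = p->exec(&css)) != NULL)
        sum += *row;
    p->end(&css);
    return sum;
}
```

The exec callback filters rows itself, mirroring the note that ExecCustomScan has to check the scan qualifier, and begin only prepares private state so that the scanning work starts on the first exec call.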