Re: [HACKERS] [v9.5] Custom Plan API

Kohei KaiGai Fri, 22 Aug 2014 18:49:13 -0700

2014-08-23 0:39 GMT+09:00 Robert Haas <[email protected]>:
> On Thu, Jul 17, 2014 at 3:38 PM, Tom Lane <[email protected]> wrote:
>> Alvaro Herrera <[email protected]> writes:
>>> I haven't followed this at all, but I just skimmed over it and noticed
>>> the CustomPlanMarkPos thingy; apologies if this has been discussed
>>> before.  It seems a bit odd to me; why isn't it sufficient to have a
>>> boolean flag in regular CustomPlan to indicate that it supports
>>> mark/restore?
>>
>> Yeah, I thought that was pretty bogus too, but it's well down the
>> list of issues that were there last time I looked at this ...
>
> I think the threshold question for this incarnation of the patch is
> whether we're happy with new DDL (viz, CREATE CUSTOM PLAN PROVIDER) as
> a way of installing new plan providers into the database.  If we are,
> then I can go ahead and enumerate a long list of things that will need
> to be fixed to make that code acceptable (such as adding pg_dump
> support).  But if we're not, there's no point in spending any time on
> that part of the patch.
>
Even though I'm patch author, I'd like to agree with this approach.
In fact, the previous custom-plan interface I proposed to the v9.4
development cycle does not include DDL support to register the
custom-plan providers, however, it works fine.


One thing I was pointed out, it is the reason why I implemented
DDL support, is that intermediation of c-language function also
loads the extension module implicitly. It is an advantage, but
not sure whether it shall be supported from the beginning.

> I can see a couple of good reasons to think that this approach might
> be reasonable:
>
> - In some ways, a custom plan provider (really, at this point, a
> custom scan provider) is very similar to a foreign data wrapper.  To
> the guts of PostgreSQL, an FDW is a sort of black box that knows how
> to scan some data not managed by PostgreSQL.  A custom plan provider
> is similar, except that it scans data that *is* managed by PostgreSQL.
>
> - There's also some passing resemblance between a custom plan provider
> and an access method.  Access methods provide a set of APIs for fast
> access to data via an index, while custom plan providers provide an
> API for fast access to data via magic that the core system doesn't
> (and need not) understand.  While access methods don't have associated
> SQL syntax, they do include catalog structure, so perhaps this should
> too, by analogy.
>
> All that having been said, I'm having a hard time mustering up
> enthusiasm for this way of doing things.  As currently constituted,
> the pg_custom_plan_provider catalog contains only a name, a "class"
> that is always 's' for scan, and a handler function OID.  Quite
> frankly, that's a whole lot of nothing.  If we got rid of the
> pg_catalog structure and just had something like
> RegisterCustomPlanProvider(char *name, void (*)(customScanArg *),
> which could be invoked from _PG_init(), hundreds and hundreds of lines
> of code could go away and we wouldn't lose any actual functionality;
> you'd just list your custom plan providers in shared_preload_libraries
> or local_preload_libraries instead of listing them in a system
> catalog.  In fact, you might even have more functionality, because you
> could load providers into particular sessions rather than system-wide,
> which isn't possible with this design.
>
Indeed. It's an advantage of the approach without system catalog.


> I think the underlying issue here really has to do with when custom
> plan providers get invoked - what triggers that?  For foreign data
> wrappers, we have some relations that are plain tables (relkind = 'r')
> and no foreign data wrapper code is invoked.  We have others that are
> flagged as foreign tables (relkind = 'f') and for those we look up the
> matching FDW (via ftserver) and run the code.  Similarly, for an index
> AM, we notice that the relation is an index (relkind = 'r') and then
> consult relam to figure out which index AM we should invoke.  But as
> KaiGai is conceiving this feature, it's quite different.  Rather than
> applying only to particular relations, and being mutually exclusive
> with other options that might apply to those relations, it applies to
> *everything* in the database in addition to whatever other options may
> be present.  The included ctidscan implementation is just as good an
> example as PG-Strom: you inspect the query and see, based on the
> operators present, whether there's any hope of accelerating things.
> In other words, there's no user configuration - and also, not
> irrelevantly, no persistent on-disk state the way you have for an
> index, or even an FDW, which has on disk state to the extent that
> there have to be catalog entries tying a particular FDW to a
> particular table.
>
Yes, that's my point. In case of FDW or index AM, the query planner
can have some expectations how relevant executor node will handle
the given relation scan according to the persistent state.
However, custom-plan is a black-box to the query planner, and it
cannot have expectation of how relation scan is handled, except for
the cost value estimated by custom-plan provider.
Thus, this interface is designed to invoke custom-plan providers being
registered on relation scan, to ask them whether it can offer alternative
way to scan.

Probably, it may have a shortcut to skip invocation in case when custom-
plan provider obviously cannot provide any alternative plan.
For example, we may add a flag to RegisterCustomPlanProvider() to
tell this custom-plan provider works on only relkind='r', thus no need to
invoke on other relation types.

> A lot of the previous discussion of this topic revolves around the
> question of whether we can unify the use case that this patch is
> targeting with other things - e.g. Citus's desire to store its data
> files within the data directory while retaining control over data
> access, thus not a perfect fit for FDWs; the desire to push joins down
> to foreign servers; more generally, the desire to replace a join with
> a custom plan that may or may not use access paths for the underlying
> relations as subpaths.  I confess I'm not seeing a whole lot of
> commonality with anything other than the custom-join-path idea, which
> probably shares many of what I believe to be the relevant
> characteristics of a custom scan as conceived by KaiGai: namely, that
> all of the decisions about whether to inject a custom path in
> particular circumstances are left up to the provider itself based on
> inspection of the specific query, rather than being a result of user
> configuration.
>
> So, I'm tentatively in favor of stripping the DDL support out of this
> patch and otherwise trying to boil it down to something that's really
> minimal, but I'd like to hear what others think.
>
I'd like to follow this direction, and start stripping the DDL support.

Thanks,
-- 
KaiGai Kohei <[email protected]>


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [v9.5] Custom Plan API

Reply via email to