On Thu, Jul 17, 2014 at 3:38 PM, Tom Lane <t...@sss.pgh.pa.us> wrote: > Alvaro Herrera <alvhe...@2ndquadrant.com> writes: >> I haven't followed this at all, but I just skimmed over it and noticed >> the CustomPlanMarkPos thingy; apologies if this has been discussed >> before. It seems a bit odd to me; why isn't it sufficient to have a >> boolean flag in regular CustomPlan to indicate that it supports >> mark/restore? > > Yeah, I thought that was pretty bogus too, but it's well down the > list of issues that were there last time I looked at this ...
I think the threshold question for this incarnation of the patch is whether we're happy with new DDL (viz, CREATE CUSTOM PLAN PROVIDER) as a way of installing new plan providers into the database. If we are, then I can go ahead and enumerate a long list of things that will need to be fixed to make that code acceptable (such as adding pg_dump support). But if we're not, there's no point in spending any time on that part of the patch. I can see a couple of good reasons to think that this approach might be reasonable: - In some ways, a custom plan provider (really, at this point, a custom scan provider) is very similar to a foreign data wrapper. To the guts of PostgreSQL, an FDW is a sort of black box that knows how to scan some data not managed by PostgreSQL. A custom plan provider is similar, except that it scans data that *is* managed by PostgreSQL. - There's also some passing resemblance between a custom plan provider and an access method. Access methods provide a set of APIs for fast access to data via an index, while custom plan providers provide an API for fast access to data via magic that the core system doesn't (and need not) understand. While access methods don't have associated SQL syntax, they do include catalog structure, so perhaps this should too, by analogy. All that having been said, I'm having a hard time mustering up enthusiasm for this way of doing things. As currently constituted, the pg_custom_plan_provider catalog contains only a name, a "class" that is always 's' for scan, and a handler function OID. Quite frankly, that's a whole lot of nothing. If we got rid of the pg_catalog structure and just had something like RegisterCustomPlanProvider(char *name, void (*)(customScanArg *), which could be invoked from _PG_init(), hundreds and hundreds of lines of code could go away and we wouldn't lose any actual functionality; you'd just list your custom plan providers in shared_preload_libraries or local_preload_libraries instead of listing them in a system catalog. In fact, you might even have more functionality, because you could load providers into particular sessions rather than system-wide, which isn't possible with this design. I think the underlying issue here really has to do with when custom plan providers get invoked - what triggers that? For foreign data wrappers, we have some relations that are plain tables (relkind = 'r') and no foreign data wrapper code is invoked. We have others that are flagged as foreign tables (relkind = 'f') and for those we look up the matching FDW (via ftserver) and run the code. Similarly, for an index AM, we notice that the relation is an index (relkind = 'r') and then consult relam to figure out which index AM we should invoke. But as KaiGai is conceiving this feature, it's quite different. Rather than applying only to particular relations, and being mutually exclusive with other options that might apply to those relations, it applies to *everything* in the database in addition to whatever other options may be present. The included ctidscan implementation is just as good an example as PG-Strom: you inspect the query and see, based on the operators present, whether there's any hope of accelerating things. In other words, there's no user configuration - and also, not irrelevantly, no persistent on-disk state the way you have for an index, or even an FDW, which has on disk state to the extent that there have to be catalog entries tying a particular FDW to a particular table. A lot of the previous discussion of this topic revolves around the question of whether we can unify the use case that this patch is targeting with other things - e.g. Citus's desire to store its data files within the data directory while retaining control over data access, thus not a perfect fit for FDWs; the desire to push joins down to foreign servers; more generally, the desire to replace a join with a custom plan that may or may not use access paths for the underlying relations as subpaths. I confess I'm not seeing a whole lot of commonality with anything other than the custom-join-path idea, which probably shares many of what I believe to be the relevant characteristics of a custom scan as conceived by KaiGai: namely, that all of the decisions about whether to inject a custom path in particular circumstances are left up to the provider itself based on inspection of the specific query, rather than being a result of user configuration. So, I'm tentatively in favor of stripping the DDL support out of this patch and otherwise trying to boil it down to something that's really minimal, but I'd like to hear what others think. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers