On 24 March 2014 10:25, Kouhei Kaigai <kai...@ak.jp.nec.com> wrote: > Brief summary of the current approach that has been revised from my > original submission through the discussion on pgsql-hackers: > > The plannode was renamed to CustomPlan, instead of CustomScan, because > it dropped all the hardcoded portion that assumes the custom-node shall > perform as alternative scan or join method, because it prevents this > custom-node to perform as other stuff; like sort or append potentially. > According to the suggestion by Tom, I put a structure that contains > several function pointers on the new CustomPlan node, and extension will > allocate an object that extends CustomPlan node. > It looks like polymorphism in object oriented programming language. > The core backend knows abstracted set of methods defined in the > tables of function pointers, and extension can implement its own logic > on the callback, using private state on the extended object.
I just wanted to add some review comments here. I also apologise for not reviewing this earlier; I had misunderstood the maturity of the patch and had assumed it was a request for comments/WIP. Overall, I very much support the concept of providing for alternate scans. I like the placement of calls in the optimizer and we'll be able to do much with that. Other comments in order that I consider them important. * There is no declarative structure for this at all. I was expecting to see a way to declare that a specific table might have an alternate scan path, but we just call the plugin always and it has to separately make a cache lookup to see if anything extra is needed. The Index AM allows us to perform scans, yet indexes are very clearly declared and easily and clearly identifiable. We need the same thing for alternate plans. * There is no mechanism at all for maintaining other data structures. Are we supposed to use the Index AM? Triggers? Or? The lack of clarity there is disturbing, though I could be simply missing something big and obvious. * There is no catalog support. Complex APIs in Postgres typically have a structure like pg_am which allows the features to be clearly identified. I'd be worried about our ability to keep track of so many calls in such pivotal places without that. I want to be able to know what a plugin is doing, especially when it will likely come in binary form. I don't see an easy way to have plugins partially override each other or work together. What happens when I want to use Mr.X's clever new join plugin at the same time as Mr.Y's GPU accelerator? * How do we control security? What stops the Custom Scan API from overriding privileges? Shouldn't the alternate data structures be recognised as objects so we can grant privileges? Or do we simply say if an alternate data structure is linked to a heap then has implied privileges. It would be a shame to implement better security in one patch and then ignore it in another (from the same author). All of the above I can let pass in this release, but in the longer term we need to look for more structure around these ideas so we can manage and control what happens. The way this is now is quite raw - suitable for R&D but not for longer term production usage by a wider audience, IMHO. I wouldn't like to make commitments about the longevity of this API either; if we accept it, it should have a big "may change radically" sign hanging on it. Having said that, I am interested in progress here and I accept that things will look like this at this stage of the ideas process, so these are not things to cause delay. Some things I would like to see change on in this release are... * It's not clear to me when we load/create the alternate data structures. That can't happen at _init time. I was expecting this to look like an infrastructure for unlogged indexes, but it doesn't look like that either. * The name Custom makes me nervous. It sounds too generic, as if the design or objectives for this is are a little unclear. AlternateScan sounds like a better name since its clearer that we are scanning an alternate data structure rather than the main heap. * The prune hook makes me feel very uneasy. It seems weirdly specific implementation detail, made stranger by the otherwise lack of data maintenance API calls. Calling that for every dirty page sounds like an issue and my patch rejection indicator is flashing red around that. Two additional use cases I will be looking to explore will be * Code to make Mat Views recognised as alternate scan targets * Code to allow queries access to sampled data rather the fully detailed data, if the result would be within acceptable tolerance for user -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers