On Tue, 11 Aug 2020 at 17:44, Andres Freund <and...@anarazel.de> wrote: > > Hi, > > On 2020-08-11 17:23:42 +1200, David Rowley wrote: > > On Tue, 11 Aug 2020 at 12:21, Andres Freund <and...@anarazel.de> wrote: > > > > > > On 2020-07-09 10:25:14 +1200, David Rowley wrote: > > > > On Thu, 9 Jul 2020 at 04:53, Andres Freund <and...@anarazel.de> wrote: > > > > > I'm not convinced it's a good idea to introduce a separate executor > > > > > node > > > > > for this. There's a fair bit of overhead in them, and they will only > > > > > be > > > > > below certain types of nodes afaict. It seems like it'd be better to > > > > > pull the required calls into the nodes that do parametrized scans of > > > > > subsidiary nodes. Have you considered that? > > > > > > > > I see 41 different node types mentioned in ExecReScan(). I don't > > > > really think it would be reasonable to change all those. > > > > > > But that's because we dispatch ExecReScan mechanically down to every > > > single executor node. That doesn't determine how many nodes would need > > > to modify to include explicit caching? What am I missing? > > > > > > Wouldn't we need roughly just nodeNestloop.c and nodeSubplan.c > > > integration? > > > > hmm, I think you're right there about those two node types. I'm just > > not sure you're right about overloading these node types to act as a > > cache. > > I'm not 100% either, to be clear. I am just acutely aware that adding > entire nodes is pretty expensive, and that there's, afaict, no need to > have arbitrary (i.e. pointer to function) type callbacks to point to the > cache.
Perhaps you're right, but I'm just not convinced of it. I feel there's a certain air of magic involved in any node that has a good name and reputation for doing one thing that we suddenly add new functionality to which causes it to perform massively differently. A counterexample to your argument is that Materialize is a node type. There's only a limits number of places where that node is used. One of those places can be on the inside of a non-parameterized nested loop. Your argument of having Nested Loop do caching would also indicate that Materialize should be part of Nested Loop instead of a node itself. There's a few other places Materialize is used, e.g scrollable cursors, but in that regard, you could say that the caching should be handled in ExecutePlan(). I just don't think it should be, as I don't think Result Cache should be part of any other node or code. Another problem I see with overloading nodeSubplan and nodeNestloop is, we don't really document our executor nodes today, so unless this patch starts a new standard for that, then there's not exactly a good place to mention that parameterized nested loops may now cache results from the inner scan. I do understand what you mean with the additional node overhead. I saw that in my adventures of INNER JOIN removals a few years ago. I hope the fact that I've tried to code the planner so that for nested loops, it only uses a Result Cache node when it thinks it'll speed things up. That decision is of course based on having good statistics, which might not be the case. I don't quite have that luxury with subplans due to lack of knowledge of the outer plan when planning the subquery. > > How would you inform users via EXPLAIN ANALYZE of how many > > cache hits/misses occurred? > > Similar to how we display memory for sorting etc. I was more thinking of how bizarre it would be to see Nested Loop and SubPlan report cache statistics. It may appear quite magical to users to see EXPLAIN ANALYZE mentioning that their Nested Loop is now reporting something about cache hits. > > What would you use to disable it for an > > escape hatch for when the planner makes a bad choice about caching? > > Isn't that *easier* when embedding it into the node? There's no nice way > to remove an intermediary executor node entirely, but it's trivial to > have an if statement like > if (node->cache && upsert_cache(node->cache, param)) I was more meaning that it might not make sense to keep the enable_resultcache GUC if the caching were part of the existing nodes. I think people are pretty used to the enable_* GUCs corresponding to an executor whose name roughly matches the name of the GUC. In this case, without a Result Cache node, enable_resultcache would not assist in self-documenting. However, perhaps 2 new GUCs instead, enable_nestloop_caching and enable_subplan_caching. We're currently short of any other enable_* GUCs that are node modifiers. We did have enable_hashagg_disk until a few weeks ago. Nobody seemed to like that, but perhaps there were other reasons for people not to like it other than it was a node modifier GUC. I'm wondering if anyone else has any thoughts on this? David