On Wed, 20 May 2020 at 12:44, David Rowley <dgrowle...@gmail.com> wrote:
> Hackers, > > Over on [1], Heikki mentioned about the usefulness of caching results > from parameterized subplans so that they could be used again for > subsequent scans which have the same parameters as a previous scan. > On [2], I mentioned that parameterized nested loop joins could see > similar gains with such a cache. I suggested there that instead of > adding code that only allows this to work for subplans, that instead, > we add a new node type that can handle the caching for us. We can > then just inject that node type in places where it seems beneficial. > Very cool > I've attached a patch which implements this. The new node type is > called "Result Cache". I'm not particularly wedded to keeping that > name, but if I change it, I only want to do it once. I've got a few > other names I mind, but I don't feel strongly or confident enough in > them to go and do the renaming. > > How the caching works: > > First off, it's only good for plugging in on top of parameterized > nodes that are rescanned with different parameters. The cache itself > uses a hash table using the simplehash.h implementation. The memory > consumption is limited to work_mem. The code maintains an LRU list and > when we need to add new entries but don't have enough space to do so, > we free off older items starting at the top of the LRU list. When we > get a cache hit, we move that entry to the end of the LRU list so that > it'll be the last to be evicted. > > When should we cache: > > For nested loop joins, the decision is made purely based on cost. I thought the main reason to do this was the case when the nested loop subplan was significantly underestimated and we realize during execution that we should have built a hash table. So including this based on cost alone seems to miss a trick. > The patch does rely heavily on good ndistinct estimates. Exactly. We know we seldom get those with many-way joins. So +1 for adding this technique. My question is whether it should be added as an optional facility of a parameterised sub plan, rather than an always-needed full-strength node. That way the choice of whether to use it can happen at execution time once we notice that we've been called too many times. -- Simon Riggs http://www.2ndQuadrant.com/ <http://www.2ndquadrant.com/> Mission Critical Databases