On Wed, 20 May 2020 at 12:44, David Rowley <dgrowle...@gmail.com> wrote:

> Hackers,
>
> Over on [1], Heikki mentioned the usefulness of caching results
> from parameterized subplans so that they could be used again for
> subsequent scans which have the same parameters as a previous scan.
> On [2], I mentioned that parameterized nested loop joins could see
> similar gains with such a cache. I suggested there that instead of
> adding code that only allows this to work for subplans, that instead,
> we add a new node type that can handle the caching for us.  We can
> then just inject that node type in places where it seems beneficial.
>

Very cool


> I've attached a patch which implements this.  The new node type is
> called "Result Cache".  I'm not particularly wedded to keeping that
> name, but if I change it, I only want to do it once. I've got a few
> other names in mind, but I don't feel strongly or confident enough in
> them to go and do the renaming.
>
> How the caching works:
>
> First off, it's only good for plugging in on top of parameterized
> nodes that are rescanned with different parameters. The cache itself
> uses a hash table using the simplehash.h implementation.  The memory
> consumption is limited to work_mem. The code maintains an LRU list and
> when we need to add new entries but don't have enough space to do so,
> we free older items, starting at the top of the LRU list.  When we
> get a cache hit, we move that entry to the end of the LRU list so that
> it'll be the last to be evicted.
>
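A minimal sketch of the caching scheme described above, in C, under stated assumptions. It is not the patch's code: a plain chained hash table stands in for simplehash.h, each cached result is reduced to a single int, and entry sizes are charged against a byte budget (`mem_limit`) that plays the role of work_mem. All names (`ResultCacheSketch`, `cache_lookup`, `cache_insert`, `cache_evict`) are hypothetical. A hit moves the entry to the tail of the LRU list; an insert that would exceed the budget evicts from the head first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define NBUCKETS 64

typedef struct CacheEntry
{
    int         key;            /* parameter value of the rescan */
    int         value;          /* stand-in for the cached result tuples */
    size_t      size;           /* bytes charged against the budget */
    struct CacheEntry *hnext;   /* hash bucket chain */
    struct CacheEntry *prev;    /* LRU list neighbours */
    struct CacheEntry *next;
} CacheEntry;

typedef struct
{
    CacheEntry *buckets[NBUCKETS];
    CacheEntry *lru_head;       /* least recently used: evicted first */
    CacheEntry *lru_tail;       /* most recently used */
    size_t      mem_used;
    size_t      mem_limit;      /* hypothetical analogue of work_mem */
    long        hits;
    long        misses;
} ResultCacheSketch;

static unsigned
hash_key(int key)
{
    return ((unsigned) key * 2654435761u) % NBUCKETS;
}

static void
cache_init(ResultCacheSketch *c, size_t mem_limit)
{
    *c = (ResultCacheSketch) {0};
    c->mem_limit = mem_limit;
}

static void
lru_unlink(ResultCacheSketch *c, CacheEntry *e)
{
    if (e->prev) e->prev->next = e->next; else c->lru_head = e->next;
    if (e->next) e->next->prev = e->prev; else c->lru_tail = e->prev;
    e->prev = e->next = NULL;
}

static void
lru_push_tail(ResultCacheSketch *c, CacheEntry *e)
{
    e->prev = c->lru_tail;
    e->next = NULL;
    if (c->lru_tail) c->lru_tail->next = e; else c->lru_head = e;
    c->lru_tail = e;
}

/* Returns the entry for key, or NULL on a miss; a hit moves it to the tail. */
static CacheEntry *
cache_lookup(ResultCacheSketch *c, int key)
{
    for (CacheEntry *e = c->buckets[hash_key(key)]; e != NULL; e = e->hnext)
    {
        if (e->key == key)
        {
            c->hits++;
            lru_unlink(c, e);
            lru_push_tail(c, e);
            return e;
        }
    }
    c->misses++;
    return NULL;
}

/* Removes one entry from its hash chain and the LRU list, then frees it. */
static void
cache_evict(ResultCacheSketch *c, CacheEntry *e)
{
    CacheEntry **p = &c->buckets[hash_key(e->key)];

    while (*p != e)
        p = &(*p)->hnext;
    *p = e->hnext;
    lru_unlink(c, e);
    c->mem_used -= e->size;
    free(e);
}

/* Adds key -> value (key assumed absent), evicting from the LRU head until
 * it fits. Returns false if the entry alone exceeds the budget or on OOM. */
static bool
cache_insert(ResultCacheSketch *c, int key, int value, size_t size)
{
    if (size > c->mem_limit)
        return false;
    while (c->mem_used + size > c->mem_limit)
        cache_evict(c, c->lru_head);

    CacheEntry *e = malloc(sizeof(*e));
    if (e == NULL)
        return false;
    e->key = key;
    e->value = value;
    e->size = size;
    e->hnext = c->buckets[hash_key(key)];
    c->buckets[hash_key(key)] = e;
    lru_push_tail(c, e);
    c->mem_used += size;
    return true;
}
```

On a miss, the caller would rescan the parameterized subnode and then call `cache_insert` with the result; the sketch assumes the key is not already cached at that point.
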
> When should we cache:
>
> For nested loop joins, the decision is made purely based on cost.


I thought the main reason to do this was the case when the nested loop
subplan was significantly underestimated and we realize during execution
that we should have built a hash table. So including this based on cost
alone seems to miss a trick.


> The patch does rely heavily on good ndistinct estimates.


Exactly. We know we seldom get those with many-way joins.

So +1 for adding this technique. My question is whether it should be added
as an optional facility of a parameterised subplan, rather than an
always-needed full-strength node. That way the choice of whether to use it
can happen at execution time once we notice that we've been called too many
times.
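One way to picture that execution-time switch, as a hypothetical C sketch: the parameterized subplan counts its rescans and turns caching on once the count exceeds the planner's estimate by some factor. The struct, its field names and the `trigger_factor` threshold are assumptions for illustration only, not anything in the patch or in PostgreSQL.

```c
#include <stdbool.h>

/* Hypothetical per-node state for a parameterized subplan that decides
 * during execution whether to start caching its results. */
typedef struct
{
    double  est_rescans;     /* planner's estimate of the number of rescans */
    long    actual_rescans;  /* rescans observed so far */
    double  trigger_factor;  /* enable caching once actual > est * factor */
    bool    cache_enabled;
} AdaptiveSubplanState;

/* Called on each rescan; returns whether the cache should be consulted. */
static bool
note_rescan(AdaptiveSubplanState *s)
{
    s->actual_rescans++;
    if (!s->cache_enabled &&
        (double) s->actual_rescans > s->est_rescans * s->trigger_factor)
        s->cache_enabled = true;
    return s->cache_enabled;
}
```

A real version would presumably also need to consider how often parameter values repeat, since many rescans with all-distinct parameters gain nothing from a cache.
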

-- 
Simon Riggs                http://www.2ndQuadrant.com/
Mission Critical Databases
