On Fri, Mar 12, 2021 at 8:31 AM David Rowley <dgrowle...@gmail.com> wrote:
> Thanks for these suggestions.
>
> On Mon, 22 Feb 2021 at 14:21, Justin Pryzby <pry...@telsasoft.com> wrote:
> >
> > On Tue, Feb 16, 2021 at 11:15:51PM +1300, David Rowley wrote:
> > > To summarise here, the planner performance gets a fair bit worse with
> > > the patched code. With master, summing the average planning time over
> > > each of the queries resulted in a total planning time of 765.7 ms.
> > > After patching, that went up to 1097.5 ms. I was pretty disappointed
> > > about that.
> >
> > I have a couple of ideas:

I just checked the latest code; it looks like we didn't improve this
situation, except that we introduced a GUC to control it. Am I missing
something? I don't have a suggestion, though.

> > - default enable_resultcache=off seems okay. In plenty of cases,
> >   planning time is unimportant. This is the "low bar" - if we can do
> >   better and enable it by default, that's great.
>
> I think that's reasonable. Teaching the planner to do new tricks is
> never going to make the planner produce plans more quickly. When the
> new planner trick gives us a more optimal plan, then great. When it
> does not, it's wasted effort. Giving users the ability to switch off
> the planner's new ability seems like a good way to keep happy those
> who continually find that the additional effort costs more than it
> saves.
>
> > - Maybe this should be integrated into nestloop rather than being a
> >   separate plan node. That means that it could be dynamically enabled
> >   during execution, maybe after a few loops or after checking that
> >   there's at least some minimal number of repeated keys and cache hits.
> >   cost_nestloop would consider whether to use a result cache or not,
> >   and explain would show the cache stats as a part of nested loop. In
> >   this case, I propose there'd still be a GUC to disable it.
>
> There was quite a bit of discussion on that topic already on this
> thread. I don't really want to revisit that.
>
> The main problem with that is that we'd be forced into costing a
> Nested Loop with a result cache exactly the same as we do for a plain
> nested loop. If we were to lower the cost to account for the cache
> hits, then the planner is more likely to choose a nested loop over a
> merge/hash join. If we then switched the caching off during execution
> due to low cache hits, that does not magically fix the bad choice of
> join method. The planner may have gone with a Hash Join if it had
> known the cache hit ratio would be that bad. We'd still be left to
> deal with the poorly performing nested loop. What you'd really want,
> instead of turning the cache off, would be to have the nested loop
> ditch the parameter scan and just morph itself into a Hash Join node.
> (I'm not proposing we do that.)
>
> > - Maybe cost_resultcache() can be split into initial_cost and
> >   final_cost parts, same as for nestloop? I'm not sure how it'd work,
> >   since initial_cost is supposed to return a lower bound, and
> >   resultcache tries to make things cheaper. initial_cost would just add
> >   some operator/tuple costs to make sure that resultcache of a unique
> >   scan is more expensive than nestloop alone. estimate_num_groups is at
> >   least O(n) WRT rcpath->param_exprs, so maybe you charge
> >   100 * list_length(param_exprs) * cpu_operator_cost in initial_cost
> >   and then call estimate_num_groups in final_cost. We'd be estimating
> >   the cost of estimating the cost...
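Just to make that suggestion concrete, a rough sketch of such a split could
look like the code below. The function names and the structure here are
hypothetical (only the 100x multiplier and estimate_num_groups() call are
taken from the suggestion above); this is not code from the patch.

/*
 * Hypothetical sketch (not patch code) of splitting the Result Cache
 * costing into a cheap "initial" phase and a more expensive "final" phase.
 */
static void
initial_cost_resultcache(PlannerInfo *root, ResultCachePath *rcpath,
                         Cost *rescan_startup_cost, Cost *rescan_total_cost)
{
    /* crude lower bound: charge the arbitrary 100x per-parameter cost */
    Cost    est_overhead = 100 * list_length(rcpath->param_exprs) *
                           cpu_operator_cost;

    *rescan_startup_cost = rcpath->subpath->startup_cost;
    *rescan_total_cost = rcpath->subpath->total_cost + est_overhead;
}

static void
final_cost_resultcache(PlannerInfo *root, ResultCachePath *rcpath,
                       Cost *rescan_startup_cost, Cost *rescan_total_cost)
{
    EstimationInfo  estinfo;
    double          ndistinct;

    /* only now pay for the O(n) n_distinct estimate */
    ndistinct = estimate_num_groups(root, rcpath->param_exprs, rcpath->calls,
                                    NULL, &estinfo);

    /* ... use ndistinct to derive the expected hit ratio and adjust costs ... */
}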
> The cost of the Result Cache is pretty dependent on the n_distinct
> estimate. Low numbers of distinct values tend to estimate a high number
> of cache hits, whereas large n_distinct values (relative to the number
> of outer rows) are not going to estimate a large number of cache hits.
>
> I don't think feeding in a fake value would help us here. We'd probably
> do better if we had a fast way to determine if a given Expr is unique
> (e.g. the UniqueKeys patch). Result Cache is never going to be a win
> for a parameter whose value is never the same as some previously seen
> value. This would likely allow us to skip considering a Result Cache
> for the majority of OLTP-type joins.
>
> > - Maybe an initial implementation of this would only add a result
> >   cache if the best plan was already going to use a nested loop, even
> >   though a cached nested loop might be cheaper than other plans. This
> >   would avoid most planner costs, and give improved performance at
> >   execution time, but leaves something "on the table" for the future.
> >
> > > +cost_resultcache_rescan(PlannerInfo *root, ResultCachePath *rcpath,
> > > +                        Cost *rescan_startup_cost, Cost *rescan_total_cost)
> > > +{
> > > +     double          tuples = rcpath->subpath->rows;
> > > +     double          calls = rcpath->calls;
> > ...
> > > +     /* estimate on the distinct number of parameter values */
> > > +     ndistinct = estimate_num_groups(root, rcpath->param_exprs, calls, NULL,
> > > +                                     &estinfo);
> >
> > Shouldn't this pass "tuples" and not "calls" ?
>
> hmm. I don't think so. "calls" is the estimated number of outer side
> rows. Here you're asking if the n_distinct estimate is relevant to the
> inner side rows. It's not. If we expect to be called 1000 times by the
> outer side of the nested loop, then we need to know our n_distinct
> estimate for those 1000 rows. If the estimate comes back as 10 distinct
> values and we see that we're likely to be able to fit all the tuples
> for those 10 distinct values in the cache, then the hit ratio is going
> to come out at 99%: 10 misses, one for the first time each value is
> looked up, and the remaining 990 calls will be hits. The number of
> tuples (and the width of tuples) on the inside of the nested loop is
> only relevant to calculating how many cache entries are likely to fit
> into hash_mem. When we think cache entries will be evicted, that makes
> the cache hit calculation more complex.
>
> I've tried to explain what's going on in cost_resultcache_rescan() the
> best I can with comments. I understand it's still pretty hard to follow
> what's going on. I'm open to making it easier to understand if you have
> suggestions.
>
> David

--
Best Regards
Andy Fan (https://www.aliyun.com/)
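For what it's worth, the hit-ratio arithmetic David describes above can be
sketched as a small helper. This is hypothetical code, not from the patch,
and it assumes the simple case with no cache evictions, i.e. all distinct
parameter values fit in hash_mem:

/*
 * Hypothetical helper (not patch code): expected cache hit ratio when all
 * ndistinct parameter values fit in the cache.  The first lookup of each
 * distinct value is a miss; every subsequent lookup is a hit.
 */
static double
expected_hit_ratio(double calls, double ndistinct)
{
    if (calls <= 0.0 || ndistinct >= calls)
        return 0.0;     /* every call sees a new value: no hits */

    /* e.g. calls = 1000, ndistinct = 10 -> (1000 - 10) / 1000 = 0.99 */
    return (calls - ndistinct) / calls;
}

As noted above, once entries start getting evicted the calculation becomes
more complex than this.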