On Fri, Mar 12, 2021 at 8:31 AM David Rowley <dgrowle...@gmail.com> wrote:
> Thanks for these suggestions.
>
> On Mon, 22 Feb 2021 at 14:21, Justin Pryzby <pry...@telsasoft.com> wrote:
> >
> > On Tue, Feb 16, 2021 at 11:15:51PM +1300, David Rowley wrote:
> > > To summarise here, the planner performance gets a fair bit worse with
> > > the patched code. With master, summing the average planning time over
> > > each of the queries resulted in a total planning time of 765.7 ms.
> > > After patching, that went up to 1097.5 ms. I was pretty disappointed
> > > about that.
> >
> > I have a couple of ideas:

I just checked the latest code; it looks like we didn't improve this
situation, except that we introduced a GUC to control it. Am I missing
something? I don't have a suggestion, though.

> > - default enable_resultcache=off seems okay. In plenty of cases,
> >   planning time is unimportant. This is the "low bar" - if we can do
> >   better and enable it by default, that's great.
>
> I think that's reasonable. Teaching the planner to do new tricks is
> never going to make the planner produce plans more quickly. When the
> new planner trick gives us a more optimal plan, then great. When it
> does not, it's wasted effort. Giving users the ability to switch off
> the planner's new ability seems like a good way to keep happy those
> who continually find that the additional effort costs more than it
> saves.
>
> > - Maybe this should be integrated into nestloop rather than being a
> >   separate plan node. That means that it could be dynamically enabled
> >   during execution, maybe after a few loops or after checking that
> >   there's at least some minimal number of repeated keys and cache hits.
> >   cost_nestloop would consider whether to use a result cache or not,
> >   and explain would show the cache stats as a part of nested loop. In
> >   this case, I propose there'd still be a GUC to disable it.
>
> There was quite a bit of discussion on that topic already on this
> thread. I don't really want to revisit that.
>
> The main problem with that is that we'd be forced into costing a
> Nested Loop with a result cache exactly the same as we do for a plain
> nested loop. If we were to lower the cost to account for the cache
> hits, then the planner is more likely to choose a nested loop over a
> merge/hash join. If we then switched the caching off during execution
> due to low cache hits, that does not magically fix the bad choice of
> join method. The planner may have gone with a Hash Join if it had
> known the cache hit ratio would be that bad. We'd still be left to
> deal with the poorly performing nested loop. What you'd really want,
> instead of turning the cache off, would be to have the nested loop
> ditch the parameter scan and just morph itself into a Hash Join node.
> (I'm not proposing we do that.)
>
> > - Maybe cost_resultcache() can be split into initial_cost and
> >   final_cost parts, same as for nestloop? I'm not sure how it'd work,
> >   since initial_cost is supposed to return a lower bound, and
> >   resultcache tries to make things cheaper. initial_cost would just add
> >   some operator/tuple costs to make sure that resultcache of a unique
> >   scan is more expensive than nestloop alone. estimate_num_groups is at
> >   least O(n) WRT rcpath->param_exprs, so maybe you charge
> >   100 * list_length(param_exprs) * cpu_operator_cost in initial_cost
> >   and then call estimate_num_groups in final_cost. We'd be estimating
> >   the cost of estimating the cost...
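Just to make that suggestion concrete, a rough sketch of such a split could
look like the code below. The function names and the structure here are
hypothetical (only the 100x multiplier and estimate_num_groups() call are
taken from the suggestion above); this is not code from the patch.

/*
 * Hypothetical sketch (not patch code) of splitting the Result Cache
 * costing into a cheap "initial" phase and a more expensive "final" phase.
 */
static void
initial_cost_resultcache(PlannerInfo *root, ResultCachePath *rcpath,
                         Cost *rescan_startup_cost, Cost *rescan_total_cost)
{
    /* crude lower bound: charge the arbitrary 100x per-parameter cost */
    Cost    est_overhead = 100 * list_length(rcpath->param_exprs) *
                           cpu_operator_cost;

    *rescan_startup_cost = rcpath->subpath->startup_cost;
    *rescan_total_cost = rcpath->subpath->total_cost + est_overhead;
}

static void
final_cost_resultcache(PlannerInfo *root, ResultCachePath *rcpath,
                       Cost *rescan_startup_cost, Cost *rescan_total_cost)
{
    EstimationInfo  estinfo;
    double          ndistinct;

    /* only now pay for the O(n) n_distinct estimate */
    ndistinct = estimate_num_groups(root, rcpath->param_exprs, rcpath->calls,
                                    NULL, &estinfo);

    /* ... use ndistinct to derive the expected hit ratio and adjust costs ... */
}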
> The cost of the Result Cache is pretty dependent on the n_distinct
> estimate. Low numbers of distinct values tend to estimate a high number
> of cache hits, whereas large n_distinct values (relative to the number
> of outer rows) are not going to estimate a large number of cache hits.
>
> I don't think feeding in a fake value would help us here. We'd probably
> do better if we had a fast way to determine if a given Expr is unique
> (e.g. the UniqueKeys patch). Result Cache is never going to be a win
> for a parameter whose value is never the same as some previously seen
> value. This would likely allow us to skip considering a Result Cache
> for the majority of OLTP-type joins.
>
> > - Maybe an initial implementation of this would only add a result
> >   cache if the best plan was already going to use a nested loop, even
> >   though a cached nested loop might be cheaper than other plans. This
> >   would avoid most planner costs, and give improved performance at
> >   execution time, but leaves something "on the table" for the future.
> >
> > > +cost_resultcache_rescan(PlannerInfo *root, ResultCachePath *rcpath,
> > > +                        Cost *rescan_startup_cost, Cost *rescan_total_cost)
> > > +{
> > > +     double          tuples = rcpath->subpath->rows;
> > > +     double          calls = rcpath->calls;
> > ...
> > > +     /* estimate on the distinct number of parameter values */
> > > +     ndistinct = estimate_num_groups(root, rcpath->param_exprs, calls, NULL,
> > > +                                     &estinfo);
> >
> > Shouldn't this pass "tuples" and not "calls" ?
>
> hmm. I don't think so. "calls" is the estimated number of outer side
> rows. Here you're asking if the n_distinct estimate is relevant to the
> inner side rows. It's not. If we expect to be called 1000 times by the
> outer side of the nested loop, then we need to know our n_distinct
> estimate for those 1000 rows. If the estimate comes back as 10 distinct
> values and we see that we're likely to be able to fit all the tuples
> for those 10 distinct values in the cache, then the hit ratio is going
> to come out at 99%: 10 misses, one for the first time each value is
> looked up, and the remaining 990 calls will be hits. The number of
> tuples (and the width of tuples) on the inside of the nested loop is
> only relevant to calculating how many cache entries are likely to fit
> into hash_mem. When we think cache entries will be evicted, that makes
> the cache hit calculation more complex.
>
> I've tried to explain what's going on in cost_resultcache_rescan() the
> best I can with comments. I understand it's still pretty hard to follow
> what's going on. I'm open to making it easier to understand if you have
> suggestions.
>
> David

--
Best Regards
Andy Fan (https://www.aliyun.com/)
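For what it's worth, the hit-ratio arithmetic David describes above can be
sketched as a small helper. This is hypothetical code, not from the patch,
and it assumes the simple case with no cache evictions, i.e. all distinct
parameter values fit in hash_mem:

/*
 * Hypothetical helper (not patch code): expected cache hit ratio when all
 * ndistinct parameter values fit in the cache.  The first lookup of each
 * distinct value is a miss; every subsequent lookup is a hit.
 */
static double
expected_hit_ratio(double calls, double ndistinct)
{
    if (calls <= 0.0 || ndistinct >= calls)
        return 0.0;     /* every call sees a new value: no hits */

    /* e.g. calls = 1000, ndistinct = 10 -> (1000 - 10) / 1000 = 0.99 */
    return (calls - ndistinct) / calls;
}

As noted above, once entries start getting evicted the calculation becomes
more complex than this.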