On Mon, Feb 22, 2021 at 9:21 AM Justin Pryzby <pry...@telsasoft.com> wrote:
> On Tue, Feb 16, 2021 at 11:15:51PM +1300, David Rowley wrote: > > To summarise here, the planner performance gets a fair bit worse with > > the patched code. With master, summing the average planning time over > > each of the queries resulted in a total planning time of 765.7 ms. > > After patching, that went up to 1097.5 ms. I was pretty disappointed > > about that. > > I have a couple ideas; > > - default enable_resultcache=off seems okay. In plenty of cases, planning > time is unimportant. This is the "low bar" - if we can do better and > enable > it by default, that's great. > > - Maybe this should be integrated into nestloop rather than being a > separate > plan node. That means that it could be dynamically enabled during > execution, maybe after a few loops or after checking that there's at > least > some minimal number of repeated keys and cache hits. cost_nestloop > would > consider whether to use a result cache or not, and explain would show > the > cache stats as a part of nested loop. +1 for this idea now.. I am always confused why there is no such node in Oracle even if it is so aggressive to do performance improvement and this function looks very promising. After realizing the costs in planner, I think planning time might be an answer (BTW, I am still not sure Oracle did this). In this case, I propose there'd still > be a GUC to disable it. > > - Maybe cost_resultcache() can be split into initial_cost and final_cost > parts, same as for nestloop ? I'm not sure how it'd work, since > initial_cost is supposed to return a lower bound, and resultcache tries > to > make things cheaper. initial_cost would just add some operator/tuple > costs > to make sure that resultcache of a unique scan is more expensive than > nestloop alone. estimate_num_groups is at least O(n) WRT > rcpath->param_exprs, so maybe you charge 100*list_length(param_exprs) * > cpu_operator_cost in initial_cost and then call estimate_num_groups in > final_cost. We'd be estimating the cost of estimating the cost... > > - Maybe an initial implementation of this would only add a result cache > if the > best plan was already going to use a nested loop, even though a cached > nested loop might be cheaper than other plans. This would avoid most > planner costs, and give improved performance at execution time, but > leaves > something "on the table" for the future. > > > +cost_resultcache_rescan(PlannerInfo *root, ResultCachePath *rcpath, > > + Cost *rescan_startup_cost, Cost *rescan_total_cost) > > +{ > > + double tuples = rcpath->subpath->rows; > > + double calls = rcpath->calls; > ... > > + /* estimate on the distinct number of parameter values */ > > + ndistinct = estimate_num_groups(root, rcpath->param_exprs, calls, > NULL, > > + &estinfo); > > Shouldn't this pass "tuples" and not "calls" ? > > -- > Justin > -- Best Regards Andy Fan (https://www.aliyun.com/)