On Mon, 9 Dec 2019 at 17:14, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:
>
> On Sat, Dec 07, 2019 at 11:34:12AM -0500, Tom Lane wrote:
> >Justin Pryzby <pry...@telsasoft.com> writes:
> >> Jeff said:
> >>> |What would I find very useful is a verbosity option to get the cost
> >>> |estimates expressed as a multiplier of each *_cost parameter, rather than
> >>> |just as a scalar.
> >
> >> It seems to me that's "just" a matter of redefining Cost and fixing
> >> everything that breaks:
> >
> >> struct Cost {
> >>     double seq, rand;
> >>     double cpu_tuple, cpu_index_tuple, cpu_oper;
> >>     double parallel_setup; // This is probably always in startup_cost and never in run_cost
> >>     double parallel_tuple; // This is probably always in run_cost and never in startup_cost
> >>     double disable;
> >> };
> >
> >> I'm perhaps 50% done with that - is there some agreement that's a desirable
> >> goal and a good way to do it ?
> >
> >No, I think this will get rejected out of hand.  The implications for
> >the planner's speed and memory consumption seem quite unacceptable
> >for the size of the benefit.  What you're showing above probably
> >doubles the size of most Paths, and the added cycles in hot-spots
> >like add_path seem pretty daunting.
> >
>
> Yeah, that's an issue. But I have to admit my main issue with this
> proposal is that I have no idea how I'd interpret this Cost. I mean,
> what do the fields express for different types of paths? How do they
> contribute to the actual cost of that path?
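For the simplest cases the answer seems fairly direct: a filtered seq scan's
scalar cost is already just a weighted sum of the cost GUCs (roughly what
cost_seqscan() computes), so I read the struct above as keeping those
per-parameter products separate instead of collapsing them into one number.
A back-of-the-envelope illustration with made-up page/tuple counts, not
actual planner code:

# Default GUC values; the relation sizes below are invented.
seq_page_cost = 1.0
cpu_tuple_cost = 0.01
cpu_operator_cost = 0.0025

pages = 1200              # relation size in pages (made up)
tuples = 100000           # rows scanned (made up)
quals_per_tuple = 1       # operator evaluations per row for the filter

# Keep the per-parameter products separate instead of summing them away.
cost_breakdown = {
    "seq_page_cost":     seq_page_cost * pages,
    "cpu_tuple_cost":    cpu_tuple_cost * tuples,
    "cpu_operator_cost": cpu_operator_cost * tuples * quals_per_tuple,
}

print(cost_breakdown)
print("total (all EXPLAIN shows today):", sum(cost_breakdown.values()))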
What I think users would be able to do with this info is understand which
parameter to tweak to raise the estimated cost of a node. Everyone knows
that if an index scan is being used but is taking longer than a sequential
scan would, you might try raising random_page_cost. But I rarely see people
tweaking the more "exotic" parameters like cpu_operator_cost or
cpu_index_tuple_cost, and when they do they aren't really sure which nodes
they're affecting... I remember planning to do a very similar thing back in
the 8.3 era and never getting around to it.

You could imagine even storing these broken-out costs for the overall plan
in the logs and building a large matrix of actual execution values versus
the per-parameter costs. Then it becomes a standard linear least-squares
problem to find the values for each parameter that minimize the plan
estimate errors (and to identify the cases that are outliers). A rough
sketch of such a fit is below.

--
greg
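To make that concrete, here is a rough sketch in Python of what such a fit
could look like. It assumes we had logged, for each plan node, the raw unit
counts attributable to each cost parameter alongside the actual execution
time, which is instrumentation we don't have today; the GUC names are the
real ones, but every number below is made up.

import numpy as np

# One column per cost parameter we want to solve for.
params = ["seq_page_cost", "random_page_cost", "cpu_tuple_cost",
          "cpu_index_tuple_cost", "cpu_operator_cost"]

# Hypothetical per-node unit counts gathered from logged plans
# (sequential page fetches, random page fetches, tuples, index tuples,
# operator evaluations), one row per node.
units = np.array([
    [1200.0,    0.0, 100000.0,      0.0, 100000.0],  # seq scan
    [   0.0,  450.0,   9000.0,   9000.0,   9000.0],  # index scan
    [ 300.0,    0.0,  25000.0,      0.0,  75000.0],  # seq scan + quals
    [   0.0, 1800.0,  40000.0,  40000.0,  40000.0],  # index scan
    [ 200.0,  600.0,  15000.0,  15000.0,  15000.0],  # bitmap heap scan
    [5000.0,    0.0, 400000.0,      0.0, 400000.0],  # seq scan
])

# Actual execution cost for each node, rescaled from milliseconds into
# whatever units we want the GUCs to live in (made up here).
actual = np.array([220.0, 95.0, 120.0, 410.0, 140.0, 900.0])

# Ordinary least squares: find parameter values so the predicted cost
# (units @ coeffs) matches the measured values as closely as possible.
coeffs, _, _, _ = np.linalg.lstsq(units, actual, rcond=None)

for name, value in zip(params, coeffs):
    print(f"{name:22s} {value:10.6f}")

# Per-node errors: large ones flag nodes the linear model can't explain,
# i.e. the outliers where the cost model itself is off.
print("per-node error:", actual - units @ coeffs)

A real version would want far more rows than parameters, and probably a
non-negative constraint on the coefficients, but the shape of the problem
is the same.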