On Thu, Aug 01, 2019 at 06:28:08PM +0900, Kohei KaiGai wrote:
On Thu, Aug 1, 2019 at 16:19 Richard Guo <ri...@pivotal.io> wrote:

On Thu, Aug 1, 2019 at 2:12 PM Kohei KaiGai <kai...@heterodb.com> wrote:

On Thu, Aug 1, 2019 at 1:41 Tom Lane <t...@sss.pgh.pa.us> wrote:
>
> Robert Haas <robertmh...@gmail.com> writes:
> > Yeah, but I have to admit that this whole design makes me kinda
> > uncomfortable.  Every time somebody comes up with a new figure of
> > merit, it increases not only the number of paths retained but also the
> > cost of comparing two paths to possibly reject one of them. A few
> > years ago, you came up with the (good) idea of rejecting some join
> > paths before actually creating the paths, and I wonder if we ought to
> > try to go further with that somehow. Or maybe, as Peter Geoghegan has
> > been saying, we ought to think about planning top-down with
> > memoization instead of bottom up (yeah, I know that's a huge change).
> > It just feels like the whole idea of a list of paths ordered by cost
> > breaks down when there are so many ways that a not-cheapest path can
> > still be worth keeping. Not sure exactly what would be better, though.
>
> Yeah, I agree that add_path is starting to feel creaky.  I don't
> know what to do instead though.  Changing to a top-down design
> sounds like it would solve some problems while introducing others
> (not to mention the amount of work and breakage involved).
>
Hmm... It looks like the problem of revising path construction is much
larger than I expected, and it's unclear to me how much work would be
needed.

Although it would only be a workaround until a fundamental redesign,
how about having a margin on the estimated cost when rejecting paths?
The current add_path() immediately rejects a path if its cost is
even a little more expensive than the compared one. On the other hand,


Hmm... I don't think so. Currently add_path() uses fuzzy comparisons on
the costs of two paths, although the fuzz factor (1%) is hard-coded and not
user-controllable.
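
For reference, the relevant logic in src/backend/optimizer/util/pathnode.c
looks roughly like this (a heavily simplified paraphrase; the real
compare_path_costs_fuzzily() also handles startup cost, pathkeys and
parameterization in more detail):

    #define STD_FUZZ_FACTOR 1.01    /* the hard-coded 1% fuzz factor */

    /* simplified sketch of compare_path_costs_fuzzily() */
    if (path1->total_cost > path2->total_cost * fuzz_factor)
    {
        /* path1 is fuzzily worse on total cost ... */
        if (consider_startup &&
            path2->startup_cost > path1->startup_cost * fuzz_factor)
            return COSTS_DIFFERENT; /* ... but fuzzily better on startup */
        return COSTS_BETTER2;       /* path2 dominates; path1 can be dropped */
    }
    /* ... and the symmetric checks for path2 ... */

So a "margin" already exists; the question is only whether 1% is the
right number.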

Ah, sorry, I overlooked this logic...


FWIW I doubt a larger "fuzz factor" would be a reliable solution, because
how would you know which value is the right one? Why would 10% be the
right threshold, for example? In my experience such hard-coded
coefficients lead to behavior that's difficult to predict and explain
to users.

I understand it is not a fundamental redesign of the path-construction
logic, and it may have limitations. However, the amount of work is
reasonable and there are no side effects (the current behavior is
equivalent to a 0% threshold).
What do you think?


How about Tom's suggestion of adding another dimension for add_path()
to consider, just as it already considers paths with a better sort order
or parallel safety?

Robert also mentioned that it makes the comparison operation more
complicated. If we try to add another dimension here, a callback function
in the Path node might be able to tell the core optimizer whether a
"dominated" path should be dropped or not, without adding further
complexity to the core. It is just an idea.
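
For instance, something along these lines (all of the names below are
hypothetical, just to illustrate the shape of the idea; nothing like this
exists today):

    /* hypothetical callback type, invented for illustration */
    typedef bool (*path_dominates_fn) (Path *cheaper, Path *costlier);

    /* imagined new field in struct Path; NULL = use the default rule */
    path_dominates_fn dominance_cb;

    /* add_path() would consult it before discarding the old path */
    if (old_path->dominance_cb == NULL ||
        old_path->dominance_cb(new_path, old_path))
        remove_old = true;  /* dominated path is dropped, as today */

That way an extension (CustomScan provider, FDW, ...) could keep a path
alive that the default cost rules would throw away.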


I think adding a hook to add_path() allowing extensions to override the
decision should be OK. The chances of getting that committed in the near
future seem much higher than for a patch that completely reworks add_path().
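
A sketch of what that might look like, following the usual hook convention
(the names are invented; only the remove_old flag corresponds to an
existing add_path() local variable):

    /* hypothetical hook, in the style of existing planner hooks */
    typedef bool (*add_path_decision_hook_type) (RelOptInfo *parent_rel,
                                                 Path *new_path,
                                                 Path *old_path,
                                                 bool core_decision);
    extern PGDLLIMPORT add_path_decision_hook_type add_path_decision_hook;

    /* inside add_path(), after the core cost/pathkeys comparison: */
    if (add_path_decision_hook)
        remove_old = add_path_decision_hook(parent_rel, new_path,
                                            old_path, remove_old);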

There's one caveat, though - AFAICS various places in the planner use
things like cheapest_total_path, cheapest_startup_path and even
get_cheapest_path_for_pathkeys(), which kinda assume add_path() only
considers startup/total cost. It might happen that even after keeping
additional paths, the planner still won't use them :-(
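
To make that concrete, consumers typically reach straight for the
cost-based winners, e.g. (illustrative, not any specific call site):

    /* grab the cost-based winner directly ... */
    Path *cheapest = rel->cheapest_total_path;

    /* ... or scan the pathlist purely by pathkeys and cost */
    Path *sorted_path =
        get_cheapest_path_for_pathkeys(rel->pathlist,
                                       root->query_pathkeys,
                                       NULL,      /* required_outer */
                                       TOTAL_COST,
                                       false);    /* require_parallel_safe */

A path kept for some other figure of merit would never be found by those
lookups.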

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


