2019年8月1日(木) 19:24 Tomas Vondra <tomas.von...@2ndquadrant.com>: > > On Thu, Aug 01, 2019 at 06:28:08PM +0900, Kohei KaiGai wrote: > >2019年8月1日(木) 16:19 Richard Guo <ri...@pivotal.io>: > >> > >> On Thu, Aug 1, 2019 at 2:12 PM Kohei KaiGai <kai...@heterodb.com> wrote: > >>> > >>> 2019年8月1日(木) 1:41 Tom Lane <t...@sss.pgh.pa.us>: > >>> > > >>> > Robert Haas <robertmh...@gmail.com> writes: > >>> > > Yeah, but I have to admit that this whole design makes me kinda > >>> > > uncomfortable. Every time somebody comes up with a new figure of > >>> > > merit, it increases not only the number of paths retained but also the > >>> > > cost of comparing two paths to possibly reject one of them. A few > >>> > > years ago, you came up with the (good) idea of rejecting some join > >>> > > paths before actually creating the paths, and I wonder if we ought to > >>> > > try to go further with that somehow. Or maybe, as Peter Geoghegan, has > >>> > > been saying, we ought to think about planning top-down with > >>> > > memoization instead of bottom up (yeah, I know that's a huge change). > >>> > > It just feels like the whole idea of a list of paths ordered by cost > >>> > > breaks down when there are so many ways that a not-cheapest path can > >>> > > still be worth keeping. Not sure exactly what would be better, though. > >>> > > >>> > Yeah, I agree that add_path is starting to feel creaky. I don't > >>> > know what to do instead though. Changing to a top-down design > >>> > sounds like it would solve some problems while introducing others > >>> > (not to mention the amount of work and breakage involved). > >>> > > >>> Hmm... It looks the problem we ought to revise about path construction > >>> is much larger than my expectation, and uncertain for me how much works > >>> are needed. > >>> > >>> Although it might be a workaround until fundamental reconstruction, > >>> how about to have a margin of estimated cost to reject paths? > >>> Current add_path() immediately rejects lesser paths if its cost is > >>> even a little more expensive than the compared one. One the other hands, > >> > >> > >> Hmm.. I don't think so. Currently add_path() uses fuzzy comparisons on > >> costs of two paths, although the fuzz factor (1%) is hard coded and not > >> user-controllable. > >> > >Ah, sorry, I oversight this logic... > > > > FWIW I doubt adding larger "fuzz factor" is unlikely to be a reliable > solution, because how would you know what value is the right one? Why ould > 10% be the right threshold, for example? In my experience these these > hard-coded coefficients imply behavior that's difficult to predict and > explain to users. > Ah... That's exactly hard task to explain to users.
> >>> I understand it is not an essential re-design of path-construction logic, > >>> and > >>> may have limitation. However, amount of works are reasonable and no side- > >>> effect. (current behavior = 0% threshold). > >>> How about your opinions? > >>> > >> > >> How's about Tom's suggestion on adding another dimension in add_path() > >> to be considered, just like how it considers paths of better sort order > >> or parallel-safe? > >> > >Robert also mentioned it makes comparison operation more complicated. > >If we try to have another dimension here, a callback function in Path node > >may be able to tell the core optimizer whether "dominated path" shall be > >dropped or not, without further complexity. It is just an idea. > > > > I think adding a hook to add_path() allowing to override the decidion > should be OK. The chance of getting that committed in the near future > seems much higher than for a patch that completely reworks add_path(). > > There's one caveat, though - AFAICS various places in the planner use > things like cheapest_total_path, cheapest_startup_path and even > get_cheapest_path_for_pathkeys() which kinda assumes add_path() only > considers startup/total cost. It might happen that even after keeping > additional paths, the planner still won't use them :-( > Even if existing code looks at only cheapest_xxx_path, I don't think it is a problematic behavior because these paths are exactly cheapest at a level, but they may use more expensive paths in the deeper level. If a hook can prevent dropping a path, not cheapest, in a particular level, this path shall not appear on the cheapest_xxx_path, however, upper level path construction logic can pick up these paths as a candidate of input. If it has special discount factor here, the startup/total cost of the upper level path may have cheaper cost in spite of expensive input cost. In this scenario, this hook gives a decision whether dominated path-node shall be dropped or not. So, core optimizer still compares path-node using estimated cost value. In my scenario, even if GpuJoin is expensive at top level of JOIN/SCAN path construction, GpuPreAgg + GpuJoin may be cheaper than others because of data exchange on GPU device memory. As long as GpuJoin is not removed from the pathlist, extension can build its custom-path with cheaper cost. Best regards, -- HeteroDB, Inc / The PG-Strom Project KaiGai Kohei <kai...@heterodb.com>