Hi David, Thanks for taking a look at this! On Fri, Sep 15, 2023 at 3:15 PM David Rowley <dgrowle...@gmail.com> wrote:
> On Thu, 7 Sept 2023 at 04:37, Andy Fan <zhihui.fan1...@gmail.com> wrote: > > Currently add_paths_to_append_rel overlooked the startup cost for > creating > > append path, so it may have lost some optimization chances. After a > glance, > > the following 4 identifiers can be impacted. > > > - We shouldn't do the optimization if there are still more tables to > join, > > the reason is similar to has_multiple_baserels(root) in > > set_subquery_pathlist. But the trouble here is we may inherit multiple > > levels to build an appendrel, so I have to keep the 'top_relids' all > the > > time and compare it with PlannerInfo.all_baserels. If they are the > same, > > then it is the case we want to optimize. > > I think you've likely gone to the trouble of trying to determine if > there are any joins pending because you're considering using a cheap > startup path *instead* of the cheapest total path and you don't want > to do that when some join will cause all the rows to be read thus > making the plan more expensive if a cheap startup path was picked. > Yes, that's true. However it is not something we can't resolve, one of the solutions is just like what I did in the patch. but currently the main stuff which confuses me is if it is the right thing to give up the optimization if it has more tables to join (just like set_subquery_pathlist did). > Instead of doing that, why don't you just create a completely new > AppendPath containing all the cheapest_startup_paths and add that to > the append rel. You can optimise this and only do it when > rel->consider_startup is true. > > Does the attached do anything less than what your patch does? > We can work like this, but there is one difference from what my current patch does. It is cheapest_startup_path vs cheapest fraction path. For example if we have the following 3 paths with all of the estimated rows is 100 and the tuple_fraction is 10. Path 1: startup_cost = 60, total_cost = 80 -- cheapest total cost. Path 2: startup_cost = 10, total_cost = 1000 -- cheapest startup cost Path 3: startup_cost = 20, total_cost = 90 -- cheapest fractional cost So with the patch you propose, Path 1 & Path 3 are chosen to build append path. but with my current patch, Only path 3 is kept. IIUC, path 3 should be the best one in this case. We might also argue why Path 3 is kept in the first place (the children level), I think pathkey might be one option. and even path 3 is discarded somehow, I think only if it is the best one, we should keep it ideally. Another tiny factor of this is your propose isn't consistent with what set_subquery_pathlist which uses cheapest fractional cost and my proposal isn't consistent plain rel which uses cheapest startup cost. We can't say which one is better, though. If my above analysis is correct, I think the best way to handle this is if there is no more tables to join, we use cheapest fraction cost for all the kinds of relations, including plain relation, append rel, subquery and so on. If we have more tables to join, we use cheapest startup cost. On the implementation side, I want to use RelOptInfo.tuple_fraction instead of RelOptInfo.consider_startup. tuple_fraction = -1 means startup cost should not be considered. tuple_fraction = 0 means cheapest startup cost should be used. tuple_franction > 0 means cheapest fraction cost should be used. I still don't pay enough attention to consider_param_startup in RelOptInfo, I'm feeling the above strategy will not generate too much overhead to the planner for now while it can provides a better plan sometimes. -- Best Regards Andy Fan