On Mon, 18 Sept 2023 at 22:55, Andy Fan <zhihui.fan1...@gmail.com> wrote: > Here is an updated version to show what's in my mind.
My current thoughts on this are that the fractional cost part adds quite a bit more complexity than the minimalistic approach of just also considering the cheapest startup path. There's also quite a bit I don't like about the extra code you've added: 1. RelOptInfo.tuple_fraction is not given a default value in locations where we do makeNode(RelOptInfo); 2. This is very poorly documented and badly named. Also seems to have a typo "stopper" + /* Like order by, group by, distinct and so. */ + bool has_stoper_op; With that, nobody has a hope of knowing if some new operation should set that value to true or false. I think it needs to define the meaning, which I think (very roughly) is "does the query require any additional upper-planner operations which could require having to read more tuples from the final join relation than the number of tuples which are read from the final upper rel." 3. get_fractional_path_cost() goes to the trouble of setting total_rows then does not use it. 4. I don't see why it's ok to take the total_rows from the first Path in the list in get_cheapest_fractional_path_ext(). What if another Path has some other value? But overall, I'm more inclined to just go with the more simple "add a cheap unordered startup append path if considering cheap startup plans" version. I see your latest patch does both. So, I'd suggest two patches as I do see the merit in keeping this simple and cheap. If we can get the first part in and you still find cases where you're not getting the most appropriate startup plan based on the tuple fraction, then we can reconsider what extra complexity we should endure in the code based on the example query where we've demonstrated the planner is not choosing the best startup path appropriate to the given tuple fraction. David