Hi Yunhong,

Thanks for driving the feature.
I just have one question. Is the bushy join reorder optimization enabled by default? Will the bushy join reorder replace the existing Lopt join reorder rule?

Besides, I guess the option "table.oprimizer.busy-join-reorder-threshold" should be "table.optimizer.bushy-join-reorder-threshold"? (I guess they are just typos, as your last email said, but I just want to clarify as it is a public API.)

Best,
Jark

> On Jan 3, 2023, at 12:53, Benchao Li <libenc...@apache.org> wrote:
>
> Hi Yunhong,
>
> Thanks for driving this~
>
> I haven't gone deep into the implementation details yet. Regarding the
> general description, I would like to ask a few questions first:
>
> #1, Are there any benchmark results on how optimization latency changes
> compared to the current approach? In OLAP scenarios, query optimization
> latency is more crucial.
>
> #2, About the term "busy join reorder", are there any other systems which
> also use this term? I know Calcite has a rule[1] which uses the term "bushy
> join".
>
> #3, About the implementation, if this does the same work as Calcite
> MultiJoinOptimizeBushyRule, is it possible to use the Calcite version
> directly or extend it in some way?
>
> [1]
> https://github.com/apache/calcite/blob/9054682145727fbf8a13e3c79b3512be41574349/core/src/main/java/org/apache/calcite/rel/rules/MultiJoinOptimizeBushyRule.java#L78
>
> yh z <zhengyunhon...@gmail.com> wrote on Thu, Dec 29, 2022 at 14:44:
>
>> Hi, devs,
>>
>> I'd like to start a discussion about adding an option called
>> "table.oprimizer.busy-join-reorder-threshold" for the planner rule while we
>> try to introduce a new busy join reorder rule[1] into Flink.
>>
>> This join reorder rule is based on dynamic programming[2], which can store
>> all possible intermediate results, and the cost model can be used to select
>> the optimal join reorder result. Compared with the existing Lopt join
>> reorder rule, the new rule can give more possible results and the result
>> can be more accurate.
>> However, the search space of this rule will become
>> very large as the number of tables increases. So we should introduce an
>> option to limit the expansion of the search space: if the number of tables
>> that can be reordered is less than the threshold, the new busy join reorder
>> rule is used. Otherwise, the Lopt rule is used.
>>
>> The default threshold is intended to be set to 12. One reason is that in the
>> TPC-DS benchmark test, when the number of tables exceeds 12, the
>> optimization time will be very long. The other reason is that it refers to
>> relevant engines, like Spark, whose recommended setting is 12.[3]
>>
>> Looking forward to your feedback.
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-30376
>> [2]
>> https://courses.cs.duke.edu/compsci516/cps216/spring03/papers/selinger-etal-1979.pdf
>> [3]
>> https://spark.apache.org/docs/3.3.1/configuration.html#runtime-sql-configuration
>>
>> Best regards,
>> Yunhong Zheng
>>
>
>
> --
>
> Best,
> Benchao Li
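
For anyone following the thread, the idea in the quoted proposal (use the dynamic-programming reorder when the number of tables is below the threshold, otherwise fall back to Lopt) can be sketched roughly as below. This is only a toy illustration under stated assumptions, not Flink's or Calcite's implementation: the function names `choose_reorder_rule`, `cardinality`, and `best_bushy_plan`, the cost model (sum of estimated intermediate row counts), and the selectivity map are all hypothetical. Only the default of 12 comes from the proposal.

```python
from itertools import combinations


def choose_reorder_rule(num_tables, threshold=12):
    """Rule selection as described in the proposal (names are hypothetical):
    fewer tables than the threshold -> DP-based bushy reorder, else Lopt."""
    return "bushy" if num_tables < threshold else "lopt"


def cardinality(tables, rows, selectivity):
    """Toy estimate: product of the row counts, scaled by the selectivity of
    every join predicate between two tables in the set (default 1.0)."""
    card = 1.0
    for t in tables:
        card *= rows[t]
    for a, b in combinations(sorted(tables), 2):
        card *= selectivity.get((a, b), 1.0)
    return card


def best_bushy_plan(rows, selectivity):
    """Dynamic programming over table subsets.

    For each subset, try every split into two non-empty halves (which is
    what allows bushy trees) and keep the cheapest. Assumed cost model:
    the sum of estimated output rows over all joins in the plan.
    Returns (cost, plan), where plan is a nested tuple of table names.
    """
    names = sorted(rows)
    best = {frozenset([n]): (0.0, n) for n in names}
    for size in range(2, len(names) + 1):
        for subset in combinations(names, size):
            whole = frozenset(subset)
            out_card = cardinality(whole, rows, selectivity)
            first, rest = subset[0], subset[1:]
            candidates = []
            # Keep `first` on the left side so mirrored splits are skipped.
            for k in range(len(rest)):
                for extra in combinations(rest, k):
                    left = frozenset((first,) + extra)
                    right = whole - left
                    left_cost, left_plan = best[left]
                    right_cost, right_plan = best[right]
                    candidates.append(
                        (left_cost + right_cost + out_card,
                         (left_plan, right_plan))
                    )
            best[whole] = min(candidates, key=lambda c: c[0])
    return best[frozenset(names)]


if __name__ == "__main__":
    # Made-up statistics; selectivity keys must be sorted pairs.
    rows = {"A": 1000, "B": 10, "C": 1000, "D": 10}
    selectivity = {("A", "B"): 0.01, ("C", "D"): 0.01, ("B", "D"): 0.5}
    print(choose_reorder_rule(len(rows)))
    print(best_bushy_plan(rows, selectivity))
```

With these made-up numbers the cheapest plan joins A with B and C with D first and then joins the two results, a shape a left-deep-only search would never produce. The number of subsets doubles with every extra table, which is the search-space growth the threshold is meant to cap.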