Hi Yunhong,

Thanks for driving the feature. 

I just have one question. Is the bushy join reorder optimization enabled by
default? Will the bushy join reorder rule replace the existing Lopt join
reorder rule?

Besides, I guess the option "table.oprimizer.busy-join-reorder-threshold"
should be "table.optimizer.bushy-join-reorder-threshold"? (I guess these are
just typos, as your last email said, but I want to clarify since it is a
public API.)
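
To make my first question concrete, here is how I read the selection logic
described in your proposal. It is only a sketch of my understanding: the
class, enum, and method names are made up for illustration and are not Flink
APIs, and the strict "less than" comparison is taken from the wording of your
email.

```java
// Hypothetical sketch: none of these names come from Flink's code base.
public class JoinReorderSelector {
    enum Rule { BUSHY, LOPT }

    // Picks the bushy (dynamic-programming) rule only while the number of
    // reorderable tables is below the threshold; otherwise falls back to Lopt.
    static Rule choose(int reorderableTables, int threshold) {
        return reorderableTables < threshold ? Rule.BUSHY : Rule.LOPT;
    }

    public static void main(String[] args) {
        int threshold = 12; // default value proposed in this thread
        for (int n : new int[] {3, 11, 12, 20}) {
            System.out.println(n + " tables -> " + choose(n, threshold));
        }
    }
}
```

If that reading is right, the Lopt rule stays in place as the fallback rather
than being replaced.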

Best,
Jark


> On Jan 3, 2023, at 12:53, Benchao Li <libenc...@apache.org> wrote:
> 
> Hi Yunhong,
> 
> Thanks for driving this~
> 
> I haven't gone deep into the implementation details yet. Regarding the
> general description, I'd like to ask a few questions first:
> 
> #1, Are there any benchmark results on how optimization latency changes
> compared to the current approach? In OLAP scenarios, query optimization
> latency is more crucial.
> 
> #2, About the term "busy join reorder", are there any other systems that
> also use this term? I know Calcite has a rule[1] which uses the term "bushy
> join".
> 
> #3, About the implementation, if this does the same work as Calcite's
> MultiJoinOptimizeBushyRule, is it possible to use the Calcite version
> directly or extend it in some way?
> 
> [1]
> https://github.com/apache/calcite/blob/9054682145727fbf8a13e3c79b3512be41574349/core/src/main/java/org/apache/calcite/rel/rules/MultiJoinOptimizeBushyRule.java#L78
> 
> yh z <zhengyunhon...@gmail.com> wrote on Thu, Dec 29, 2022 at 14:44:
> 
>> Hi, devs,
>> 
>> I'd like to start a discussion about adding an option called
>> "table.oprimizer.busy-join-reorder-threshold" for a planner rule, as we try
>> to introduce a new busy join reorder rule[1] into Flink.
>> 
>> This join reorder rule is based on dynamic programming[2], which can store
>> all possible intermediate results, and the cost model can be used to select
>> the optimal join reorder result. Compared with the existing Lopt join
>> reorder rule, the new rule can produce more candidate results, and the
>> result can be more accurate. However, the search space of this rule becomes
>> very large as the number of tables increases. So we should introduce an
>> option to limit the expansion of the search space: if the number of tables
>> that can be reordered is less than the threshold, the new busy join reorder
>> rule is used. Otherwise, the Lopt rule is used.
>> 
>> The default threshold is intended to be 12. One reason is that in the
>> TPC-DS benchmark test, when the number of tables exceeds 12, the
>> optimization time becomes very long. The other reason is that it follows
>> comparable engines, like Spark, whose recommended setting is 12.[3]
>> 
>> Looking forward to your feedback.
>> 
>> [1]  https://issues.apache.org/jira/browse/FLINK-30376
>> [2]
>> 
>> https://courses.cs.duke.edu/compsci516/cps216/spring03/papers/selinger-etal-1979.pdf
>> [3]
>> 
>> https://spark.apache.org/docs/3.3.1/configuration.html#runtime-sql-configuration
>> 
>> Best regards,
>> Yunhong Zheng
>> 
> 
> 
> -- 
> 
> Best,
> Benchao Li
