Hi Benchao,

Thanks for your reply.

Actually, I mistakenly wrote the name "bushy join reorder" as "busy join
reorder". I'm sorry for the confusion this caused. "Bushy join reorder"
means we can build a bushy join tree based on the cost model, whereas Flink
can currently only build a left-deep tree using Calcite's
LoptOptimizeJoinRule. I hope my answers below help with your questions:

For question #1: The biggest advantage of this "bushy join reorder"
strategy over the default Flink left-deep tree strategy is that it can
retain all possible join reorder plans and then select the optimal plan
according to the cost model. This means the bushy join reorder strategy
can be combined more effectively with the current cost model to get more
reasonable join reorder results. We verified this on the TPC-DS benchmark:
with the Spark plans as a reference, the new bushy join reorder strategy
adjusts more TPC-DS query plans to be consistent with the Spark plans, and
the execution time is significantly reduced. As for optimization latency,
that is exactly the problem the parameter to be introduced in this
discussion is meant to solve. When many tables need to be reordered, the
optimization latency increases greatly; but when the number of tables is
below the threshold, the latency is the same as with LoptOptimizeJoinRule.
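
To make the intended usage clearer, here is a rough sketch of how the
option could be set from a user's perspective. This is only an
illustration: the key name below follows the (corrected) spelling proposed
in this thread and may still change, while
"table.optimizer.join-reorder-enabled" is the existing switch that must be
on before any join reorder rule fires.

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class BushyJoinReorderConfigSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inBatchMode());
        // Join reorder must be enabled before any reordering rule applies.
        tEnv.getConfig().getConfiguration()
                .setString("table.optimizer.join-reorder-enabled", "true");
        // Proposed option (name not final): queries with at most this many
        // join inputs use the DP-based bushy reorder; larger queries fall
        // back to LoptOptimizeJoinRule.
        tEnv.getConfig().getConfiguration()
                .setString("table.optimizer.bushy-join-reorder-threshold", "12");
    }
}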

For question #2: According to my research, many compute engines and
database systems have "bushy join reorder" strategies based on dynamic
programming. For example, Spark and PostgreSQL use the same strategy, with
the threshold set to 12. Some papers, such as [1] and [2], have also
studied this strategy, and [2] sets the threshold to 14.
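
For readers less familiar with this family of algorithms, below is a
small, self-contained Java sketch of the dynamic programming join
enumeration idea in the spirit of [1] and [2]. The class name and toy cost
model are my own illustration only; this is neither the algorithm from
those papers nor the FLINK-30376 implementation. The key point is that the
best plan for every subset of relations is memoized, so bushy trees appear
naturally when two disjoint subsets are combined, and the same subset/split
enumeration is also what makes the search space explode without a
threshold.

import java.util.HashMap;
import java.util.Map;

public class DpJoinReorderSketch {

    static class Plan {
        final String tree;   // textual join tree, e.g. "((A x B) x C)"
        final double cost;   // toy cost: sum of intermediate cardinalities
        final double rows;   // toy cardinality of the join result
        Plan(String tree, double cost, double rows) {
            this.tree = tree; this.cost = cost; this.rows = rows;
        }
    }

    static Plan enumerate(String[] names, double[] rows) {
        int n = names.length;
        Map<Integer, Plan> best = new HashMap<>();
        // Base case: each single relation is a bitmask with one bit set.
        for (int i = 0; i < n; i++) {
            best.put(1 << i, new Plan(names[i], 0.0, rows[i]));
        }
        // For every subset, try every split into two disjoint non-empty
        // halves and keep the cheapest plan; this covers left-deep and
        // bushy shapes alike.
        for (int set = 1; set < (1 << n); set++) {
            if (Integer.bitCount(set) < 2) {
                continue;
            }
            for (int left = (set - 1) & set; left > 0; left = (left - 1) & set) {
                int right = set ^ left;
                Plan l = best.get(left);
                Plan r = best.get(right);
                // Toy cost model: cross-product cardinality plus child costs.
                double outRows = l.rows * r.rows;
                double cost = l.cost + r.cost + outRows;
                Plan current = best.get(set);
                if (current == null || cost < current.cost) {
                    best.put(set, new Plan("(" + l.tree + " x " + r.tree + ")",
                            cost, outRows));
                }
            }
        }
        return best.get((1 << n) - 1);
    }

    public static void main(String[] args) {
        Plan plan = enumerate(new String[] {"A", "B", "C", "D"},
                new double[] {10, 1000, 100, 1});
        System.out.println(plan.tree + " cost=" + plan.cost);
    }
}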

For question #3: The implementation of Calcite's MultiJoinOptimizeBushyRule
is very simple and does not store intermediate results at all. Therefore
the Calcite implementation cannot enumerate all possible join reorder
plans, and it cannot be combined with the current cost model to get more
reasonable join reorder results.
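
By contrast with the DP sketch above, the greedy idea behind such heuristic
rules can be sketched as follows. This is again my own toy illustration of
the general approach, not Calcite's actual algorithm: at every step the two
cheapest inputs are joined and the choice is never revisited, so no
intermediate alternatives are kept and the cost model can only steer each
local step, not the overall plan shape.

import java.util.ArrayList;
import java.util.List;

public class GreedyJoinOrderSketch {

    static String order(List<String> names, List<Double> rows) {
        List<String> trees = new ArrayList<>(names);
        List<Double> sizes = new ArrayList<>(rows);
        while (trees.size() > 1) {
            // Greedily pick the two smallest inputs (toy cost: cardinality).
            int a = 0, b = 1;
            if (sizes.get(b) < sizes.get(a)) { int t = a; a = b; b = t; }
            for (int i = 2; i < sizes.size(); i++) {
                if (sizes.get(i) < sizes.get(a)) { b = a; a = i; }
                else if (sizes.get(i) < sizes.get(b)) { b = i; }
            }
            // Join them and replace the pair with the combined input; the
            // alternatives that were not chosen are discarded forever.
            String joined = "(" + trees.get(a) + " x " + trees.get(b) + ")";
            double outRows = sizes.get(a) * sizes.get(b);
            int hi = Math.max(a, b), lo = Math.min(a, b);
            trees.remove(hi); trees.remove(lo);
            sizes.remove(hi); sizes.remove(lo);
            trees.add(joined);
            sizes.add(outRows);
        }
        return trees.get(0);
    }

    public static void main(String[] args) {
        System.out.println(order(List.of("A", "B", "C", "D"),
                List.of(10.0, 1000.0, 100.0, 1.0)));
    }
}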


[1]
https://courses.cs.duke.edu/compsci516/cps216/spring03/papers/selinger-etal-1979.pdf
[2] https://db.in.tum.de/~radke/papers/hugejoins.pdf



Benchao Li <libenc...@apache.org> wrote on Tue, Jan 3, 2023 at 12:54:

> Hi Yunhong,
>
> Thanks for driving this~
>
> I haven't gone deep into the implementation details yet. Regarding the
> general description, I would like to ask a few questions first:
>
> #1, Are there any benchmark results about the optimization latency change
> compared to the current approach? In OLAP scenarios, query optimization
> latency is more crucial.
>
> #2, About the term "busy join reorder", are there any other systems which
> also use this term? I know Calcite has a rule[1] which uses the term
> "bushy join".
>
> #3, About the implementation, if this does the same work as Calcite
> MultiJoinOptimizeBushyRule, is it possible to use the Calcite version
> directly or extend it in some way?
>
> [1]
> https://github.com/apache/calcite/blob/9054682145727fbf8a13e3c79b3512be41574349/core/src/main/java/org/apache/calcite/rel/rules/MultiJoinOptimizeBushyRule.java#L78
>
> yh z <zhengyunhon...@gmail.com> wrote on Thu, Dec 29, 2022 at 14:44:
>
> > Hi, devs,
> >
> > I'd like to start a discussion about adding an option called
> > "table.optimizer.busy-join-reorder-threshold" for a planner rule while
> > we try to introduce a new busy join reorder rule[1] into Flink.
> >
> > This join reorder rule is based on dynamic programming[2], which can
> > store all possible intermediate results, and the cost model can be used
> > to select the optimal join reorder result. Compared with the existing
> > Lopt join reorder rule, the new rule can produce more possible results
> > and the result can be more accurate. However, the search space of this
> > rule grows very large as the number of tables increases. So we should
> > introduce an option to limit the expansion of the search space: if the
> > number of tables to be reordered is less than the threshold, the new
> > busy join reorder rule is used; otherwise, the Lopt rule is used.
> >
> > The default threshold is intended to be 12. One reason is that in the
> > TPC-DS benchmark test, when the number of tables exceeds 12, the
> > optimization time becomes very long. The other reason is that it follows
> > related engines, like Spark, whose recommended setting is 12.[3]
> >
> > Looking forward to your feedback.
> >
> > [1]  https://issues.apache.org/jira/browse/FLINK-30376
> > [2]
> > https://courses.cs.duke.edu/compsci516/cps216/spring03/papers/selinger-etal-1979.pdf
> > [3]
> > https://spark.apache.org/docs/3.3.1/configuration.html#runtime-sql-configuration
> >
> > Best regards,
> > Yunhong Zheng
> >
>
>
> --
>
> Best,
> Benchao Li
>
