Hi,

Currently, the biggest limitation preventing better query optimisation is the lack of table statistics (which are not trivial to provide in streaming), so join/aggregation reordering doesn’t work. We have some ideas on how to tackle this issue, and we will definitely improve this at some point.
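
To illustrate why statistics matter for join reordering, here is a toy sketch in Python. It is not Flink or Calcite code: the table names, row counts, selectivities and the cost formula (sum of intermediate result sizes for a left-deep plan) are all made-up assumptions. It only shows that an optimizer cannot rank join orders without row-count estimates.

```python
from itertools import permutations

# Hypothetical statistics; in streaming these are exactly what is hard to obtain.
ROW_COUNTS = {"orders": 1_000_000, "customers": 10_000, "regions": 10}

# Hypothetical join selectivities for pairs of tables that share a join predicate.
SELECTIVITY = {
    frozenset({"orders", "customers"}): 1 / 10_000,
    frozenset({"customers", "regions"}): 1 / 10,
}


def plan_cost(order, row_counts, selectivity):
    """Toy cost: sum of intermediate result sizes of a left-deep join plan."""
    joined = [order[0]]
    rows = float(row_counts[order[0]])
    cost = 0.0
    for table in order[1:]:
        # Apply every predicate linking `table` to an already joined table;
        # with no predicate the factor stays 1.0, i.e. a cross product.
        sel = 1.0
        for other in joined:
            sel *= selectivity.get(frozenset({other, table}), 1.0)
        rows = rows * row_counts[table] * sel
        cost += rows
        joined.append(table)
    return cost


def best_join_order(row_counts, selectivity):
    """Exhaustively pick the cheapest order (only feasible for a few tables)."""
    return min(
        permutations(row_counts),
        key=lambda order: plan_cost(order, row_counts, selectivity),
    )


if __name__ == "__main__":
    for order in permutations(ROW_COUNTS):
        print(order, plan_cost(order, ROW_COUNTS, SELECTIVITY))
    print("cheapest:", best_join_order(ROW_COUNTS, SELECTIVITY))
```

With these assumed numbers, joining the two small tables first and the large one last is cheapest. Without the row counts, every order looks the same to the planner, which is the situation described above.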
Piotrek

> On 14 Jul 2018, at 06:48, Xingcan Cui <xingc...@gmail.com> wrote:
>
> Hi Albert,
>
> Calcite provides a rule-based optimizer (as a framework), which means users
> can customize it by adding rules. That’s exactly what Flink did. From the
> logical plan to the physical plan, the translations are triggered by
> different sets of rules, according to which the relational expressions are
> replaced, reordered or optimized.
>
> However, IMO, the current optimization rules in the Flink Table API are quite
> primitive. Some SQL statements (e.g., multiple joins) are just translated to
> feasible execution plans, instead of optimized ones, since it’s much more
> difficult to conduct query optimization on large datasets or dynamic streams.
> You could first start from the Calcite query optimizer, and then try to make
> your own rules.
>
> Best,
> Xingcan
>
>> On Jul 14, 2018, at 11:55 AM, vino yang <yanghua1...@gmail.com> wrote:
>>
>> Hi Albert,
>>
>> First, I guess the query optimizer you mentioned is the one for Flink Table & SQL
>> (for the batch API there is another optimizer, which is implemented by Flink).
>>
>> Yes, for Table & SQL, Flink currently uses Apache Calcite's query optimizer to
>> translate the query into a Calcite plan,
>> which is then optimized according to Calcite's optimization rules.
>>
>> The following rules are applied so far:
>> https://github.com/apache/flink/blob/master/flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/rules/FlinkRuleSets.scala
>>
>> Given that Flink depends on Calcite to do the optimization, I think
>> enhancing both Flink and Calcite would be the right direction.
>>
>> We hope you can provide more ideas and details. The Flink community welcomes
>> your ideas and contributions.
>>
>> Thanks.
>> Vino.
>>
>>
>> 2018-07-13 23:39 GMT+08:00 Albert Jonathan <alb...@cs.umn.edu>:
>>
>>> Hello,
>>>
>>> I am just wondering, does Flink use Apache Calcite's query optimizer to
>>> generate an optimal logical plan, or does it have its own query optimizer?
>>> From what I have observed so far, Flink's query optimizer only groups
>>> operators together without changing the order of aggregation operators
>>> (e.g., join). Did I miss anything?
>>>
>>> I am thinking of extending Flink to apply query optimization as in an
>>> RDBMS, by either integrating it with Calcite or implementing it as a new
>>> module.
>>> Any feedback or guidelines will be highly appreciated.
>>>
>>> Thank you,
>>> Albert