Re: [DISCUSS] Improvements to flink's cost based optimizer

2017-01-20 Thread Fabian Hueske
Hi Kurt, thanks for breaking down the overall effort into smaller tasks and creating the corresponding JIRA issues. Using default estimates for unknown tables can be quite risky, especially for statistics like cardinality. In these cases, collecting basic stats while writing the input (i.e., an arbitrary D

Re: [DISCUSS] Improvements to flink's cost based optimizer

2017-01-18 Thread Kurt Young
Hi Fabian, thanks for your detailed response, and sorry for the late reply. Your opinions all make sense to me, and here are some thoughts on your open questions: - Regarding tables without sufficient statistics, especially these kinds of "dynamic" tables which are derived from some arbitrary DataSe

Re: [DISCUSS] Improvements to flink's cost based optimizer

2017-01-10 Thread Fabian Hueske
Hi Kurt, thanks for starting this discussion! Although we use Calcite's cost-based optimizer, we do not use its full potential. As you correctly identified, this is mainly due to the lack of reliable statistics. Moreover, we use Calcite only for logical optimization, i.e., the optimizer basically

[DISCUSS] Improvements to flink's cost based optimizer

2017-01-10 Thread Kurt Young
Hi, currently Flink already uses a cost-based optimizer, but because we don't have accurate statistics and use a simple cost model, we don't actually gain much from this framework. I have proposed some improvements and a rough implementation plan in the following document: https://docs.googl