Re: [DISCUSS] FLIP-240: Introduce "ANALYZE TABLE" Syntax

Ingo Bürk Sun, 12 Jun 2022 23:17:05 -0700

Hi Godfrey,

thank you for the explanation. A SELECT is definitely more generic and will work for all connectors automatically. As such I think it's a good baseline solution regardless.

We can also think about allowing connector-specific optimizations in the future, but I do like your idea of letting the optimizer rules perform a lot of the work here already by leveraging existing optimizations. Similarly, things like non-null counts of non-nullable columns would (or at least could) be handled by the optimizer rules already.


So as far as that point goes, +1 to the generic approach.

One more point, though: In general we should avoid supporting features only in specific modes as it breaks the unification promise. Given that ANALYZE is a manual and completely optional operation I'm OK with doing that here in principle. However, I wonder what will happen in the streaming / unbounded case. Do you plan to throw an error? Or do we complete the command as successful but without doing anything?



Best
Ingo

On 13.06.22 05:50, godfrey he wrote:

Hi Ingo,

Thanks for the inputs.

I think converting `ANALYZE TABLE` to a `SELECT` statement is the
more generic approach. Because query plan optimization is generic,
we can provide more optimization rules that optimize not only the `SELECT`
statement converted from `ANALYZE TABLE` but also `SELECT` statements written by users.
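
For illustration, the conversion described above can be sketched as a helper that turns a table name and a column list into one aggregate query. This is only a sketch under assumptions: the class `AnalyzeToSelectSketch`, the method `toStatsQuery`, the chosen statistics and the output column aliases are all hypothetical and are not taken from the FLIP or from Flink.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Flink: builds the statistics query that an
// ANALYZE TABLE statement could be rewritten into.
final class AnalyzeToSelectSketch {

    /** Returns a single aggregate query that collects basic table and column statistics. */
    static String toStatsQuery(String table, List<String> columns) {
        List<String> items = new ArrayList<>();
        // Table-level statistic: total number of rows.
        items.add("COUNT(1) AS row_count");
        for (String c : columns) {
            // Column-level statistics; identifier quoting is omitted for brevity.
            items.add("COUNT(DISTINCT " + c + ") AS " + c + "_ndv");
            items.add("COUNT(1) - COUNT(" + c + ") AS " + c + "_null_count");
            items.add("MIN(" + c + ") AS " + c + "_min");
            items.add("MAX(" + c + ") AS " + c + "_max");
        }
        return "SELECT " + String.join(", ", items) + " FROM " + table;
    }

    public static void main(String[] args) {
        System.out.println(toStatsQuery("orders", List.of("id", "amount")));
    }
}
```

Because the result is an ordinary `SELECT`, any optimizer rule that improves it would also apply to the same aggregates written by a user.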

> JDBC connector can get a row count estimate without performing a
> SELECT COUNT(1)

To optimize such cases, we can implement a rule that pushes the aggregate
into the table source.
Currently, there is a similar rule based on SupportsAggregatePushDown, which
for now only supports pushing the local aggregate into the source.
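
A rough sketch of the push-down idea, assuming a source that can answer a row count from its own metadata. The interface `CountPushDown` and the classes `MetadataSource` and `PushCountIntoSourceRule` are simplified, hypothetical stand-ins; Flink's actual SupportsAggregatePushDown interface has a different and richer signature.

```java
import java.util.OptionalLong;

// Hypothetical ability interface: a source that may answer COUNT by itself.
interface CountPushDown {
    /** Returns true if the source accepts the COUNT and will produce it directly. */
    boolean tryPushDownCount();
}

// Hypothetical source that may or may not know its row count from metadata.
final class MetadataSource implements CountPushDown {
    private final OptionalLong metadataRowCount;

    MetadataSource(OptionalLong metadataRowCount) {
        this.metadataRowCount = metadataRowCount;
    }

    @Override
    public boolean tryPushDownCount() {
        // Accept the push-down only when a count is available.
        return metadataRowCount.isPresent();
    }
}

// Hypothetical planner rule: pick the cheaper plan when the source cooperates.
final class PushCountIntoSourceRule {
    static String plan(Object source) {
        if (source instanceof CountPushDown && ((CountPushDown) source).tryPushDownCount()) {
            return "Scan(count answered by source)";
        }
        // Fallback: read the whole table and aggregate in the engine.
        return "Aggregate(COUNT) <- Scan(full table)";
    }

    public static void main(String[] args) {
        System.out.println(plan(new MetadataSource(OptionalLong.of(42))));
        System.out.println(plan(new MetadataSource(OptionalLong.empty())));
    }
}
```

The rule degrades gracefully: sources that cannot answer the aggregate simply keep the generic scan-and-aggregate plan.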


Best,
Godfrey

Ingo Bürk <[email protected]> wrote on Fri, Jun 10, 2022 at 17:15:


Hi Godfrey,

compared to the solution proposed in the FLIP (using a SELECT
statement), I wonder if you have considered adding APIs to catalogs /
connectors to perform this task as an alternative?
I could imagine that for many connectors, statistics could be
implemented in a less expensive way by leveraging the underlying system
(e.g. a JDBC connector can get a row count estimate without performing a
SELECT COUNT(1)).
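
As a sketch of what "leveraging the underlying system" could mean for JDBC, the helper below returns a metadata query per database. The class `RowCountEstimateSketch`, the method `estimateQuery` and the dialect strings are hypothetical; the queries rely on PostgreSQL's `pg_class.reltuples` and MySQL's `information_schema.TABLES.TABLE_ROWS`, which both hold estimates rather than exact counts.

```java
import java.util.Optional;

// Hypothetical helper, not part of any Flink connector.
final class RowCountEstimateSketch {

    /** Returns a cheap metadata query for the dialect, or empty if none is known. */
    static Optional<String> estimateQuery(String dialect) {
        switch (dialect) {
            case "postgresql":
                // reltuples is the planner's row estimate for the relation.
                return Optional.of("SELECT reltuples FROM pg_class WHERE relname = ?");
            case "mysql":
                // TABLE_ROWS is an approximation for InnoDB tables.
                return Optional.of(
                        "SELECT TABLE_ROWS FROM information_schema.TABLES "
                                + "WHERE TABLE_SCHEMA = ? AND TABLE_NAME = ?");
            default:
                // Unknown system: the caller would fall back to SELECT COUNT(1).
                return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(estimateQuery("postgresql").orElse("SELECT COUNT(1) fallback"));
    }
}
```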


Best
Ingo


On 10.06.22 09:53, godfrey he wrote:

Hi all,

I would like to open a discussion on FLIP-240: Introduce "ANALYZE
TABLE" Syntax.

As FLIP-231 mentioned, statistics are one of the most important inputs
to the optimizer. Accurate and complete statistics allow the
optimizer to be more powerful. "ANALYZE TABLE" syntax is a very common
and effective approach to gathering statistics, and it is already
supported by many compute engines and databases.

The main purpose of this discussion is to introduce "ANALYZE TABLE" syntax
for Flink SQL.

You can find more details in FLIP-240 document[1]. Looking forward to
your feedback.

[1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=217386481
[2] POC: https://github.com/godfreyhe/flink/tree/FLIP-240


Best,
Godfrey
