Hi cao,

Thanks for the feedback. AFAIK, unlike databases, many big data compute engines do not collect statistics automatically when data is written. FLIP-231[1] introduced the SupportsStatisticReport interface, through which the planner collects statistics from the connector when the statistics from the catalog are unknown. However, the statistics reported by a connector are usually incomplete; typically, the number of distinct values is not included. `ANALYZE TABLE` provides a way to update complete statistics manually, and it is also offered by many big data compute engines and databases.
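To make the fallback concrete, here is a minimal Java sketch (Java 16+ for records) under stated assumptions. The record `StatsSnapshot`, the method `resolveStats`, the column `user_id` and all numbers are hypothetical. They are not Flink's planner classes or API. The sketch models only the behavior described above. Catalog statistics are used when they are known; otherwise the connector-reported statistics are used, and those may lack the number of distinct values (NDV) until something like `ANALYZE TABLE` has stored complete statistics in the catalog.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical, simplified model of statistics resolution; NOT Flink's actual planner classes.
public class StatsFallbackSketch {

    // Statistics for one table: row count plus per-column number of distinct values (NDV).
    // A missing map entry means "NDV unknown" for that column.
    record StatsSnapshot(long rowCount, Map<String, Long> ndvByColumn) {
        Optional<Long> ndv(String column) {
            return Optional.ofNullable(ndvByColumn.get(column));
        }
    }

    // Prefer catalog statistics when they are known; otherwise fall back to what the
    // source connector reports. If neither is available, rowCount -1 marks "unknown".
    static StatsSnapshot resolveStats(Optional<StatsSnapshot> fromCatalog,
                                      Optional<StatsSnapshot> fromConnector) {
        return fromCatalog.or(() -> fromConnector)
                .orElse(new StatsSnapshot(-1L, Map.of()));
    }

    public static void main(String[] args) {
        // Connector-reported stats: row count only, no NDV (the typical partial case).
        StatsSnapshot connector = new StatsSnapshot(1_000_000L, Map.of());
        // Illustrative stats that a manual ANALYZE TABLE run might store in the catalog.
        StatsSnapshot analyzed = new StatsSnapshot(1_000_000L, Map.of("user_id", 52_000L));

        StatsSnapshot before = resolveStats(Optional.empty(), Optional.of(connector));
        StatsSnapshot after = resolveStats(Optional.of(analyzed), Optional.of(connector));

        System.out.println("before ANALYZE: ndv(user_id) = " + before.ndv("user_id"));
        System.out.println("after ANALYZE:  ndv(user_id) = " + after.ndv("user_id"));
    }
}
```

Running `main` prints `Optional.empty` for the NDV before the catalog has statistics and `Optional[52000]` afterwards. This is the gap that a manual `ANALYZE TABLE` is meant to close.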
[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-231%3A+Introduce+SupportsStatisticReport+to+support+reporting+statistics+from+source+connectors

Best,
Godfrey

cao zou <zoucao...@gmail.com> wrote on Fri, Jun 10, 2022 at 16:49:
>
> Hi godfrey,
>
> Thanks for driving this meaningful topic.
> I think statistics are essential and meaningful for the optimizer; I'm just
> wondering in which situations it is needed. From the user side, the optimizer
> should be handled by the framework, and users may not want to think too
> much about it. Could you share more situations for using 'ANALYZE TABLE'
> from the user side?
>
> nit: There may be a mistake in Examples#partition table;
> the partition info should be
>
> Partition1: (ds='2022-06-01', hr=1)
> Partition2: (ds='2022-06-01', hr=2)
> Partition3: (ds='2022-06-02', hr=1)
> Partition4: (ds='2022-06-02', hr=2)
>
> best
> zoucao
>
> godfrey he <godfre...@gmail.com> wrote on Fri, Jun 10, 2022 at 15:54:
>
> > Hi all,
> >
> > I would like to open a discussion on FLIP-240: Introduce "ANALYZE
> > TABLE" Syntax.
> >
> > As FLIP-231 mentioned, statistics are one of the most important inputs
> > to the optimizer. Accurate and complete statistics allow the
> > optimizer to be more powerful. "ANALYZE TABLE" syntax is a very common
> > and effective approach to gathering statistics, which has already been
> > introduced by many compute engines and databases.
> >
> > The main purpose of this discussion is to introduce "ANALYZE TABLE" syntax
> > for Flink SQL.
> >
> > You can find more details in the FLIP-240 document[1]. Looking forward to
> > your feedback.
> >
> > [1]
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=217386481
> > [2] POC: https://github.com/godfreyhe/flink/tree/FLIP-240
> >
> > Best,
> > Godfrey