Hi 华宗,

To unsubscribe, please send an email with any content to dev-unsubscr...@flink.apache.org.

Thanks

Best regards,
Jing

On Tue, Jun 14, 2022 at 2:05 AM 华宗 <zhanghuaz...@126.com> wrote:

> Unsubscribe
>
> At 2022-06-13 22:44:24, "cao zou" <zoucao...@gmail.com> wrote:
> >Hi godfrey, thanks for your detailed explanation.
> >After your explanation and a glance over FLIP-231, I think it is
> >really needed. +1 for this, and looking forward to it.
> >
> >best
> >zoucao
> >
> >godfrey he <godfre...@gmail.com> wrote on Mon, Jun 13, 2022 at 14:43:
> >
> >> Hi Ingo,
> >>
> >> The semantics do not distinguish between batch and streaming.
> >> It works for both batch and streaming, but the result for
> >> unbounded sources is meaningless.
> >> Currently, I throw an exception in streaming mode,
> >> and we can support streaming mode with bounded sources
> >> in the future.
> >>
> >> Best,
> >> Godfrey
> >>
> >> Ingo Bürk <airbla...@apache.org> wrote on Mon, Jun 13, 2022 at 14:17:
> >> >
> >> > Hi Godfrey,
> >> >
> >> > thank you for the explanation. A SELECT is definitely more generic and
> >> > will work for all connectors automatically. As such I think it's a good
> >> > baseline solution regardless.
> >> >
> >> > We can also think about allowing connector-specific optimizations in the
> >> > future, but I do like your idea of letting the optimizer rules perform a
> >> > lot of the work here already by leveraging existing optimizations.
> >> > Similarly, things like non-null counts of non-nullable columns would (or
> >> > at least could) be handled by the optimizer rules already.
> >> >
> >> > So as far as that point goes, +1 to the generic approach.
> >> >
> >> > One more point, though: In general we should avoid supporting features
> >> > only in specific modes as it breaks the unification promise. Given that
> >> > ANALYZE is a manual and completely optional operation, I'm OK with doing
> >> > that here in principle. However, I wonder what will happen in the
> >> > streaming / unbounded case. Do you plan to throw an error?
> >> > Or do we complete the command as successful but without doing anything?
> >> >
> >> >
> >> > Best
> >> > Ingo
> >> >
> >> > On 13.06.22 05:50, godfrey he wrote:
> >> > > Hi Ingo,
> >> > >
> >> > > Thanks for the inputs.
> >> > >
> >> > > I think converting `ANALYZE TABLE` to a `SELECT` statement is
> >> > > a more generic approach. Because query plan optimization is generic,
> >> > > we can provide more optimization rules that optimize not only the
> >> > > `SELECT` statement converted from `ANALYZE TABLE` but also the
> >> > > `SELECT` statements written by users.
> >> > >
> >> > >> JDBC connector can get a row count estimate without performing a
> >> > >> SELECT COUNT(1)
> >> > > To optimize such cases, we can implement a rule to push the aggregate
> >> > > into the table source.
> >> > > Currently, there is a similar rule: SupportsAggregatePushDown, which
> >> > > for now only supports pushing a local aggregate into the source.
> >> > >
> >> > >
> >> > > Best,
> >> > > Godfrey
> >> > >
> >> > > Ingo Bürk <airbla...@apache.org> wrote on Fri, Jun 10, 2022 at 17:15:
> >> > >>
> >> > >> Hi Godfrey,
> >> > >>
> >> > >> compared to the solution proposed in the FLIP (using a SELECT
> >> > >> statement), I wonder if you have considered adding APIs to catalogs /
> >> > >> connectors to perform this task as an alternative?
> >> > >> I could imagine that for many connectors, statistics could be
> >> > >> implemented in a less expensive way by leveraging the underlying system
> >> > >> (e.g. a JDBC connector can get a row count estimate without performing a
> >> > >> SELECT COUNT(1)).
> >> > >>
> >> > >>
> >> > >> Best
> >> > >> Ingo
> >> > >>
> >> > >>
> >> > >> On 10.06.22 09:53, godfrey he wrote:
> >> > >>> Hi all,
> >> > >>>
> >> > >>> I would like to open a discussion on FLIP-240: Introduce "ANALYZE
> >> > >>> TABLE" Syntax.
> >> > >>>
> >> > >>> As FLIP-231 mentioned, statistics are one of the most important inputs
> >> > >>> to the optimizer.
> >> > >>> Accurate and complete statistics allow the
> >> > >>> optimizer to be more powerful. "ANALYZE TABLE" syntax is a very common
> >> > >>> and effective approach to gather statistics, which has already been
> >> > >>> introduced by many compute engines and databases.
> >> > >>>
> >> > >>> The main purpose of this discussion is to introduce "ANALYZE TABLE" syntax
> >> > >>> for Flink SQL.
> >> > >>>
> >> > >>> You can find more details in the FLIP-240 document [1]. Looking forward to
> >> > >>> your feedback.
> >> > >>>
> >> > >>> [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=217386481
> >> > >>> [2] POC: https://github.com/godfreyhe/flink/tree/FLIP-240
> >> > >>>
> >> > >>>
> >> > >>> Best,
> >> > >>> Godfrey
> >> >
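
To make the generic approach discussed in this thread concrete, here is a minimal sketch of how an ANALYZE TABLE request might be rewritten into a single aggregate SELECT, and how an unbounded (streaming) source might be rejected with an error. It is not taken from FLIP-240 or the POC branch: the function name `analyze_to_select`, the `bounded` flag, the choice of statistics, and the column aliases are all assumptions made purely for illustration.

```python
def analyze_to_select(table, columns, bounded=True):
    """Build a SELECT that gathers basic statistics for `table`.

    Hypothetical helper: names, aliases and the set of statistics are
    illustrative assumptions, not the FLIP-240 implementation.
    """
    if not bounded:
        # Mirrors the behaviour described in the thread: statistics over an
        # unbounded source are meaningless, so fail instead of doing nothing.
        raise ValueError("ANALYZE TABLE is not supported for unbounded sources")

    # Table-level statistic: total row count.
    exprs = ["COUNT(1) AS row_count"]
    for col in columns:
        # Column-level statistics: distinct values, nulls, min and max.
        exprs.append(f"COUNT(DISTINCT `{col}`) AS `{col}_ndv`")
        exprs.append(f"COUNT(1) - COUNT(`{col}`) AS `{col}_null_count`")
        exprs.append(f"MIN(`{col}`) AS `{col}_min`")
        exprs.append(f"MAX(`{col}`) AS `{col}_max`")
    return "SELECT " + ", ".join(exprs) + f" FROM `{table}`"


if __name__ == "__main__":
    print(analyze_to_select("orders", ["id", "amount"]))
```

Because the result is an ordinary SELECT, any optimizer rule that applies to user-written queries (for example, pushing an aggregate into the source) would apply to it as well, which is the argument made above for the generic approach.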