Re: Re: [DISCUSS] FLIP-240: Introduce "ANALYZE TABLE" Syntax

Jing Ge Mon, 13 Jun 2022 22:52:27 -0700

Hi 华宗

To unsubscribe, please send an email with any content to
dev-unsubscr...@flink.apache.org


Thanks

Best regards,
Jing


On Tue, Jun 14, 2022 at 2:05 AM 华宗 <zhanghuaz...@126.com> wrote:

> Unsubscribe
>
> At 2022-06-13 22:44:24, "cao zou" <zoucao...@gmail.com> wrote:
> >Hi Godfrey, thanks for your detailed explanation.
> >After your explanation and a glance over FLIP-231, I think this is
> >really needed. +1 for this, and looking forward to it.
> >
> >best
> >zoucao
> >
> >On Mon, Jun 13, 2022 at 14:43, godfrey he <godfre...@gmail.com> wrote:
> >
> >> Hi Ingo,
> >>
> >> The semantics do not distinguish between batch and streaming.
> >> It works for both, but the result for unbounded sources is
> >> meaningless.
> >> Currently, I throw an exception in streaming mode,
> >> and we can support streaming mode with bounded sources
> >> in the future.
> >>
> >> Best,
> >> Godfrey
> >>
> >> On Mon, Jun 13, 2022 at 14:17, Ingo Bürk <airbla...@apache.org> wrote:
> >> >
> >> > Hi Godfrey,
> >> >
> >> > thank you for the explanation. A SELECT is definitely more generic and
> >> > will work for all connectors automatically. As such I think it's a good
> >> > baseline solution regardless.
> >> >
> >> > We can also think about allowing connector-specific optimizations in the
> >> > future, but I do like your idea of letting the optimizer rules perform a
> >> > lot of the work here already by leveraging existing optimizations.
> >> > Similarly, things like non-null counts of non-nullable columns would (or
> >> > at least could) be handled by the optimizer rules already.
> >> >
> >> > So as far as that point goes, +1 to the generic approach.
> >> >
> >> > One more point, though: In general we should avoid supporting features
> >> > only in specific modes as it breaks the unification promise. Given that
> >> > ANALYZE is a manual and completely optional operation I'm OK with doing
> >> > that here in principle. However, I wonder what will happen in the
> >> > streaming / unbounded case. Do you plan to throw an error? Or do we
> >> > complete the command as successful but without doing anything?
> >> >
> >> >
> >> > Best
> >> > Ingo
> >> >
> >> > On 13.06.22 05:50, godfrey he wrote:
> >> > > Hi Ingo,
> >> > >
> >> > > Thanks for the inputs.
> >> > >
> >> > > I think converting `ANALYZE TABLE` to a `SELECT` statement is
> >> > > a more generic approach. Because query plan optimization is more
> >> > > generic, we can provide more optimization rules to optimize not only
> >> > > the `SELECT` statement converted from `ANALYZE TABLE` but also
> >> > > `SELECT` statements written by users.
> >> > >
> >> > >> JDBC connector can get a row count estimate without performing a
> >> > >> SELECT COUNT(1)
> >> > > To optimize such cases, we can implement a rule to push the aggregate
> >> > > into the table source.
> >> > > Currently, there is a similar rule: SupportsAggregatePushDown, which
> >> > > for now only supports pushing the local aggregate into the source.
> >> > >
> >> > >
> >> > > Best,
> >> > > Godfrey
> >> > >
> >> > > On Fri, Jun 10, 2022 at 17:15, Ingo Bürk <airbla...@apache.org> wrote:
> >> > >>
> >> > >> Hi Godfrey,
> >> > >>
> >> > >> compared to the solution proposed in the FLIP (using a SELECT
> >> > >> statement), I wonder if you have considered adding APIs to catalogs /
> >> > >> connectors to perform this task as an alternative?
> >> > >> I could imagine that for many connectors, statistics could be
> >> > >> implemented in a less expensive way by leveraging the underlying
> >> > >> system (e.g. a JDBC connector can get a row count estimate without
> >> > >> performing a SELECT COUNT(1)).
> >> > >>
> >> > >>
> >> > >> Best
> >> > >> Ingo
> >> > >>
> >> > >>
> >> > >> On 10.06.22 09:53, godfrey he wrote:
> >> > >>> Hi all,
> >> > >>>
> >> > >>> I would like to open a discussion on FLIP-240: Introduce "ANALYZE
> >> > >>> TABLE" Syntax.
> >> > >>>
> >> > >>> As FLIP-231 mentioned, statistics are one of the most important
> >> > >>> inputs to the optimizer. Accurate and complete statistics allow the
> >> > >>> optimizer to be more powerful. "ANALYZE TABLE" syntax is a very
> >> > >>> common and effective approach to gathering statistics, and it has
> >> > >>> already been introduced by many compute engines and databases.
> >> > >>>
> >> > >>> The main purpose of this discussion is to introduce "ANALYZE TABLE"
> >> > >>> syntax for Flink SQL.
> >> > >>>
> >> > >>> You can find more details in the FLIP-240 document [1]. Looking
> >> > >>> forward to your feedback.
> >> > >>>
> >> > >>> [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=217386481
> >> > >>> [2] POC: https://github.com/godfreyhe/flink/tree/FLIP-240
> >> > >>>
> >> > >>>
> >> > >>> Best,
> >> > >>> Godfrey
> >>
>
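
For readers skimming the quoted thread: the generic approach discussed there, rewriting `ANALYZE TABLE` into an ordinary `SELECT` that the optimizer can then improve, could look roughly like the sketch below. This is only an illustration. The helper name `build_analyze_select`, the chosen statistics (row count, null count, distinct count, min, max) and the generated column aliases are assumptions made for this example and are not taken from the FLIP-240 document or the POC branch.

```python
from typing import Sequence


def build_analyze_select(table: str, columns: Sequence[str]) -> str:
    """Build a statistics-gathering SELECT for a table (hypothetical helper).

    The set of statistics and the SQL shape are assumptions for
    illustration, not what FLIP-240 actually generates.
    """
    # Table-level statistic: total number of rows.
    exprs = ["COUNT(1) AS row_count"]
    for col in columns:
        # COUNT(col) skips NULLs, so the difference is the null count.
        exprs.append(f"COUNT(1) - COUNT(`{col}`) AS `{col}_null_count`")
        # Number of distinct non-null values in the column.
        exprs.append(f"COUNT(DISTINCT `{col}`) AS `{col}_ndv`")
        # Value range of the column.
        exprs.append(f"MIN(`{col}`) AS `{col}_min`")
        exprs.append(f"MAX(`{col}`) AS `{col}_max`")
    return "SELECT " + ", ".join(exprs) + f" FROM `{table}`"


if __name__ == "__main__":
    print(build_analyze_select("orders", ["amount"]))
```

Because the result is a plain `SELECT`, any rule that improves aggregate queries (for example, pushing an aggregate into the source, as mentioned in the thread) would apply to it without extra connector APIs.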
