Re: [DISCUSS] FLIP-435: Introduce a New Dynamic Table for Simplifying Data Pipelines

Becket Qin Tue, 09 Apr 2024 18:59:30 -0700

Thanks for the proposal. I like the FLIP.

My ranking:


1. Refresh(ing) / Live Table -> easy to understand and implies the dynamic
characteristic

2. Derived Table -> easy to understand.

3. Materialized Table -> sounds like just a table with physical data stored
somewhere.

4. Materialized View -> modifying a view directly is a little weird.

Thanks,

Jiangjie (Becket) Qin



On Tue, Apr 9, 2024 at 5:46 AM Lincoln Lee <[email protected]> wrote:

> Thanks Ron and Timo for your proposal!
>
> Here is my ranking:
>
> 1. Derived table -> extend the persistent semantics of derived table in SQL
>    standard, with a strong association with query, and has industry
>    precedents such as Google Looker.
>
> 2. Live Table ->  an alternative for 'dynamic table'
>
> 3. Materialized Table -> combination of the Materialized View and Table,
>    but still a table which accepts data changes
>
> 4. Materialized View -> need to extend understanding of the view to accept
>     data changes
>
> The reason for not adding 'Refresh Table' is I don't want to tell the user
> to 'refresh a refresh table'.
>
>
> Best,
> Lincoln Lee
>
>
> On Tue, Apr 9, 2024 at 20:11, Ron liu <[email protected]> wrote:
>
> > Hi, Dev
> >
> > My rankings are:
> >
> > 1. Derived Table
> > 2. Materialized Table
> > 3. Live Table
> > 4. Materialized View
> >
> > Best,
> > Ron
> >
> >
> >
> > On Tue, Apr 9, 2024 at 20:07, Ron liu <[email protected]> wrote:
> >
> > > Hi, Dev
> > >
> > > After several rounds of discussion, there is currently no consensus on
> > > the name of the new concept. Timo has proposed that we decide the name
> > > through a vote. This is a good solution when there is no clear
> > > preference, so we will adopt this approach.
> > >
> > > Regarding the name of the new concept, there are currently five
> > > candidates:
> > > 1. Derived Table -> taken by SQL standard
> > > 2. Materialized Table -> similar to SQL materialized view but a table
> > > 3. Live Table -> similar to dynamic tables
> > > 4. Refresh Table -> states what it does
> > > 5. Materialized View -> needs to extend the standard to support
> > >    modifying data
> > >
> > > For the above five candidates, everyone can rank them by preference.
> > > You can rank all five options or only some of them.
> > > We will use a scoring rule where the first rank gets 5 points, the
> > > second rank gets 4 points, the third rank gets 3 points, the fourth
> > > rank gets 2 points, and the fifth rank gets 1 point.
> > > After the voting closes, I will score all the candidates based on
> > > everyone's votes, and the candidate with the highest score will be
> > > chosen as the name for the new concept.
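
The scoring rule described above can be sketched as a simple positional tally. The following is a minimal illustration in Python, not the procedure actually used to count the vote: the `tally` function, the `POINTS` list, and the alphabetical tie-break are assumptions, and the three sample ballots are the rankings from Ron, Lincoln, and Timo posted in this thread. Becket's combined "Refresh(ing) / Live Table" entry is left out because the thread does not say how a combined entry should be scored.

```python
from collections import defaultdict

# Points per rank as stated in the thread: 1st = 5, 2nd = 4, ..., 5th = 1.
POINTS = [5, 4, 3, 2, 1]


def tally(ballots):
    """Return (candidate, score) pairs, highest score first.

    Each ballot is a list of candidate names in preference order and may
    rank fewer than five candidates, as the voting rules allow.
    """
    scores = defaultdict(int)
    for ranking in ballots:
        if len(ranking) > len(POINTS):
            raise ValueError("a ballot may rank at most five candidates")
        for position, name in enumerate(ranking):
            scores[name] += POINTS[position]
    # Tie-break alphabetically; this is an assumption, the thread defines none.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Sample ballots taken from the rankings posted in this thread.
ballots = [
    # Ron
    ["Derived Table", "Materialized Table", "Live Table", "Materialized View"],
    # Lincoln
    ["Derived Table", "Live Table", "Materialized Table", "Materialized View"],
    # Timo
    ["Refresh Table", "Materialized Table", "Live Table", "Materialized View",
     "Derived Table"],
]

for name, score in tally(ballots):
    print(f"{name}: {score}")
```

Further ballots can be appended to `ballots`. The printed scores only illustrate the rule on a partial set of votes and are not the final outcome of the vote.
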
> > >
> > > The voting will last up to 72 hours and is expected to close this
> > > Friday. I look forward to everyone voting on the name in this thread.
> > > Of course, we also welcome new input regarding the name.
> > >
> > > Best,
> > > Ron
> > >
> > > On Tue, Apr 9, 2024 at 19:49, Ron liu <[email protected]> wrote:
> > >
> > >> Hi, Dev
> > >>
> > >> Sorry, my previous statement was not quite accurate. We will hold a
> > >> vote for the name within this thread.
> > >>
> > >> Best,
> > >> Ron
> > >>
> > >>
> > >> On Tue, Apr 9, 2024 at 19:29, Ron liu <[email protected]> wrote:
> > >>
> > >>> Hi, Timo
> > >>>
> > >>> Thanks for your reply.
> > >>>
> > >>> I agree with you that naming is sometimes the more difficult part.
> > >>> When no one has a clear preference, voting on the name is a good
> > >>> solution, so I'll send a separate email for the vote, clarify the
> > >>> rules, and then let everyone vote.
> > >>>
> > >>> One other point to confirm: in your ranking there is an option for
> > >>> Materialized View. Does it stand for the UPDATING Materialized View
> > >>> that you mentioned earlier in the discussion? If we use Materialized
> > >>> View, I think it needs to be extended.
> > >>>
> > >>> Best,
> > >>> Ron
> > >>>
> > >>> On Tue, Apr 9, 2024 at 17:20, Timo Walther <[email protected]> wrote:
> > >>>
> > >>>> Hi Ron,
> > >>>>
> > >>>> yes, naming is hard. But it will have a large impact on trainings,
> > >>>> presentations, and the mental model of users. Maybe the easiest is
> > >>>> to collect rankings from everyone with some short justification:
> > >>>>
> > >>>>
> > >>>> My ranking (from good to not so good):
> > >>>>
> > >>>> 1. Refresh Table -> states what it does
> > >>>> 2. Materialized Table -> similar to SQL materialized view but a table
> > >>>> 3. Live Table -> nice buzzword, but maybe still too close to dynamic
> > >>>>    tables?
> > >>>> 4. Materialized View -> a bit broader than standard but still very
> > >>>>    similar
> > >>>> 5. Derived table -> taken by standard
> > >>>>
> > >>>> Regards,
> > >>>> Timo
> > >>>>
> > >>>>
> > >>>>
> > >>>> On 07.04.24 11:34, Ron liu wrote:
> > >>>> > Hi, Dev
> > >>>> >
> > >>>> > This is a summary letter. After several rounds of discussion, there
> > >>>> > is a strong consensus about the FLIP proposal and the issues it aims
> > >>>> > to address. The current point of disagreement is the naming of the
> > >>>> > new concept. I have summarized the candidates as follows:
> > >>>> >
> > >>>> > 1. Derived Table (inspired by Google Looker)
> > >>>> >      - Pros: Google Looker has introduced this concept, which is
> > >>>> >        designed for building Looker's automated modeling, aligning
> > >>>> >        with our purpose for the stream-batch automatic pipeline.
> > >>>> >
> > >>>> >      - Cons: The SQL standard uses the term "derived table"
> > >>>> >        extensively, and vendors adopt it for simply referring to a
> > >>>> >        table within a subclause.
> > >>>> >
> > >>>> > 2. Materialized Table: It means materializing the query result to a
> > >>>> >    table, similar to Db2 MQT (Materialized Query Tables). In addition,
> > >>>> >    Snowflake Dynamic Table's predecessor is also called Materialized
> > >>>> >    Table.
> > >>>> >
> > >>>> > 3. Updating Table (From Timo)
> > >>>> >
> > >>>> > 4. Updating Materialized View (From Timo)
> > >>>> >
> > >>>> > 5. Refresh/Live Table (From Martijn)
> > >>>> >
> > >>>> > As Martijn said, naming is a headache. I am looking forward to more
> > >>>> > valuable input from everyone.
> > >>>> >
> > >>>> > [1] https://cloud.google.com/looker/docs/derived-tables#persistent_derived_tables
> > >>>> > [2] https://www.ibm.com/docs/en/db2/11.5?topic=tables-materialized-query
> > >>>> > [3] https://community.denodo.com/docs/html/browse/6.0/vdp/vql/materialized_tables/creating_materialized_tables/creating_materialized_tables
> > >>>> >
> > >>>> > Best,
> > >>>> > Ron
> > >>>> >
> > >>>> > On Sun, Apr 7, 2024 at 15:55, Ron liu <[email protected]> wrote:
> > >>>> >
> > >>>> >> Hi, Lorenzo
> > >>>> >>
> > >>>> >> Thank you for your insightful input.
> > >>>> >>
> > >>>> >>>>> I think the 2 above twisted the materialized view concept to more
> > >>>> >> than just an optimization for accessing pre-computed
> > >>>> >> aggregates/filters.
> > >>>> >> I think that concept (at least in my mind) is now closer to the
> > >>>> >> semantics of the words themselves ("materialized" and "view") than
> > >>>> >> to its implementations in DBMSs, as just a view on raw data that,
> > >>>> >> hopefully, is constantly updated with fresh results.
> > >>>> >> That's why I understand Timo's et al. objections.
> > >>>> >>
> > >>>> >> Your understanding of Materialized Views is correct. However, in our
> > >>>> >> scenario, an important feature is the support for Update & Delete
> > >>>> >> operations, which the current Materialized Views cannot fulfill. As
> > >>>> >> we discussed with Timo before, if Materialized Views need to support
> > >>>> >> data modifications, it would require an extension with new keywords,
> > >>>> >> such as CREATE xxx (UPDATING) MATERIALIZED VIEW.
> > >>>> >>
> > >>>> >>>>> Still, I don't understand why we need another type of special
> > >>>> >> table. Could you dive deep into the reasons why not simply adding
> > >>>> >> the FRESHNESS parameter to standard tables?
> > >>>> >>
> > >>>> >> Firstly, I need to emphasize that we cannot achieve the design goal
> > >>>> >> of the FLIP through the CREATE TABLE syntax combined with a FRESHNESS
> > >>>> >> parameter. The proposal of this FLIP is to use Dynamic Table +
> > >>>> >> Continuous Query, and combine it with FRESHNESS to realize
> > >>>> >> streaming-batch unification. However, CREATE TABLE is merely a
> > >>>> >> metadata operation and cannot automatically start a background
> > >>>> >> refresh job. To achieve the design goal of the FLIP with standard
> > >>>> >> tables, it would require extending the CTAS[1] syntax to introduce
> > >>>> >> the FRESHNESS keyword. We considered this design initially, but it
> > >>>> >> has the following problems:
> > >>>> >>
> > >>>> >> 1. Distinguishing whether a table created through CTAS is a standard
> > >>>> >>    table or a "special" standard table with an ongoing background
> > >>>> >>    refresh job, based only on the FRESHNESS keyword, is very obscure
> > >>>> >>    for users.
> > >>>> >> 2. It intrudes on the semantics of the CTAS syntax. Currently, tables
> > >>>> >>    created using CTAS only add table metadata to the Catalog and do
> > >>>> >>    not record attributes such as the query. There are also no ongoing
> > >>>> >>    background refresh jobs, and the data writing operation happens
> > >>>> >>    only once at table creation.
> > >>>> >> 3. For the framework, when a certain kind of ALTER TABLE operation is
> > >>>> >>    performed on a table, it is unclear how to distinguish the behavior
> > >>>> >>    for a table created with FRESHNESS from that for a table created
> > >>>> >>    without it, which will also cause confusion.
> > >>>> >>
> > >>>> >> In terms of the design goal of combining Dynamic Table + Continuous
> > >>>> >> Query, the FLIP proposal cannot be realized by only extending the
> > >>>> >> current standard tables, so a new kind of dynamic table needs to be
> > >>>> >> introduced as a first-level concept.
> > >>>> >>
> > >>>> >> [1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/create/#as-select_statement
> > >>>> >>
> > >>>> >> Best,
> > >>>> >> Ron
> > >>>> >>
> > >>>> >> On Wed, Apr 3, 2024 at 22:25, <[email protected]> wrote:
> > >>>> >>
> > >>>> >>> Hello everybody!
> > >>>> >>> Thanks for the FLIP as it looks amazing (and I think the proof is
> > >>>> >>> this deep discussion it is provoking :))
> > >>>> >>>
> > >>>> >>> I have a couple of comments to add to this:
> > >>>> >>>
> > >>>> >>> Even though I get the reason why you rejected MATERIALIZED VIEW, I
> > >>>> >>> still like it a lot, and I would like to provide pointers on how the
> > >>>> >>> materialized view concept has been twisted in recent years:
> > >>>> >>>
> > >>>> >>> • Materialize DB (https://materialize.com/)
> > >>>> >>> • The famous talk by Martin Kleppmann "turning the database inside
> > >>>> >>>   out" (https://www.youtube.com/watch?v=fU9hR3kiOK0)
> > >>>> >>>
> > >>>> >>> I think the 2 above twisted the materialized view concept to more than
> > >>>> >>> just an optimization for accessing pre-computed aggregates/filters.
> > >>>> >>> I think that concept (at least in my mind) is now closer to the
> > >>>> >>> semantics of the words themselves ("materialized" and "view") than to
> > >>>> >>> its implementations in DBMSs, as just a view on raw data that,
> > >>>> >>> hopefully, is constantly updated with fresh results.
> > >>>> >>> That's why I understand Timo's et al. objections.
> > >>>> >>> Still I understand there is no need to add confusion :)
> > >>>> >>>
> > >>>> >>> Still, I don't understand why we need another type of special table.
> > >>>> >>> Could you dive deep into the reasons why not simply adding the
> > >>>> >>> FRESHNESS parameter to standard tables?
> > >>>> >>>
> > >>>> >>> I would see that as a very seamless implementation with the goal of
> > >>>> >>> a unification of batch and streaming.
> > >>>> >>> If we stick to a unified world, I think that Flink should just
> > >>>> >>> provide 1 type of table that is inherently dynamic.
> > >>>> >>> Now, depending on FRESHNESS objectives / connectors used in WITH,
> > >>>> >>> that table can be backed by a stream or batch job as you explained
> > >>>> >>> in your FLIP.
> > >>>> >>>
> > >>>> >>> Maybe I am totally missing the point :)
> > >>>> >>>
> > >>>> >>> Thank you in advance,
> > >>>> >>> Lorenzo
> > >>>> >>> On Apr 3, 2024 at 15:25 +0200, Martijn Visser <[email protected]>, wrote:
> > >>>> >>>> Hi all,
> > >>>> >>>>
> > >>>> >>>> Thanks for the proposal. While the FLIP talks extensively about how
> > >>>> >>>> Snowflake has Dynamic Tables and Databricks has Delta Live Tables,
> > >>>> >>>> my understanding is that Databricks has CREATE STREAMING TABLE [1],
> > >>>> >>>> which relates to this proposal.
> > >>>> >>>>
> > >>>> >>>> I do have concerns about using CREATE DYNAMIC TABLE, specifically
> > >>>> >>>> about confusing the users who are familiar with Snowflake's approach
> > >>>> >>>> where you can't change the content via DML statements, while that is
> > >>>> >>>> something that would work in this proposal. Naming is hard of
> > >>>> >>>> course, but I would probably prefer something like CREATE CONTINUOUS
> > >>>> >>>> TABLE, CREATE REFRESH TABLE or CREATE LIVE TABLE.
> > >>>> >>>>
> > >>>> >>>> Best regards,
> > >>>> >>>>
> > >>>> >>>> Martijn
> > >>>> >>>>
> > >>>> >>>> [1] https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-create-streaming-table.html
> > >>>> >>>>
> > >>>> >>>> On Wed, Apr 3, 2024 at 5:19 AM Ron liu <[email protected]> wrote:
> > >>>> >>>>
> > >>>> >>>>> Hi, dev
> > >>>> >>>>>
> > >>>> >>>>> After offline discussion with Becket Qin, Lincoln Lee and Jark Wu,
> > >>>> >>>>> we have improved some parts of the FLIP.
> > >>>> >>>>>
> > >>>> >>>>> 1. Add Full Refresh Mode section to clarify the semantics of full
> > >>>> >>>>>    refresh mode.
> > >>>> >>>>> 2. Add Future Improvement section explaining why query statement
> > >>>> >>>>>    does not support references to temporary view and possible
> > >>>> >>>>>    solutions.
> > >>>> >>>>> 3. The Future Improvement section explains a possible future
> > >>>> >>>>>    solution for dynamic table to support the modification of query
> > >>>> >>>>>    statements to meet the common field-level schema evolution
> > >>>> >>>>>    requirements of the lakehouse.
> > >>>> >>>>> 4. The Refresh section emphasizes that the Refresh command and the
> > >>>> >>>>>    background refresh job can be executed in parallel, with no
> > >>>> >>>>>    restrictions at the framework level.
> > >>>> >>>>> 5. Convert RefreshHandler into a plug-in interface to support
> > >>>> >>>>>    various workflow schedulers.
> > >>>> >>>>>
> > >>>> >>>>> Best,
> > >>>> >>>>> Ron
> > >>>> >>>>>
> > >>>> >>>>> On Tue, Apr 2, 2024 at 10:28, Ron liu <[email protected]> wrote:
> > >>>> >>>>>
> > >>>> >>>>>>> Hi, Venkata krishnan
> > >>>> >>>>>>>
> > >>>> >>>>>>> Thank you for your involvement and suggestions. I hope that the
> > >>>> >>>>>>> design goals of this FLIP will be helpful to your business.
> > >>>> >>>>>>>
> > >>>> >>>>>>>>>>>>> 1. In the proposed FLIP, given the example for the dynamic
> > >>>> >>>>>>> table, do the data sources always come from a single lake storage
> > >>>> >>>>>>> such as Paimon or does the same proposal solve for 2 disparate
> > >>>> >>>>>>> storage systems like Kafka and Iceberg where Kafka events are ETLed
> > >>>> >>>>>>> to Iceberg similar to Paimon? Basically the lambda architecture
> > >>>> >>>>>>> that is mentioned in the FLIP as well. I'm wondering if it is
> > >>>> >>>>>>> possible to switch b/w sources based on the execution mode, for eg:
> > >>>> >>>>>>> if it is backfill operation, switch to a data lake storage system
> > >>>> >>>>>>> like Iceberg, otherwise an event streaming system like Kafka.
> > >>>> >>>>>>>
> > >>>> >>>>>>> Dynamic table is a design abstraction at the framework level and is
> > >>>> >>>>>>> not tied to the physical implementation of the connector. If a
> > >>>> >>>>>>> connector supports a combination of Kafka and lake storage, this
> > >>>> >>>>>>> works fine.
> > >>>> >>>>>>>
> > >>>> >>>>>>>>>>>>> 2. What happens in the context of a bootstrap (batch) +
> > >>>> >>>>>>> nearline update (streaming) case that are stateful applications?
> > >>>> >>>>>>> What I mean by that is, will the state from the batch application
> > >>>> >>>>>>> be transferred to the nearline application after the bootstrap
> > >>>> >>>>>>> execution is complete?
> > >>>> >>>>>>>
> > >>>> >>>>>>> I think this is another orthogonal thing, something that FLIP-327 [1]
> > >>>> >>>>>>> tries to address, not directly related to Dynamic Table.
> > >>>> >>>>>>>
> > >>>> >>>>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-327%3A+Support+switching+from+batch+to+stream+mode+to+improve+throughput+when+processing+backlog+data
> > >>>> >>>>>>>
> > >>>> >>>>>>> Best,
> > >>>> >>>>>>> Ron
> > >>>> >>>>>>>
> > >>>> >>>>>>> On Sat, Mar 30, 2024 at 07:06, Venkatakrishnan Sowrirajan <[email protected]> wrote:
> > >>>> >>>>>>>
> > >>>> >>>>>>>>> Ron and Lincoln,
> > >>>> >>>>>>>>>
> > >>>> >>>>>>>>> Great proposal and interesting discussion for adding support for
> > >>>> >>>>>>>>> dynamic tables within Flink.
> > >>>> >>>>>>>>>
> > >>>> >>>>>>>>> At LinkedIn, we are also trying to solve compute/storage convergence
> > >>>> >>>>>>>>> for similar problems discussed as part of this FLIP, specifically
> > >>>> >>>>>>>>> periodic backfill, bootstrap + nearline update use cases using single
> > >>>> >>>>>>>>> implementation of business logic (single script).
> > >>>> >>>>>>>>>
> > >>>> >>>>>>>>> A few clarifying questions:
> > >>>> >>>>>>>>>
> > >>>> >>>>>>>>> 1. In the proposed FLIP, given the example for the dynamic table, do
> > >>>> >>>>>>>>>    the data sources always come from a single lake storage such as
> > >>>> >>>>>>>>>    Paimon or does the same proposal solve for 2 disparate storage
> > >>>> >>>>>>>>>    systems like Kafka and Iceberg where Kafka events are ETLed to
> > >>>> >>>>>>>>>    Iceberg similar to Paimon? Basically the lambda architecture that
> > >>>> >>>>>>>>>    is mentioned in the FLIP as well. I'm wondering if it is possible
> > >>>> >>>>>>>>>    to switch b/w sources based on the execution mode, for eg: if it
> > >>>> >>>>>>>>>    is backfill operation, switch to a data lake storage system like
> > >>>> >>>>>>>>>    Iceberg, otherwise an event streaming system like Kafka.
> > >>>> >>>>>>>>> 2. What happens in the context of a bootstrap (batch) + nearline
> > >>>> >>>>>>>>>    update (streaming) case that are stateful applications? What I
> > >>>> >>>>>>>>>    mean by that is, will the state from the batch application be
> > >>>> >>>>>>>>>    transferred to the nearline application after the bootstrap
> > >>>> >>>>>>>>>    execution is complete?
> > >>>> >>>>>>>>>
> > >>>> >>>>>>>>> Regards
> > >>>> >>>>>>>>> Venkata krishnan
> > >>>> >>>>>>>>>
> > >>>> >>>>>>>>>
> > >>>> >>>>>>>>> On Mon, Mar 25, 2024 at 8:03 PM Ron liu <[email protected]> wrote:
> > >>>> >>>>>>>>>
> > >>>> >>>>>>>>>>> Hi, Timo
> > >>>> >>>>>>>>>>>
> > >>>> >>>>>>>>>>> Thanks for your quick response, and your suggestion.
> > >>>> >>>>>>>>>>>
> > >>>> >>>>>>>>>>> Yes, this discussion has turned into confirming whether it's a
> > >>>> >>>>>>>>>>> special table or a special MV.
> > >>>> >>>>>>>>>>>
> > >>>> >>>>>>>>>>> 1. The key problem with MVs is that they don't support modification,
> > >>>> >>>>>>>>>>> so I prefer it to be a special table. Although the periodic refresh
> > >>>> >>>>>>>>>>> behavior is more characteristic of an MV, since we are already a
> > >>>> >>>>>>>>>>> special table, supporting periodic refresh behavior is quite
> > >>>> >>>>>>>>>>> natural, similar to Snowflake dynamic tables.
> > >>>> >>>>>>>>>>>
> > >>>> >>>>>>>>>>> 2. Regarding the keyword UPDATING, since the current Regular Table is
> > >>>> >>>>>>>>>>> a Dynamic Table, which implies support for updating through
> > >>>> >>>>>>>>>>> Continuous Query, I think it is redundant to add the keyword
> > >>>> >>>>>>>>>>> UPDATING. In addition, UPDATING cannot reflect the Continuous Query
> > >>>> >>>>>>>>>>> part and cannot express our purpose of simplifying the data pipeline
> > >>>> >>>>>>>>>>> through Dynamic Table + Continuous Query.
> > >>>> >>>>>>>>>>>
> > >>>> >>>>>>>>>>> 3. From the perspective of the SQL standard definition, I can
> > >>>> >>>>>>>>>>> understand your concerns about Derived Table, but is it possible to
> > >>>> >>>>>>>>>>> make a slight adjustment to meet our needs? Additionally, as Lincoln
> > >>>> >>>>>>>>>>> mentioned, the Google Looker platform has introduced Persistent
> > >>>> >>>>>>>>>>> Derived Table, and there are precedents in the industry; could
> > >>>> >>>>>>>>>>> Derived Table be a candidate?
> > >>>> >>>>>>>>>>>
> > >>>> >>>>>>>>>>> Of course, I look forward to your better suggestions.
> > >>>> >>>>>>>>>>>
> > >>>> >>>>>>>>>>> Best,
> > >>>> >>>>>>>>>>> Ron
> > >>>> >>>>>>>>>>>
> > >>>> >>>>>>>>>>>
> > >>>> >>>>>>>>>>>
> > >>>> >>>>>>>>>>> On Mon, Mar 25, 2024 at 18:49, Timo Walther <[email protected]> wrote:
> > >>>> >>>>>>>>>>>
> > >>>> >>>>>>>>>>>>> After thinking about this more, this discussion boils down to
> > >>>> >>>>>>>>>>>>> whether this is a special table or a special materialized view.
> > >>>> >>>>>>>>>>>>> In both cases, we would need to add a special keyword:
> > >>>> >>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>> Either
> > >>>> >>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>> CREATE UPDATING TABLE
> > >>>> >>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>> or
> > >>>> >>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>> CREATE UPDATING MATERIALIZED VIEW
> > >>>> >>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>> I still feel that the periodic refreshing behavior is closer to a
> > >>>> >>>>>>>>>>>>> MV. If we add a special keyword to MV, the optimizer would know
> > >>>> >>>>>>>>>>>>> that the data cannot be used for query optimizations.
> > >>>> >>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>> I will ask more people for their opinion.
> > >>>> >>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>> Regards,
> > >>>> >>>>>>>>>>>>> Timo
> > >>>> >>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>> On 25.03.24 10:45, Timo Walther wrote:
> > >>>> >>>>>>>>>>>>>>> Hi Ron and Lincoln,
> > >>>> >>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>> thanks for the quick response and the very insightful discussion.
> > >>>> >>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>> we might limit future opportunities to optimize queries
> > >>>> >>>>>>>>>>>>>>>>> through automatic materialization rewriting by allowing data
> > >>>> >>>>>>>>>>>>>>>>> modifications, thus losing the potential for such optimizations.
> > >>>> >>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>> This argument makes a lot of sense to me. Due to the updates, the
> > >>>> >>>>>>>>>>>>>>> system is not in full control of the persisted data. However, the
> > >>>> >>>>>>>>>>>>>>> system is still in full control of the job that powers the refresh.
> > >>>> >>>>>>>>>>>>>>> So if the system manages all updating pipelines, it could still
> > >>>> >>>>>>>>>>>>>>> leverage automatic materialization rewriting but without leveraging
> > >>>> >>>>>>>>>>>>>>> the data at rest (only the data in flight).
> > >>>> >>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>> we are considering another candidate, Derived Table, the term
> > >>>> >>>>>>>>>>>>>>>>> 'derive' suggests a query, and 'table' retains modifiability.
> > >>>> >>>>>>>>>>>>>>>>> This approach would not disrupt our current concept of a
> > >>>> >>>>>>>>>>>>>>>>> dynamic table
> > >>>> >>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>> I did some research on this term. The SQL standard uses the term
> > >>>> >>>>>>>>>>>>>>> "derived table" extensively (defined in section 4.17.3). Thus, a
> > >>>> >>>>>>>>>>>>>>> lot of vendors adopt this for simply referring to a table within a
> > >>>> >>>>>>>>>>>>>>> subclause:
> > >>>> >>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>> https://dev.mysql.com/doc/refman/8.0/en/derived-tables.html
> > >>>> >>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>> https://infocenter.sybase.com/help/topic/com.sybase.infocenter.dc32300.1600/doc/html/san1390612291252.html
> > >>>> >>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>> https://www.c-sharpcorner.com/article/derived-tables-vs-common-table-expressions/
> > >>>> >>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>> https://stackoverflow.com/questions/26529804/what-are-the-derived-tables-in-my-explain-statement
> > >>>> >>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>> https://www.sqlservercentral.com/articles/sql-derived-tables
> > >>>> >>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>> Esp. the latter example is interesting, SQL Server allows things
> > >>>> >>>>>>>>>>>>>>> like this on derived tables:
> > >>>> >>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>> UPDATE T SET Name='Timo' FROM (SELECT * FROM Product) AS T
> > >>>> >>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>> SELECT * FROM Product;
> > >>>> >>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>> Btw, Snowflake's dynamic tables documentation also states:
> > >>>> >>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>> Because the content of a dynamic table is fully determined
> > >>>> >>>>>>>>>>>>>>>>> by the given query, the content cannot be changed by using DML.
> > >>>> >>>>>>>>>>>>>>>>> You don’t insert, update, or delete the rows in a dynamic table.
> > >>>> >>>>>>>>>>>>>>> So a new term makes a lot of sense.
> > >>>> >>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>> How about using `UPDATING`?
> > >>>> >>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>> CREATE UPDATING TABLE
> > >>>> >>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>> This reflects that modifications can be made and from an
> > >>>> >>>>>>>>>>>>>>> English-language perspective you can PAUSE or RESUME the UPDATING.
> > >>>> >>>>>>>>>>>>>>> Thus, a user can define UPDATING interval and mode?
> > >>>> >>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>> Looking forward to your thoughts.
> > >>>> >>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>> Regards,
> > >>>> >>>>>>>>>>>>>>> Timo
> > >>>> >>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>> On 25.03.24 07:09, Ron liu wrote:
> > >>>> >>>>>>>>>>>>>>>>> Hi, Ahmed
> > >>>> >>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>> Thanks for your feedback.
> > >>>> >>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>> Regarding your question:
> > >>>> >>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>> I want to iterate on Timo's comments regarding the confusion
> > >>>> >>>>>>>>>>>>>>>>> between "Dynamic Table" and current Flink "Table". Should the
> > >>>> >>>>>>>>>>>>>>>>> refactoring of the system happen in 2.0, should we rename it in
> > >>>> >>>>>>>>>>>>>>>>> this Flip (as the suggestions in the thread) and address the
> > >>>> >>>>>>>>>>>>>>>>> holistic changes in a separate Flip for 2.0?
> > >>>> >>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>> Lincoln proposed a new concept in reply to Timo: Derived Table,
> > >>>> >>>>>>>>>>>>>>>>> which is a combination of Dynamic Table + Continuous Query, and
> > >>>> >>>>>>>>>>>>>>>>> the use of Derived Table will not conflict with existing
> > >>>> >>>>>>>>>>>>>>>>> concepts. What do you think?
> > >>>> >>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>> I feel confused with how it is further with other components,
> > >>>> >>>>>>>>>>>>>>>>> the examples provided feel like a standalone ETL job, could you
> > >>>> >>>>>>>>>>>>>>>>> provide in the FLIP an example where the table is further used
> > >>>> >>>>>>>>>>>>>>>>> in subsequent queries (specially in batch mode).
> > >>>> >>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>> Thanks for your suggestion. I added how to use Dynamic Table in
> > >>>> >>>>>>>>>>>>>>>>> the FLIP user story section: a Dynamic Table can be referenced
> > >>>> >>>>>>>>>>>>>>>>> by downstream Dynamic Tables and can also support OLAP queries.
> > >>>> >>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>> Best,
> > >>>> >>>>>>>>>>>>>>>>> Ron
> > >>>> >>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>> On Sat, Mar 23, 2024 at 10:35, Ron liu <[email protected]> wrote:
> > >>>> >>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>> Hi, Feng
> > >>>> >>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>> Thanks for your feedback.
> > >>>> >>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>> Although currently we restrict users from
> > >>>> >>> modifying the query,
> > >>>> >>>>> I
> > >>>> >>>>>>>>>>>>> wonder
> > >>>> >>>>>>>>>>>>>>>>>>> if
> > >>>> >>>>>>>>>>>>>>>>>>> we can provide a better way to help users
> > >>>> >>> rebuild it without
> > >>>> >>>>>>>>>>> affecting
> > >>>> >>>>>>>>>>>>>>>>>>> downstream OLAP queries.
> > >>>> >>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>> Because of the data consistency problem, in the first step we
> > >>>> >>>>>>>>>>>>>>>>>>> strictly limit the semantics and do not support modifying the
> > >>>> >>>>>>>>>>>>>>>>>>> query. This is a really good question; one of my ideas is to
> > >>>> >>>>>>>>>>>>>>>>>>> introduce a syntax similar to SWAP [1], which supports
> > >>>> >>>>>>>>>>>>>>>>>>> exchanging two Dynamic Tables.
> > >>>> >>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>  From the documentation, the definitions
> > >>>> >>> SQL and job
> > >>>> >>>>> information
> > >>>> >>>>>>>>> are
> > >>>> >>>>>>>>>>>>>>>>>>> stored in the Catalog. Does this mean that
> > >>>> >>> if a system needs to
> > >>>> >>>>>>>>> adapt
> > >>>> >>>>>>>>>>>>> to
> > >>>> >>>>>>>>>>>>>>>>>>> Dynamic Tables, it also needs to store
> > >>>> >>> Flink's job information
> > >>>> >>>>> in
> > >>>> >>>>>>>>> the
> > >>>> >>>>>>>>>>>>>>>>>>> corresponding system?
> > >>>> >>>>>>>>>>>>>>>>>>> For example, does MySQL's Catalog need to
> > >>>> >>> store flink job
> > >>>> >>>>>>>>> information
> > >>>> >>>>>>>>>>>>> as
> > >>>> >>>>>>>>>>>>>>>>>>> well?
> > >>>> >>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>> Yes, currently we need to rely on Catalog to
> > >>>> >>> store refresh job
> > >>>> >>>>>>>>>>>>>>>>>>> information.
> > >>>> >>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>> Users still need to consider how much
> > >>>> >>> memory is being used, how
> > >>>> >>>>>>>>>>> large
> > >>>> >>>>>>>>>>>>>>>>>>> the concurrency is, which type of state
> > >>>> >>> backend is being used,
> > >>>> >>>>> and
> > >>>> >>>>>>>>>>>>>>>>>>> may need
> > >>>> >>>>>>>>>>>>>>>>>>> to set TTL expiration.
> > >>>> >>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>> Similar to the current practice, job parameters can be set via
> > >>>> >>>>>>>>>>>>>>>>>>> the Flink conf or SET commands.
> > >>>> >>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>> When we submit a refresh command, can we
> > >>>> >>> help users detect if
> > >>>> >>>>>>>>> there
> > >>>> >>>>>>>>>>>>> are
> > >>>> >>>>>>>>>>>>>>>>>>> any
> > >>>> >>>>>>>>>>>>>>>>>>> running jobs and automatically stop them
> > >>>> >>> before executing the
> > >>>> >>>>>>>>> refresh
> > >>>> >>>>>>>>>>>>>>>>>>> command? Then wait for it to complete before
> > >>>> >>> restarting the
> > >>>> >>>>>>>>>>> background
> > >>>> >>>>>>>>>>>>>>>>>>> streaming job?
> > >>>> >>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>> Purely from a technical implementation point of view, your
> > >>>> >>>>>>>>>>>>>>>>>>> proposal is doable, but it would be more costly. Also, I think
> > >>>> >>>>>>>>>>>>>>>>>>> data consistency itself is the responsibility of the user, just
> > >>>> >>>>>>>>>>>>>>>>>>> as it is for a Regular Table today, so this is consistent with
> > >>>> >>>>>>>>>>>>>>>>>>> existing behavior and no additional guarantees are made at the
> > >>>> >>>>>>>>>>>>>>>>>>> engine level.
> > >>>> >>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>> Best,
> > >>>> >>>>>>>>>>>>>>>>>>> Ron
> > >>>> >>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>> On Fri, Mar 22, 2024 at 23:50, Ahmed Hamdy <[email protected]> wrote:
> > >>>> >>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>> Hi Ron,
> > >>>> >>>>>>>>>>>>>>>>>>>>> Sorry for joining the discussion late,
> > >>>> >>> thanks for the effort.
> > >>>> >>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>> I think the base idea is great, however I
> > >>>> >>> have a couple of
> > >>>> >>>>>>>>> comments:
> > >>>> >>>>>>>>>>>>>>>>>>>>> - I want to iterate on Timo's comments
> > >>>> >>> regarding the confusion
> > >>>> >>>>>>>>>>> between
> > >>>> >>>>>>>>>>>>>>>>>>>>> "Dynamic Table" and current Flink
> > >>>> >>> "Table". Should the
> > >>>> >>>>>>>>> refactoring of
> > >>>> >>>>>>>>>>>>>>>>>>>>> the
> > >>>> >>>>>>>>>>>>>>>>>>>>> system happen in 2.0, should we rename it
> > >>>> >>> in this Flip ( as the
> > >>>> >>>>>>>>>>>>>>>>>>>>> suggestions
> > >>>> >>>>>>>>>>>>>>>>>>>>> in the thread ) and address the holistic
> > >>>> >>> changes in a separate
> > >>>> >>>>>>>>> Flip
> > >>>> >>>>>>>>>>>>> for
> > >>>> >>>>>>>>>>>>>>>>>>>>> 2.0?
> > >>>> >>>>>>>>>>>>>>>>>>>>> - I feel confused with how it is further
> > >>>> >>> with other components,
> > >>>> >>>>>>>>> the
> > >>>> >>>>>>>>>>>>>>>>>>>>> examples provided feel like a standalone
> > >>>> >>> ETL job, could you
> > >>>> >>>>>>>>> provide
> > >>>> >>>>>>>>>>>>>>>>>>>>> in the
> > >>>> >>>>>>>>>>>>>>>>>>>>> FLIP an example where the table is
> > >>>> >>> further used in subsequent
> > >>>> >>>>>>>>>>> queries
> > >>>> >>>>>>>>>>>>>>>>>>>>> (especially in batch mode).
> > >>>> >>>>>>>>>>>>>>>>>>>>> - I really like the standard of keeping
> > >>>> >>> the unified batch and
> > >>>> >>>>>>>>>>>>> streaming
> > >>>> >>>>>>>>>>>>>>>>>>>>> approach
> > >>>> >>>>>>>>>>>>>>>>>>>>> Best Regards
> > >>>> >>>>>>>>>>>>>>>>>>>>> Ahmed Hamdy
> > >>>> >>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>> On Fri, 22 Mar 2024 at 12:07, Lincoln Lee <[email protected]> wrote:
> > >>>> >>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>> Hi Timo,
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>> Thanks for your thoughtful inputs!
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>> Yes, expanding the MATERIALIZED
> > >>>> >>> VIEW(MV) could achieve the
> > >>>> >>>>> same
> > >>>> >>>>>>>>>>>>>>>>>>>>> function,
> > >>>> >>>>>>>>>>>>>>>>>>>>>>> but our primary concern is that by
> > >>>> >>> using a view, we might
> > >>>> >>>>> limit
> > >>>> >>>>>>>>>>>>> future
> > >>>> >>>>>>>>>>>>>>>>>>>>>>> opportunities
> > >>>> >>>>>>>>>>>>>>>>>>>>>>> to optimize queries through automatic
> > >>>> >>> materialization
> > >>>> >>>>> rewriting
> > >>>> >>>>>>>>>>> [1],
> > >>>> >>>>>>>>>>>>>>>>>>>>>>> leveraging
> > >>>> >>>>>>>>>>>>>>>>>>>>>>> the support for MV by physical
> > >>>> >>> storage. This is because we
> > >>>> >>>>>>>>> would be
> > >>>> >>>>>>>>>>>>>>>>>>>>>>> breaking
> > >>>> >>>>>>>>>>>>>>>>>>>>>>> the intuitive semantics of a
> > >>>> >>> materialized view (a materialized
> > >>>> >>>>>>>>> view
> > >>>> >>>>>>>>>>>>>>>>>>>>>>> represents
> > >>>> >>>>>>>>>>>>>>>>>>>>>>> the result of a query) by allowing
> > >>>> >>> data modifications, thus
> > >>>> >>>>>>>>> losing
> > >>>> >>>>>>>>>>>>> the
> > >>>> >>>>>>>>>>>>>>>>>>>>>>> potential
> > >>>> >>>>>>>>>>>>>>>>>>>>>>> for such optimizations.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>> With these considerations in mind, we were inspired by Google
> > >>>> >>>>>>>>>>>>>>>>>>>>>>> Looker's Persistent Derived Table [2]. PDT is designed for
> > >>>> >>>>>>>>>>>>>>>>>>>>>>> building Looker's automated modeling, aligning with our purpose
> > >>>> >>>>>>>>>>>>>>>>>>>>>>> for the stream-batch automatic pipeline. Therefore, we are
> > >>>> >>>>>>>>>>>>>>>>>>>>>>> considering another candidate, Derived Table: the term 'derive'
> > >>>> >>>>>>>>>>>>>>>>>>>>>>> suggests a query, and 'table' retains modifiability. This
> > >>>> >>>>>>>>>>>>>>>>>>>>>>> approach would not disrupt our current concept of a dynamic
> > >>>> >>>>>>>>>>>>>>>>>>>>>>> table, preserving the future utility of MVs.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>> Conceptually, a Derived Table is a Dynamic Table + Continuous
> > >>>> >>>>>>>>>>>>>>>>>>>>>>> Query. Introducing Derived Table as a new concept in this FLIP
> > >>>> >>>>>>>>>>>>>>>>>>>>>>> lets all concepts play together nicely.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>> What do you think about this?
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>> [1]
> https://urldefense.com/v3/__https://calcite.apache.org/docs/materialized_views.html__;!!IKRxdwAv5BmarQ!dVYcp9PUyjpBGzkYFxb2sdnmB0E22koc-YLdxY2LidExEHUJKRkyvRbAveqjlYFKWevFvmE1Z-j73_NFf4D5$
> > >>>> >>>>>>>>>>>>>>>>>>>>>>> [2]
> https://urldefense.com/v3/__https://cloud.google.com/looker/docs/derived-tables*persistent_derived_tables__;Iw!!IKRxdwAv5BmarQ!dVYcp9PUyjpBGzkYFxb2sdnmB0E22koc-YLdxY2LidExEHUJKRkyvRbAveqjlYFKWevFvmE1Z-j7382-2zI3$
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>> Best,
> > >>>> >>>>>>>>>>>>>>>>>>>>>>> Lincoln Lee
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>> On Fri, Mar 22, 2024 at 17:54, Timo Walther <[email protected]> wrote:
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>> Hi Ron,
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>> thanks for the detailed answer.
> > >>>> >>> Sorry, for my late reply, we
> > >>>> >>>>>>>>> had a
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>> conference that kept me busy.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> In the current concept[1], it actually includes: Dynamic Tables
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> & Continuous Query. Dynamic Table is just an abstract logical
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> concept
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>> This explanation makes sense to me.
> > >>>> >>> But the docs also say "A
> > >>>> >>>>>>>>>>>>>>>>>>>>> continuous
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>> query is evaluated on the dynamic
> > >>>> >>> table yielding a new
> > >>>> >>>>> dynamic
> > >>>> >>>>>>>>>>>>>>>>>>>>> table.".
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>> So even our regular CREATE TABLEs
> > >>>> >>> are considered dynamic
> > >>>> >>>>>>>>> tables.
> > >>>> >>>>>>>>>>>>> This
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>> can also be seen in the diagram
> > >>>> >>> "Dynamic Table -> Continuous
> > >>>> >>>>>>>>> Query
> > >>>> >>>>>>>>>>>>> ->
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>> Dynamic Table". Currently, Flink
> > >>>> >>> queries can only be executed
> > >>>> >>>>>>>>> on
> > >>>> >>>>>>>>>>>>>>>>>>>>> Dynamic
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>> Tables.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> In essence, a materialized view
> > >>>> >>> represents the result of
> > >>>> >>>>> a
> > >>>> >>>>>>>>>>>>> query.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>> Isn't that what your proposal does
> > >>>> >>> as well?
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> the object of the suspend
> > >>>> >>> operation is the refresh task
> > >>>> >>>>> of
> > >>>> >>>>>>>>> the
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>> dynamic table
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>> I understand that Snowflake uses
> > >>>> >>> the term [1] to merge their
> > >>>> >>>>>>>>>>>>> concepts
> > >>>> >>>>>>>>>>>>>>>>>>>>> of
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>> STREAM, TASK, and TABLE into one
> > >>>> >>> piece of concept. But Flink
> > >>>> >>>>>>>>> has
> > >>>> >>>>>>>>>>> no
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>> concept of a "refresh task". Also,
> > >>>> >>> they already introduced
> > >>>> >>>>>>>>>>>>>>>>>>>>> MATERIALIZED
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>> VIEW. Flink is in the convenient
> > >>>> >>> position that the concept of
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>> materialized views is not taken
> > >>>> >>> (reserved maybe for exactly
> > >>>> >>>>>>>>> this
> > >>>> >>>>>>>>>>> use
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>> case?). And the SQL standard concept could be "slightly adapted"
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>> to our needs. Looking at other vendors like Postgres[2], they
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>> also use `REFRESH` commands, so why not add additional commands
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>> such as DELETE or UPDATE? Oracle supports "ON PREBUILT TABLE
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>> clause tells the database to use an existing table segment"[3],
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>> which comes closer to what we want as well.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> it is not intended to support
> > >>>> >>> data modification
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>> This is an argument that I
> > >>>> >>> understand. But we as Flink could
> > >>>> >>>>>>>>> allow
> > >>>> >>>>>>>>>>>>>>>>>>>>> data
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>> modifications. This way we are only
> > >>>> >>> extending the standard
> > >>>> >>>>> and
> > >>>> >>>>>>>>>>> don't
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>> introduce new concepts.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>> If we can't agree on using the MATERIALIZED VIEW concept, we
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>> should fix our syntax in a Flink 2.0 effort, making regular
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>> tables bounded and dynamic tables unbounded. We would be closer
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>> to the SQL standard with this and pave the way for the future.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>> I would actually support this if all concepts play together
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>> nicely.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> In the future, we can consider
> > >>>> >>> extending the statement
> > >>>> >>>>> set
> > >>>> >>>>>>>>>>>>> syntax
> > >>>> >>>>>>>>>>>>>>>>>>>>> to
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>> support the creation of multiple
> > >>>> >>> dynamic tables.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>> It's good that we called the concept STATEMENT SET. This allows
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>> us to define CREATE TABLE within, even if it might look a bit
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>> confusing.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>> Regards,
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>> Timo
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>> [1]
> https://urldefense.com/v3/__https://docs.snowflake.com/en/user-guide/dynamic-tables-about__;!!IKRxdwAv5BmarQ!dVYcp9PUyjpBGzkYFxb2sdnmB0E22koc-YLdxY2LidExEHUJKRkyvRbAveqjlYFKWevFvmE1Z-j73zexZBXu$
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>> [2]
> https://urldefense.com/v3/__https://www.postgresql.org/docs/current/sql-creatematerializedview.html__;!!IKRxdwAv5BmarQ!dVYcp9PUyjpBGzkYFxb2sdnmB0E22koc-YLdxY2LidExEHUJKRkyvRbAveqjlYFKWevFvmE1Z-j73zbNhvS7$
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>> [3]
> https://urldefense.com/v3/__https://oracle-base.com/articles/misc/materialized-views__;!!IKRxdwAv5BmarQ!dVYcp9PUyjpBGzkYFxb2sdnmB0E22koc-YLdxY2LidExEHUJKRkyvRbAveqjlYFKWevFvmE1Z-j739xS1kvD$
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>> On 21.03.24 04:14, Feng Jin wrote:
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Ron and Lincoln
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for driving this
> > >>>> >>> discussion. I believe it will
> > >>>> >>>>> greatly
> > >>>> >>>>>>>>>>>>>>>>>>>>> improve
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>> the
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> convenience of managing user
> > >>>> >>> real-time pipelines.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> I have some questions.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> *Regarding Limitations of
> > >>>> >>> Dynamic Table:*
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Does not support modifying
> > >>>> >>> the select statement after the
> > >>>> >>>>>>>>>>> dynamic
> > >>>> >>>>>>>>>>>>>>>>>>>>>>> table
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> is created.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Although currently we restrict
> > >>>> >>> users from modifying the
> > >>>> >>>>>>>>> query, I
> > >>>> >>>>>>>>>>>>>>>>>>>>> wonder
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>> if
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> we can provide a better way to
> > >>>> >>> help users rebuild it without
> > >>>> >>>>>>>>>>>>>>>>>>>>> affecting
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> downstream OLAP queries.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> *Regarding the management of
> > >>>> >>> background jobs:*
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> 1. From the documentation, the
> > >>>> >>> definitions SQL and job
> > >>>> >>>>>>>>>>> information
> > >>>> >>>>>>>>>>>>>>>>>>>>> are
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> stored in the Catalog. Does this
> > >>>> >>> mean that if a system needs
> > >>>> >>>>>>>>> to
> > >>>> >>>>>>>>>>>>>>>>>>>>> adapt
> > >>>> >>>>>>>>>>>>>>>>>>>>>>> to
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Dynamic Tables, it also needs to
> > >>>> >>> store Flink's job
> > >>>> >>>>>>>>> information in
> > >>>> >>>>>>>>>>>>>>>>>>>>> the
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> corresponding system?
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> For example, does MySQL's
> > >>>> >>> Catalog need to store flink job
> > >>>> >>>>>>>>>>>>>>>>>>>>> information
> > >>>> >>>>>>>>>>>>>>>>>>>>>>> as
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> well?
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> 2. Users still need to consider
> > >>>> >>> how much memory is being
> > >>>> >>>>> used,
> > >>>> >>>>>>>>>>> how
> > >>>> >>>>>>>>>>>>>>>>>>>>>>> large
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> the concurrency is, which type
> > >>>> >>> of state backend is being
> > >>>> >>>>> used,
> > >>>> >>>>>>>>>>> and
> > >>>> >>>>>>>>>>>>>>>>>>>>> may
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>> need
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> to set TTL expiration.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> *Regarding the Refresh Part:*
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> If the refresh mode is
> > >>>> >>> continuous and a background job is
> > >>>> >>>>>>>>>>> running,
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> caution should be taken with the
> > >>>> >>> refresh command as it can
> > >>>> >>>>>>>>> lead
> > >>>> >>>>>>>>>>> to
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> inconsistent data.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> When we submit a refresh
> > >>>> >>> command, can we help users detect
> > >>>> >>>>> if
> > >>>> >>>>>>>>>>> there
> > >>>> >>>>>>>>>>>>>>>>>>>>> are
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>> any
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> running jobs and automatically
> > >>>> >>> stop them before executing
> > >>>> >>>>> the
> > >>>> >>>>>>>>>>>>>>>>>>>>> refresh
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> command? Then wait for it to
> > >>>> >>> complete before restarting the
> > >>>> >>>>>>>>>>>>>>>>>>>>> background
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> streaming job?
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Best,
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Feng
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Mar 19, 2024 at 9:40 PM Lincoln Lee <[email protected]> wrote:
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Yun,
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you very much for your
> > >>>> >>> valuable input!
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Incremental mode is indeed an attractive idea, and we have also
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> discussed it, but in the current design we first provide two
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> refresh modes: CONTINUOUS and FULL. Incremental mode can be
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> introduced once the execution layer has the capability.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> My answer for the two
> > >>>> >>> questions:
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Yes, cascading is a good question. The current proposal provides
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> a freshness that defines a dynamic table's lag relative to the
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> base table. If users need to consider the end-to-end freshness
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> of multiple cascaded dynamic tables, they can manually split
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> them for now. Of course, how to let multiple cascaded or
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> dependent dynamic tables complete the freshness definition in a
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> simpler way is something I think can be extended in the future.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Cascading refresh is also a part we focused on discussing. In
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> this FLIP, we hope to focus as much as possible on the core
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> features (as it already involves a lot of things), so we did
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> not directly introduce related syntax. However, based on the
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> current design, combined with the catalog and lineage, users
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> can theoretically also accomplish the cascading refresh.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best,
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Lincoln Lee
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Mar 19, 2024 at 13:45, Yun Tang <[email protected]> wrote:
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Lincoln,
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for driving this
> > >>>> >>> discussion, and I am so excited to
> > >>>> >>>>>>>>> see
> > >>>> >>>>>>>>>>>>>>>>>>>>> this
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>> topic
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> being discussed in the
> > >>>> >>> Flink community!
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  From my point of view,
> > >>>> >>> instead of the work of unifying
> > >>>> >>>>>>>>>>>>> streaming
> > >>>> >>>>>>>>>>>>>>>>>>>>> and
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> batch
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in DataStream API [1],
> > >>>> >>> this FLIP actually could make users
> > >>>> >>>>>>>>>>>>> benefit
> > >>>> >>>>>>>>>>>>>>>>>>>>>>> from
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> one
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> engine to rule batch &
> > >>>> >>> streaming.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If we treat this FLIP as
> > >>>> >>> an open-source implementation of
> > >>>> >>>>>>>>>>>>>>>>>>>>> Snowflake's
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> dynamic tables [2], we
> > >>>> >>> still lack an incremental refresh
> > >>>> >>>>>>>>> mode
> > >>>> >>>>>>>>>>> to
> > >>>> >>>>>>>>>>>>>>>>>>>>> make
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>> the
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ETL near real-time with a
> > >>>> >>> much cheaper computation cost.
> > >>>> >>>>>>>>>>> However,
> > >>>> >>>>>>>>>>>>>>>>>>>>> I
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>> think
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this could be done under
> > >>>> >>> the current design by introducing
> > >>>> >>>>>>>>>>>>> another
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> refresh
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> mode in the future.
> > >>>> >>> Although the extra work of incremental
> > >>>> >>>>>>>>> view
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> maintenance
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> would be much larger.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> For the FLIP itself, I
> > >>>> >>> have several questions below:
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1. It seems this FLIP does
> > >>>> >>> not consider the lag of
> > >>>> >>>>> refreshes
> > >>>> >>>>>>>>>>>>>>>>>>>>> across
> > >>>> >>>>>>>>>>>>>>>>>>>>>>> ETL
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> layers from ODS ---> DWD
> > >>>> >>> ---> APP [3]. We currently only
> > >>>> >>>>>>>>>>> consider
> > >>>> >>>>>>>>>>>>>>>>>>>>> the
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> scheduler interval, which
> > >>>> >>> means we cannot use lag to
> > >>>> >>>>>>>>>>>>> automatically
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> schedule
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the upfront micro-batch
> > >>>> >>> jobs to do the work.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2. To support the
> > >>>> >>> automagical refreshes, we should
> > >>>> >>>>> consider
> > >>>> >>>>>>>>> the
> > >>>> >>>>>>>>>>>>>>>>>>>>>>> lineage
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> in
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the catalog or somewhere
> > >>>> >>> else.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [1]
> https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/FLINK/FLIP-134*3A*Batch*execution*for*the*DataStream*API__;JSsrKysrKw!!IKRxdwAv5BmarQ!dVYcp9PUyjpBGzkYFxb2sdnmB0E22koc-YLdxY2LidExEHUJKRkyvRbAveqjlYFKWevFvmE1Z-j7352JICzI$
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [2]
> https://urldefense.com/v3/__https://docs.snowflake.com/en/user-guide/dynamic-tables-about__;!!IKRxdwAv5BmarQ!dVYcp9PUyjpBGzkYFxb2sdnmB0E22koc-YLdxY2LidExEHUJKRkyvRbAveqjlYFKWevFvmE1Z-j73zexZBXu$
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [3]
> https://urldefense.com/v3/__https://docs.snowflake.com/en/user-guide/dynamic-tables-refresh__;!!IKRxdwAv5BmarQ!dVYcp9PUyjpBGzkYFxb2sdnmB0E22koc-YLdxY2LidExEHUJKRkyvRbAveqjlYFKWevFvmE1Z-j735ghqpxk$
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Yun Tang
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>> ________________________________
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> From: Lincoln Lee <[email protected]>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Sent: Thursday, March 14, 2024 14:35
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> To: [email protected] <[email protected]>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Subject: Re: [DISCUSS] FLIP-435: Introduce a New Dynamic Table
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for Simplifying Data Pipelines
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Jing,
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for your attention
> > >>>> >>> to this flip! I'll try to answer
> > >>>> >>>>>>>>> the
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>> following
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> questions.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1. How is the query of a dynamic table defined?
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Using Flink SQL, or by introducing new syntax?
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If Flink SQL is used, how is the difference in SQL between
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> streaming and batch processing handled?
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> For example, a query including a window aggregate based on
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> processing time, or a query including a global ORDER BY?
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Similar to `CREATE TABLE AS query`, here the `query` also uses
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Flink SQL and doesn't introduce a totally new syntax. We will
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> not change the status quo with respect to the difference in
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> functionality of Flink SQL itself between streaming and batch.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> For example, the proctime window agg on streaming and the
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> global sort on batch that you mentioned do not, in fact, work
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> properly in the other mode, so when the user changes a dynamic
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> table to a refresh mode that its query does not support, we
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> will throw an exception.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2. Is modifying the query of a dynamic table allowed?
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Or can we only refresh a dynamic table based on the initial
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> query?
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Yes, in the current design, the query definition of the
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> dynamic table is not allowed to be modified, and you can only
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> refresh the data based on the initial definition.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 3. How do we use a dynamic table?
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The dynamic table seems to be similar to a materialized view.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Will we do something like materialized view rewriting during
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> optimization?
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It's true that dynamic tables and materialized views are
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> similar in some ways, but as Ron explains, there are
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> differences. In terms of optimization, automated
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> materialization discovery similar to what Calcite supports is
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> also a possibility, perhaps with automated rewriting added in
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the future.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best,
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Lincoln Lee
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ron liu <[email protected]> wrote on Thu, Mar 14, 2024 at 14:01:
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi, Timo
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Sorry for the late response, and thanks for your feedback.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Regarding your questions:
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Flink has introduced
> > >>>> >>> the concept of Dynamic Tables many
> > >>>> >>>>>>>>> years
> > >>>> >>>>>>>>>>>>>>>>>>>>> ago.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> How
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> does the term "Dynamic
> > >>>> >>> Table" fit into Flink's regular
> > >>>> >>>>>>>>> tables
> > >>>> >>>>>>>>>>>>> and
> > >>>> >>>>>>>>>>>>>>>>>>>>>>> also
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> how does it relate to
> > >>>> >>> Table API?
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I fear that adding
> > >>>> >>> the DYNAMIC TABLE keyword could cause
> > >>>> >>>>>>>>>>>>>>>>>>>>> confusion
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> for
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> users, because a
> > >>>> >>> term for regular CREATE TABLE (that can
> > >>>> >>>>>>>>> be
> > >>>> >>>>>>>>>>>>>>>>>>>>> "kind
> > >>>> >>>>>>>>>>>>>>>>>>>>>>> of
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> dynamic" as well and
> > >>>> >>> is backed by a changelog) is then
> > >>>> >>>>>>>>>>> missing.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>> Also
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> given that we call
> > >>>> >>> our connectors for those tables,
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> DynamicTableSource
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and DynamicTableSink.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> In general, I find
> > >>>> >>> it contradicting that a TABLE can be
> > >>>> >>>>>>>>>>>>>>>>>>>>> "paused" or
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> "resumed". From an
> > >>>> >>> English language perspective, this
> > >>>> >>>>> does
> > >>>> >>>>>>>>>>>>> sound
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> incorrect. In my
> > >>>> >>> opinion (without much research yet), a
> > >>>> >>>>>>>>>>>>>>>>>>>>> continuous
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> updating trigger
> > >>>> >>> should rather be modelled as a CREATE
> > >>>> >>>>>>>>>>>>>>>>>>>>> MATERIALIZED
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> VIEW
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (which users are
> > >>>> >>> familiar with?) or a new concept such
> > >>>> >>>>> as
> > >>>> >>>>>>>>> a
> > >>>> >>>>>>>>>>>>>>>>>>>>> CREATE
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> TASK
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (that can be paused
> > >>>> >>> and resumed?).
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The current concept[1] actually includes two parts: Dynamic
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Tables & Continuous Query. A Dynamic Table is just an abstract
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> logical concept, which in its physical form represents either
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> a table or a changelog stream. It must be combined with a
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Continuous Query to achieve dynamic updates of the target
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> table, similar to a database's Materialized View.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> We hope to upgrade the Dynamic Table to a real entity that
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> users can operate, which combines the logical concepts of
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Dynamic Tables + Continuous Query. By integrating the
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> definitions of tables and queries, it can achieve functions
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> similar to Materialized Views, simplifying users' data
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> processing pipelines.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> So, the object of the suspend operation is the refresh task
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of the dynamic table. The command `ALTER DYNAMIC TABLE
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> table_name SUSPEND` is actually shorthand for `ALTER DYNAMIC
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> TABLE table_name SUSPEND REFRESH` (if writing it in full is
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> clearer, we can also modify it).
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2. Initially, we also considered Materialized Views, but
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ultimately decided against them. Materialized views are
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> designed to enhance query performance for workloads that
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> consist of common, repetitive query patterns. In essence, a
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> materialized view represents the result of a query.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> However, it is not intended to support data modification. For
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Lakehouse scenarios, where the ability to delete or update
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> data is crucial (such as compliance with GDPR, FLIP-2),
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> materialized views fall short.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 3.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Compared to CREATE (regular) TABLE, CREATE DYNAMIC TABLE not
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> only defines metadata in the catalog but also automatically
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> initiates a data refresh task based on the query specified
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> during table creation. It dynamically executes data updates.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Users can focus on data dependencies and data generation
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> logic.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 4.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The new dynamic table does not conflict with the existing
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DynamicTableSource and DynamicTableSink interfaces. For the
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> developer, all that needs to be implemented is the new
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> CatalogDynamicTable, without changing the implementation of
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> source and sink.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 5. For now, the FLIP does not consider supporting Table API
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> operations on Dynamic Tables. However, once the SQL syntax is
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> finalized, we can discuss this in a separate FLIP. Currently,
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I have a rough idea: the Table API should also introduce
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DynamicTable operation interfaces corresponding to the
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> existing Table interfaces.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The TableEnvironment will provide relevant methods to support
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> various dynamic table operations. The goal for the new Dynamic
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Table is to offer users an experience similar to using a
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> database, which is why we prioritize SQL-based approaches
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> initially.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> How do you envision
> > >>>> >>> re-adding the functionality of a
> > >>>> >>>>>>>>>>> statement
> > >>>> >>>>>>>>>>>>>>>>>>>>> set,
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> fans out to multiple
> > >>>> >>> tables? This is a very important
> > >>>> >>>>> use
> > >>>> >>>>>>>>>>> case
> > >>>> >>>>>>>>>>>>>>>>>>>>> for
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> data
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> pipelines.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Writing to multiple tables is indeed a very important user
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> scenario. In the future, we can consider extending the
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> statement set syntax to support the creation of multiple
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> dynamic tables.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Since the early
> > >>>> >>> days of Flink SQL, we were discussing
> > >>>> >>>>>>>>>>> `SELECT
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> STREAM
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> *
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> FROM T EMIT 5
> > >>>> >>> MINUTES`. Your proposal seems to rephrase
> > >>>> >>>>>>>>>>> STREAM
> > >>>> >>>>>>>>>>>>>>>>>>>>> and
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> EMIT,
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> into other keywords
> > >>>> >>> DYNAMIC TABLE and FRESHNESS. But the
> > >>>> >>>>>>>>> core
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> functionality is
> > >>>> >>> still there. I'm wondering if we should
> > >>>> >>>>>>>>>>> widen
> > >>>> >>>>>>>>>>>>>>>>>>>>> the
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> scope
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (maybe not part of
> > >>>> >>> this FLIP but a new FLIP) to follow
> > >>>> >>>>> the
> > >>>> >>>>>>>>>>>>>>>>>>>>> standard
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> more
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> closely. Making
> > >>>> >>> `SELECT * FROM t` bounded by default and
> > >>>> >>>>>>>>> use
> > >>>> >>>>>>>>>>>>> new
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> syntax
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for the dynamic
> > >>>> >>> behavior. Flink 2.0 would be the perfect
> > >>>> >>>>>>>>> time
> > >>>> >>>>>>>>>>>>>>>>>>>>> for
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> this,
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> however, it would
> > >>>> >>> require careful discussions. What do
> > >>>> >>>>> you
> > >>>> >>>>>>>>>>>>>>>>>>>>> think?
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The query part indeed
> > >>>> >>> requires a separate FLIP
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for discussion, as it
> > >>>> >>> involves changes to the default
> > >>>> >>>>>>>>>>> behavior.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/concepts/dynamic_tables
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best,
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ron
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jing Zhang <[email protected]> wrote on Wed, Mar 13, 2024 at 15:19:
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi, Lincoln & Ron,
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the
> > >>>> >>> proposal.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I agree with the
> > >>>> >>> question raised by Timo.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Besides, I have some
> > >>>> >>> other questions.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1. How is the query of a dynamic table defined?
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> With Flink SQL, or by introducing new syntax?
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If Flink SQL is used, how do we handle the differences in SQL
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> between streaming and batch processing?
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> For example, a query including a window aggregate based on
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> processing time, or a query including a global ORDER BY?
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2. Is modifying the query of a dynamic table allowed?
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Or can we only refresh a dynamic table based on the initial
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> query?
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 3. How do we use a dynamic table?
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The dynamic table seems to be similar to a materialized view.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Will we do something like materialized view rewriting during
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> optimization?
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best,
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jing Zhang
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Timo Walther <[email protected]> wrote on Wed, Mar 13, 2024 at 01:24:
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Lincoln & Ron,
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> thanks for
> > >>>> >>> proposing this FLIP. I think a design
> > >>>> >>>>> similar
> > >>>> >>>>>>>>> to
> > >>>> >>>>>>>>>>>>>>>>>>>>> what
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> you
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> propose has been
> > >>>> >>> in the heads of many people, however,
> > >>>> >>>>>>>>> I'm
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> wondering
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> how
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this will fit
> > >>>> >>> into the bigger picture.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I haven't deeply
> > >>>> >>> reviewed the FLIP yet, but would like
> > >>>> >>>>> to
> > >>>> >>>>>>>>>>> ask
> > >>>> >>>>>>>>>>>>>>>>>>>>> some
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> initial questions:
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Flink has
> > >>>> >>> introduced the concept of Dynamic Tables many
> > >>>> >>>>>>>>>>> years
> > >>>> >>>>>>>>>>>>>>>>>>>>> ago.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> How
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> does the term
> > >>>> >>> "Dynamic Table" fit into Flink's regular
> > >>>> >>>>>>>>>>> tables
> > >>>> >>>>>>>>>>>>>>>>>>>>> and
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> also
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> how does it
> > >>>> >>> relate to Table API?
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I fear that
> > >>>> >>> adding the DYNAMIC TABLE keyword could
> > >>>> >>>>> cause
> > >>>> >>>>>>>>>>>>>>>>>>>>> confusion
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> users, because a
> > >>>> >>> term for regular CREATE TABLE (that
> > >>>> >>>>> can
> > >>>> >>>>>>>>> be
> > >>>> >>>>>>>>>>>>>>>>>>>>> "kind
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> of
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> dynamic" as well
> > >>>> >>> and is backed by a changelog) is then
> > >>>> >>>>>>>>>>>>> missing.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Also
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> given that we
> > >>>> >>> call our connectors for those tables,
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DynamicTableSource
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
> > >>>> >>> DynamicTableSink.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> In general, I
> > >>>> >>> find it contradicting that a TABLE can be
> > >>>> >>>>>>>>>>>>>>>>>>>>> "paused"
> > >>>> >>>>>>>>>>>>>>>>>>>>>>> or
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> "resumed". From
> > >>>> >>> an English language perspective, this
> > >>>> >>>>>>>>> does
> > >>>> >>>>>>>>>>>>>>>>>>>>> sound
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> incorrect. In my
> > >>>> >>> opinion (without much research yet), a
> > >>>> >>>>>>>>>>>>>>>>>>>>> continuous
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> updating trigger
> > >>>> >>> should rather be modelled as a CREATE
> > >>>> >>>>>>>>>>>>>>>>>>>>>>> MATERIALIZED
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> VIEW
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (which users are
> > >>>> >>> familiar with?) or a new concept such
> > >>>> >>>>>>>>> as a
> > >>>> >>>>>>>>>>>>>>>>>>>>> CREATE
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> TASK
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (that can be
> > >>>> >>> paused and resumed?).
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> How do you
> > >>>> >>> envision re-adding the functionality of a
> > >>>> >>>>>>>>>>> statement
> > >>>> >>>>>>>>>>>>>>>>>>>>>>> set,
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> fans out to
> > >>>> >>> multiple tables? This is a very important
> > >>>> >>>>> use
> > >>>> >>>>>>>>>>> case
> > >>>> >>>>>>>>>>>>>>>>>>>>> for
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> data
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> pipelines.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Since the early
> > >>>> >>> days of Flink SQL, we were discussing
> > >>>> >>>>>>>>>>> `SELECT
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> STREAM
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> *
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> FROM T EMIT 5
> > >>>> >>> MINUTES`. Your proposal seems to rephrase
> > >>>> >>>>>>>>>>> STREAM
> > >>>> >>>>>>>>>>>>>>>>>>>>> and
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> EMIT,
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> into other
> > >>>> >>> keywords DYNAMIC TABLE and FRESHNESS. But
> > >>>> >>>>> the
> > >>>> >>>>>>>>>>> core
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> functionality is
> > >>>> >>> still there. I'm wondering if we
> > >>>> >>>>> should
> > >>>> >>>>>>>>>>> widen
> > >>>> >>>>>>>>>>>>>>>>>>>>> the
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> scope
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (maybe not part
> > >>>> >>> of this FLIP but a new FLIP) to follow
> > >>>> >>>>>>>>> the
> > >>>> >>>>>>>>>>>>>>>>>>>>>>> standard
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> more
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> closely. Making
> > >>>> >>> `SELECT * FROM t` bounded by default
> > >>>> >>>>> and
> > >>>> >>>>>>>>> use
> > >>>> >>>>>>>>>>>>>>>>>>>>> new
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> syntax
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for the dynamic
> > >>>> >>> behavior. Flink 2.0 would be the
> > >>>> >>>>> perfect
> > >>>> >>>>>>>>>>> time
> > >>>> >>>>>>>>>>>>>>>>>>>>> for
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this,
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> however, it would
> > >>>> >>> require careful discussions. What do
> > >>>> >>>>>>>>> you
> > >>>> >>>>>>>>>>>>>>>>>>>>> think?
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Regards,
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Timo
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On 11.03.24
> > >>>> >>> 08:23, Ron liu wrote:
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi, Dev
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Lincoln Lee
> > >>>> >>> and I would like to start a discussion
> > >>>> >>>>> about
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> FLIP-435:
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Introduce a
> > >>>> >>> New Dynamic Table for Simplifying Data
> > >>>> >>>>>>>>>>>>> Pipelines.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This FLIP is
> > >>>> >>> designed to simplify the development of
> > >>>> >>>>>>>>> data
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> processing
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> pipelines.
> > >>>> >>> With Dynamic Tables with uniform SQL
> > >>>> >>>>>>>>> statements
> > >>>> >>>>>>>>>>>>> and
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> freshness,
> > >>>> >>> users can define batch and streaming
> > >>>> >>>>>>>>>>>>>>>>>>>>> transformations
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> to
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> data in the
> > >>>> >>> same way, accelerate ETL pipeline
> > >>>> >>>>>>>>> development,
> > >>>> >>>>>>>>>>>>> and
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> manage
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> task
> > >>>> >>> scheduling automatically.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> For more
> > >>>> >>> details, see FLIP-435 [1]. Looking forward to
> > >>>> >>>>>>>>> your
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> feedback.
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [1]
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best,
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Lincoln & Ron
> > >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>
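
For readers following the thread, the statements under discussion can be sketched roughly as below. This is only an illustration of the proposed syntax as it is described in these messages, not a verified form from any Flink release. The table and column names (dwd_orders, ods_orders, order_id, user_id, amount) are hypothetical, the exact shape of the FRESHNESS clause is an assumption, and RESUME is assumed as the counterpart of SUSPEND because the thread speaks of tables being "paused" and "resumed".

```sql
-- Sketch of the proposed FLIP-435 statements; syntax is an assumption
-- based on the thread, and all object names are hypothetical.

-- Define the table and its query together. Per the discussion, creating
-- the table also starts a refresh task driven by the query.
CREATE DYNAMIC TABLE dwd_orders
FRESHNESS = INTERVAL '5' MINUTE   -- assumed form of the FRESHNESS clause
AS SELECT order_id, user_id, amount
   FROM ods_orders;

-- Suspend the refresh task. The thread describes this as shorthand for
-- ALTER DYNAMIC TABLE ... SUSPEND REFRESH.
ALTER DYNAMIC TABLE dwd_orders SUSPEND;

-- Resume the refresh task later (RESUME is assumed, not quoted from the thread).
ALTER DYNAMIC TABLE dwd_orders RESUME;
```

The point of the sketch is that the object being suspended is the refresh task, not the stored data: the table stays queryable while its refresh is paused.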
