Re: [DISCUSS] FLIP-435: Introduce a New Dynamic Table for Simplifying Data Pipelines

Lincoln Lee Tue, 09 Apr 2024 05:46:14 -0700

Thanks Ron and Timo for your proposal!

Here is my ranking:


1. Derived table -> extends the persistent semantics of the derived table in
   the SQL standard, has a strong association with a query, and has industry
   precedents such as Google Looker.

2. Live Table ->  an alternative for 'dynamic table'

3. Materialized Table -> combination of the Materialized View and Table,
    but still a table which accepts data changes

4. Materialized View -> needs to extend the understanding of the view to
    accept data changes

The reason for not adding 'Refresh Table' is I don't want to tell the user
to 'refresh a refresh table'.
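
For readers less familiar with candidate 1: in the SQL-standard sense raised further down this thread, a "derived table" is simply a subquery in a FROM clause that lives only for the duration of one statement. A minimal sketch, using Python's built-in sqlite3 purely as a stand-in engine (not Flink); the Product table name is borrowed from the example quoted below, and its columns and rows are made up for illustration:

```python
import sqlite3

# In-memory SQLite database used only as a stand-in SQL engine (assumption: not Flink).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (Name TEXT, Price REAL)")  # hypothetical schema
conn.executemany(
    "INSERT INTO Product VALUES (?, ?)",
    [("a", 5.0), ("b", 15.0), ("c", 25.0)],  # hypothetical rows
)

# The parenthesised subquery aliased as T is the "derived table":
# it exists only while this one statement runs.
rows = conn.execute(
    "SELECT T.Name FROM (SELECT Name, Price FROM Product WHERE Price > 10) AS T "
    "ORDER BY T.Name"
).fetchall()

# Nothing named T was persisted; only Product is in the catalog.
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
)]
```

This transient meaning is what a persistent, query-backed "Derived Table" would have to extend.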


Best,
Lincoln Lee
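
The scoring rule from the vote announcement quoted below (first rank 5 points, down to 1 point for fifth rank, partial rankings allowed) can be sketched as follows. This is only an illustration: the function name `tally` and the alphabetical tie-break are assumptions, since the thread does not say how ties are resolved, and treating Timo's earlier ranking as a ballot is also an assumption.

```python
from collections import defaultdict

# Points per rank position as stated in the announcement: 1st = 5 ... 5th = 1.
POINTS = [5, 4, 3, 2, 1]

def tally(ballots):
    """Sum points per candidate; a ballot may rank only some of the candidates."""
    scores = defaultdict(int)
    for ranking in ballots:
        for position, name in enumerate(ranking[:len(POINTS)]):
            scores[name] += POINTS[position]
    # Highest score first; ties are broken alphabetically (assumption, not from the thread).
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Sample ballots: the three rankings visible in this thread (Lincoln, Ron, Timo).
ballots = [
    ["Derived Table", "Live Table", "Materialized Table", "Materialized View"],
    ["Derived Table", "Materialized Table", "Live Table", "Materialized View"],
    ["Refresh Table", "Materialized Table", "Live Table", "Materialized View",
     "Derived Table"],
]
result = tally(ballots)
for name, score in result:
    print(f"{score:2d}  {name}")
```

These three rankings are only a partial sample and not the outcome of the vote; the sketch just shows how unranked candidates receive no points from a ballot.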


On Tue, Apr 9, 2024 at 20:11, Ron liu <ron9....@gmail.com> wrote:

> Hi, Dev
>
> My rankings are:
>
> 1. Derived Table
> 2. Materialized Table
> 3. Live Table
> 4. Materialized View
>
> Best,
> Ron
>
>
>
> On Tue, Apr 9, 2024 at 20:07, Ron liu <ron9....@gmail.com> wrote:
>
> > Hi, Dev
> >
> > After several rounds of discussion, there is currently no consensus on
> > the name of the new concept. Timo has proposed that we decide the name
> > through a vote. This is a good solution when there is no clear
> > preference, so we will adopt this approach.
> >
> > Regarding the name of the new concept, there are currently five candidates:
> > 1. Derived Table -> taken by SQL standard
> > 2. Materialized Table -> similar to SQL materialized view but a table
> > 3. Live Table -> similar to dynamic tables
> > 4. Refresh Table -> states what it does
> > 5. Materialized View -> needs to extend the standard to support modifying
> > data
> >
> > For the above five candidates, everyone can give their rankings based on
> > their preferences. You can rank all five options or only some of them.
> > We will use a scoring rule, where the first rank gets 5 points, second
> > rank gets 4 points, third rank gets 3 points, fourth rank gets 2 points,
> > and fifth rank gets 1 point.
> > After the voting closes, I will score all the candidates based on
> > everyone's votes, and the candidate with the highest score will be chosen
> > as the name for the new concept.
> >
> > The voting will last up to 72 hours and is expected to close this Friday.
> > I look forward to everyone voting on the name in this thread. Of course,
> > we also welcome new input regarding the name.
> >
> > Best,
> > Ron
> >
> > On Tue, Apr 9, 2024 at 19:49, Ron liu <ron9....@gmail.com> wrote:
> >
> >> Hi, Dev
> >>
> >> Sorry, my previous statement was not quite accurate. We will hold a
> >> vote for the name within this thread.
> >>
> >> Best,
> >> Ron
> >>
> >>
> >> On Tue, Apr 9, 2024 at 19:29, Ron liu <ron9....@gmail.com> wrote:
> >>
> >>> Hi, Timo
> >>>
> >>> Thanks for your reply.
> >>>
> >>> I agree with you that naming is sometimes more difficult. When no one
> >>> has a clear preference, voting on the name is a good solution, so I'll
> >>> send a separate email for the vote, clarify the rules for the vote, then
> >>> let everyone vote.
> >>>
> >>> One other point to confirm: in your ranking there is an option for
> >>> Materialized View. Does it stand for the UPDATING Materialized View
> >>> that you mentioned earlier in the discussion? If we use Materialized
> >>> View, I think it needs to be extended.
> >>>
> >>> Best,
> >>> Ron
> >>>
> >>> On Tue, Apr 9, 2024 at 17:20, Timo Walther <twal...@apache.org> wrote:
> >>>
> >>>> Hi Ron,
> >>>>
> >>>> Yes, naming is hard. But it will have a large impact on trainings,
> >>>> presentations, and the mental model of users. Maybe the easiest is to
> >>>> collect rankings from everyone with some short justification:
> >>>>
> >>>>
> >>>> My ranking (from good to not so good):
> >>>>
> >>>> 1. Refresh Table -> states what it does
> >>>> 2. Materialized Table -> similar to SQL materialized view but a table
> >>>> 3. Live Table -> nice buzzword, but maybe still too close to dynamic
> >>>> tables?
> >>>> 4. Materialized View -> a bit broader than standard but still very
> >>>> similar
> >>>> 5. Derived table -> taken by standard
> >>>>
> >>>> Regards,
> >>>> Timo
> >>>>
> >>>>
> >>>>
> >>>> On 07.04.24 11:34, Ron liu wrote:
> >>>> > Hi, Dev
> >>>> >
> >>>> > This is a summary letter. After several rounds of discussion, there
> >>>> > is a strong consensus about the FLIP proposal and the issues it aims
> >>>> > to address. The current point of disagreement is the naming of the new
> >>>> > concept. I have summarized the candidates as follows:
> >>>> >
> >>>> > 1. Derived Table (Inspired by Google Looker)
> >>>> >      - Pros: Google Looker has introduced this concept, which is
> >>>> > designed for building Looker's automated modeling, aligning with our
> >>>> > purpose for the stream-batch automatic pipeline.
> >>>> >
> >>>> >      - Cons: The SQL standard uses the term "derived table"
> >>>> > extensively, and vendors adopt it for simply referring to a table
> >>>> > within a subclause.
> >>>> >
> >>>> > 2. Materialized Table: It means materializing the query result to a
> >>>> > table, similar to Db2 MQT (Materialized Query Tables). In addition,
> >>>> > Snowflake Dynamic Table's predecessor is also called Materialized Table.
> >>>> >
> >>>> > 3. Updating Table (From Timo)
> >>>> >
> >>>> > 4. Updating Materialized View (From Timo)
> >>>> >
> >>>> > 5. Refresh/Live Table (From Martijn)
> >>>> >
> >>>> > As Martijn said, naming is a headache; looking forward to more
> >>>> > valuable input from everyone.
> >>>> >
> >>>> > [1] https://cloud.google.com/looker/docs/derived-tables#persistent_derived_tables
> >>>> > [2] https://www.ibm.com/docs/en/db2/11.5?topic=tables-materialized-query
> >>>> > [3] https://community.denodo.com/docs/html/browse/6.0/vdp/vql/materialized_tables/creating_materialized_tables/creating_materialized_tables
> >>>> >
> >>>> > Best,
> >>>> > Ron
> >>>> >
> >>>> > On Sun, Apr 7, 2024 at 15:55, Ron liu <ron9....@gmail.com> wrote:
> >>>> >
> >>>> >> Hi, Lorenzo
> >>>> >>
> >>>> >> Thank you for your insightful input.
> >>>> >>
> >>>> >>>>> I think the 2 above twisted the materialized view concept to more
> >>>> >> than just an optimization for accessing pre-computed aggregates/filters.
> >>>> >> I think that concept (at least in my mind) is now more adherent to the
> >>>> >> semantics of the words themselves ("materialized" and "view") than to
> >>>> >> its implementations in DBMSs, as just a view on raw data that,
> >>>> >> hopefully, is constantly updated with fresh results.
> >>>> >> That's why I understand Timo's et al. objections.
> >>>> >>
> >>>> >> Your understanding of Materialized Views is correct. However, in our
> >>>> >> scenario, an important feature is the support for Update & Delete
> >>>> >> operations, which the current Materialized Views cannot fulfill. As we
> >>>> >> discussed with Timo before, if Materialized Views need to support data
> >>>> >> modifications, it would require an extension with new keywords, such as
> >>>> >> CREATE xxx (UPDATING) MATERIALIZED VIEW.
> >>>> >>
> >>>> >>>>> Still, I don't understand why we need another type of special table.
> >>>> >> Could you dive deep into the reasons why not simply adding the FRESHNESS
> >>>> >> parameter to standard tables?
> >>>> >>
> >>>> >> Firstly, I need to emphasize that we cannot achieve the design goal of
> >>>> >> the FLIP through the CREATE TABLE syntax combined with a FRESHNESS
> >>>> >> parameter. The proposal of this FLIP is to use Dynamic Table +
> >>>> >> Continuous Query, and combine it with FRESHNESS to realize a
> >>>> >> streaming-batch unification. However, CREATE TABLE is merely a metadata
> >>>> >> operation and cannot automatically start a background refresh job. To
> >>>> >> achieve the design goal of the FLIP with standard tables, it would
> >>>> >> require extending the CTAS[1] syntax to introduce the FRESHNESS keyword.
> >>>> >> We considered this design initially, but it has the following problems:
> >>>> >>
> >>>> >> 1. Distinguishing a table created through CTAS as a standard table or
> >>>> >> as a "special" standard table with an ongoing background refresh job
> >>>> >> using the FRESHNESS keyword is very obscure for users.
> >>>> >> 2. It intrudes on the semantics of the CTAS syntax. Currently, tables
> >>>> >> created using CTAS only add table metadata to the Catalog and do not
> >>>> >> record attributes such as the query. There are also no ongoing
> >>>> >> background refresh jobs, and the data writing operation happens only
> >>>> >> once at table creation.
> >>>> >> 3. For the framework, when we perform a certain kind of ALTER TABLE
> >>>> >> operation, it is unclear how to distinguish the behavior for a table
> >>>> >> created with FRESHNESS from that for a table created without it, which
> >>>> >> will also cause confusion.
> >>>> >>
> >>>> >> In terms of the design goal of combining Dynamic Table + Continuous
> >>>> >> Query, the FLIP proposal cannot be realized by only extending the
> >>>> >> current standard tables, so a new kind of dynamic table needs to be
> >>>> >> introduced as a first-level concept.
> >>>> >>
> >>>> >> [1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/create/#as-select_statement
> >>>> >>
> >>>> >> Best,
> >>>> >> Ron
> >>>> >>
> >>>> >> On Wed, Apr 3, 2024 at 22:25, <lorenzo.affe...@ververica.com.invalid> wrote:
> >>>> >>
> >>>> >>> Hello everybody!
> >>>> >>> Thanks for the FLIP as it looks amazing (and I think the proof is
> >>>> >>> this deep discussion it is provoking :))
> >>>> >>>
> >>>> >>> I have a couple of comments to add to this:
> >>>> >>>
> >>>> >>> Even though I get the reason why you rejected MATERIALIZED VIEW, I
> >>>> >>> still like it a lot, and I would like to provide pointers on how the
> >>>> >>> materialized view concept has been twisted in recent years:
> >>>> >>>
> >>>> >>> • Materialize DB (https://materialize.com/)
> >>>> >>> • The famous talk by Martin Kleppmann "turning the database inside
> >>>> >>> out" (https://www.youtube.com/watch?v=fU9hR3kiOK0)
> >>>> >>>
> >>>> >>> I think the 2 above twisted the materialized view concept to more
> >>>> >>> than just an optimization for accessing pre-computed aggregates/filters.
> >>>> >>> I think that concept (at least in my mind) is now more adherent to the
> >>>> >>> semantics of the words themselves ("materialized" and "view") than to
> >>>> >>> its implementations in DBMSs, as just a view on raw data that,
> >>>> >>> hopefully, is constantly updated with fresh results.
> >>>> >>> That's why I understand Timo's et al. objections.
> >>>> >>> Still I understand there is no need to add confusion :)
> >>>> >>>
> >>>> >>> Still, I don't understand why we need another type of special table.
> >>>> >>> Could you dive deep into the reasons why not simply adding the
> >>>> >>> FRESHNESS parameter to standard tables?
> >>>> >>>
> >>>> >>> I would see that as a very seamless implementation with the goal of a
> >>>> >>> unification of batch and streaming.
> >>>> >>> If we stick to a unified world, I think that Flink should just provide
> >>>> >>> 1 type of table that is inherently dynamic.
> >>>> >>> Now, depending on FRESHNESS objectives / connectors used in WITH, that
> >>>> >>> table can be backed by a stream or batch job as you explained in your
> >>>> >>> FLIP.
> >>>> >>>
> >>>> >>> Maybe I am totally missing the point :)
> >>>> >>>
> >>>> >>> Thank you in advance,
> >>>> >>> Lorenzo
> >>>> >>> On Apr 3, 2024 at 15:25 +0200, Martijn Visser
> >>>> >>> <martijnvis...@apache.org> wrote:
> >>>> >>>> Hi all,
> >>>> >>>>
> >>>> >>>> Thanks for the proposal. While the FLIP talks extensively about how
> >>>> >>>> Snowflake has Dynamic Tables and Databricks has Delta Live Tables, my
> >>>> >>>> understanding is that Databricks has CREATE STREAMING TABLE [1], which
> >>>> >>>> relates to this proposal.
> >>>> >>>>
> >>>> >>>> I do have concerns about using CREATE DYNAMIC TABLE, specifically
> >>>> >>>> about confusing the users who are familiar with Snowflake's approach
> >>>> >>>> where you can't change the content via DML statements, while that is
> >>>> >>>> something that would work in this proposal. Naming is hard of course,
> >>>> >>>> but I would probably prefer something like CREATE CONTINUOUS TABLE,
> >>>> >>>> CREATE REFRESH TABLE or CREATE LIVE TABLE.
> >>>> >>>>
> >>>> >>>> Best regards,
> >>>> >>>>
> >>>> >>>> Martijn
> >>>> >>>>
> >>>> >>>> [1] https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-create-streaming-table.html
> >>>> >>>>
> >>>> >>>> On Wed, Apr 3, 2024 at 5:19 AM Ron liu <ron9....@gmail.com> wrote:
> >>>> >>>>
> >>>> >>>>> Hi, dev
> >>>> >>>>>
> >>>> >>>>> After offline discussion with Becket Qin, Lincoln Lee and Jark Wu,
> >>>> >>>>> we have improved some parts of the FLIP.
> >>>> >>>>>
> >>>> >>>>> 1. Add Full Refresh Mode section to clarify the semantics of full
> >>>> >>>>> refresh mode.
> >>>> >>>>> 2. Add Future Improvement section explaining why query statement
> >>>> >>>>> does not support references to temporary view and possible solutions.
> >>>> >>>>> 3. The Future Improvement section explains a possible future
> >>>> >>>>> solution for dynamic table to support the modification of query
> >>>> >>>>> statements to meet the common field-level schema evolution
> >>>> >>>>> requirements of the lakehouse.
> >>>> >>>>> 4. The Refresh section emphasizes that the Refresh command and the
> >>>> >>>>> background refresh job can be executed in parallel, with no
> >>>> >>>>> restrictions at the framework level.
> >>>> >>>>> 5. Convert RefreshHandler into a plug-in interface to support
> >>>> >>>>> various workflow schedulers.
> >>>> >>>>>
> >>>> >>>>> Best,
> >>>> >>>>> Ron
> >>>> >>>>>
> >>>> >>>>> On Tue, Apr 2, 2024 at 10:28, Ron liu <ron9....@gmail.com> wrote:
> >>>> >>>>>
> >>>> >>>>>>> Hi, Venkata krishnan
> >>>> >>>>>>>
> >>>> >>>>>>> Thank you for your involvement and suggestions, and hope that
> >>>> >>>>>>> the design goals of this FLIP will be helpful to your business.
> >>>> >>>>>>>
> >>>> >>>>>>>>>>>>> 1. In the proposed FLIP, given the example for the
> >>>> >>>>>>> dynamic table, do the data sources always come from a single lake
> >>>> >>>>>>> storage such as Paimon or does the same proposal solve for 2
> >>>> >>>>>>> disparate storage systems like Kafka and Iceberg where Kafka events
> >>>> >>>>>>> are ETLed to Iceberg similar to Paimon?
> >>>> >>>>>>> Basically the lambda architecture that is mentioned in the FLIP as
> >>>> >>>>>>> well. I'm wondering if it is possible to switch b/w sources based on
> >>>> >>>>>>> the execution mode, for eg: if it is backfill operation, switch to a
> >>>> >>>>>>> data lake storage system like Iceberg, otherwise an event streaming
> >>>> >>>>>>> system like Kafka.
> >>>> >>>>>>>
> >>>> >>>>>>> Dynamic table is a design abstraction at the framework level and
> >>>> >>>>>>> is not tied to the physical implementation of the connector. If a
> >>>> >>>>>>> connector supports a combination of Kafka and lake storage, this
> >>>> >>>>>>> works fine.
> >>>> >>>>>>>
> >>>> >>>>>>>>>>>>> 2. What happens in the context of a bootstrap (batch) +
> >>>> >>>>>>> nearline update (streaming) case that are stateful applications? What
> >>>> >>>>>>> I mean by that is, will the state from the batch application be
> >>>> >>>>>>> transferred to the nearline application after the bootstrap execution
> >>>> >>>>>>> is complete?
> >>>> >>>>>>>
> >>>> >>>>>>> I think this is another orthogonal thing, something that FLIP-327
> >>>> >>>>>>> tries to address, not directly related to Dynamic Table.
> >>>> >>>>>>>
> >>>> >>>>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-327%3A+Support+switching+from+batch+to+stream+mode+to+improve+throughput+when+processing+backlog+data
> >>>> >>>>>>>
> >>>> >>>>>>> Best,
> >>>> >>>>>>> Ron
> >>>> >>>>>>>
> >>>> >>>>>>> On Sat, Mar 30, 2024 at 07:06, Venkatakrishnan Sowrirajan <vsowr...@asu.edu> wrote:
> >>>> >>>>>>>
> >>>> >>>>>>>>> Ron and Lincoln,
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> Great proposal and interesting discussion for adding support
> >>>> >>>>>>>>> for dynamic tables within Flink.
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> At LinkedIn, we are also trying to solve compute/storage
> >>>> >>>>>>>>> convergence for similar problems discussed as part of this FLIP,
> >>>> >>>>>>>>> specifically periodic backfill, bootstrap + nearline update use
> >>>> >>>>>>>>> cases using single implementation of business logic (single script).
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> A few clarifying questions:
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> 1. In the proposed FLIP, given the example for the dynamic
> >>>> >>>>>>>>> table, do the data sources always come from a single lake storage
> >>>> >>>>>>>>> such as Paimon or does the same proposal solve for 2 disparate
> >>>> >>>>>>>>> storage systems like Kafka and Iceberg where Kafka events are ETLed
> >>>> >>>>>>>>> to Iceberg similar to Paimon?
> >>>> >>>>>>>>> Basically the lambda architecture that is mentioned in the FLIP as
> >>>> >>>>>>>>> well. I'm wondering if it is possible to switch b/w sources based
> >>>> >>>>>>>>> on the execution mode, for eg: if it is backfill operation, switch
> >>>> >>>>>>>>> to a data lake storage system like Iceberg, otherwise an event
> >>>> >>>>>>>>> streaming system like Kafka.
> >>>> >>>>>>>>> 2. What happens in the context of a bootstrap (batch) +
> >>>> >>>>>>>>> nearline update (streaming) case that are stateful applications?
> >>>> >>>>>>>>> What I mean by that is, will the state from the batch application
> >>>> >>>>>>>>> be transferred to the nearline application after the bootstrap
> >>>> >>>>>>>>> execution is complete?
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> Regards
> >>>> >>>>>>>>> Venkata krishnan
> >>>> >>>>>>>>>
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> On Mon, Mar 25, 2024 at 8:03 PM Ron liu <ron9....@gmail.com> wrote:
> >>>> >>>>>>>>>
> >>>> >>>>>>>>>>> Hi, Timo
> >>>> >>>>>>>>>>>
> >>>> >>>>>>>>>>> Thanks for your quick response, and your suggestion.
> >>>> >>>>>>>>>>>
> >>>> >>>>>>>>>>> Yes, this discussion has turned into confirming whether
> >>>> >>>>>>>>>>> it's a special table or a special MV.
> >>>> >>>>>>>>>>>
> >>>> >>>>>>>>>>> 1. The key problem with MVs is that they don't support
> >>>> >>>>>>>>>>> modification, so I prefer it to be a special table. Although the
> >>>> >>>>>>>>>>> periodic refresh behavior is more characteristic of an MV, since
> >>>> >>>>>>>>>>> we are already a special table, supporting periodic refresh
> >>>> >>>>>>>>>>> behavior is quite natural, similar to Snowflake dynamic tables.
> >>>> >>>>>>>>>>>
> >>>> >>>>>>>>>>> 2. Regarding the keyword UPDATING, since the current
> >>>> >>>>>>>>>>> Regular Table is a Dynamic Table, which implies support for
> >>>> >>>>>>>>>>> updating through Continuous Query, I think it is redundant to add
> >>>> >>>>>>>>>>> the keyword UPDATING. In addition, UPDATING cannot reflect the
> >>>> >>>>>>>>>>> Continuous Query part and cannot express our purpose of
> >>>> >>>>>>>>>>> simplifying the data pipeline through Dynamic Table + Continuous
> >>>> >>>>>>>>>>> Query.
> >>>> >>>>>>>>>>>
> >>>> >>>>>>>>>>> 3. From the perspective of the SQL standard definition, I
> >>>> >>>>>>>>>>> can understand your concerns about Derived Table, but is it
> >>>> >>>>>>>>>>> possible to make a slight adjustment to meet our needs?
> >>>> >>>>>>>>>>> Additionally, as Lincoln mentioned, the Google Looker platform has
> >>>> >>>>>>>>>>> introduced Persistent Derived Table, and there are precedents in
> >>>> >>>>>>>>>>> the industry; could Derived Table be a candidate?
> >>>> >>>>>>>>>>>
> >>>> >>>>>>>>>>> Of course, I look forward to your better suggestions.
> >>>> >>>>>>>>>>>
> >>>> >>>>>>>>>>> Best,
> >>>> >>>>>>>>>>> Ron
> >>>> >>>>>>>>>>>
> >>>> >>>>>>>>>>>
> >>>> >>>>>>>>>>>
> >>>> >>>>>>>>>>> On Mon, Mar 25, 2024 at 18:49, Timo Walther <twal...@apache.org> wrote:
> >>>> >>>>>>>>>>>
> >>>> >>>>>>>>>>>>> After thinking about this more, this discussion boils
> >>>> >>>>>>>>>>>>> down to whether this is a special table or a special
> >>>> >>>>>>>>>>>>> materialized view. In both cases, we would need to add a
> >>>> >>>>>>>>>>>>> special keyword:
> >>>> >>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>> Either
> >>>> >>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>> CREATE UPDATING TABLE
> >>>> >>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>> or
> >>>> >>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>> CREATE UPDATING MATERIALIZED VIEW
> >>>> >>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>> I still feel that the periodic refreshing behavior is
> >>>> >>>>>>>>>>>>> closer to an MV. If we add a special keyword to MV, the
> >>>> >>>>>>>>>>>>> optimizer would know that the data cannot be used for query
> >>>> >>>>>>>>>>>>> optimizations.
> >>>> >>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>> I will ask more people for their opinion.
> >>>> >>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>> Regards,
> >>>> >>>>>>>>>>>>> Timo
> >>>> >>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>> On 25.03.24 10:45, Timo Walther wrote:
> >>>> >>>>>>>>>>>>>>> Hi Ron and Lincoln,
> >>>> >>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>> thanks for the quick response and the very
> >>>> >>>>>>>>>>>>>>> insightful discussion.
> >>>> >>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>> we might limit future opportunities to
> >>>> >>>>>>>>>>>>>>>>> optimize queries through automatic materialization
> >>>> >>>>>>>>>>>>>>>>> rewriting by allowing data modifications, thus losing
> >>>> >>>>>>>>>>>>>>>>> the potential for such optimizations.
> >>>> >>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>> This argument makes a lot of sense to me. Due to
> >>>> >>>>>>>>>>>>>>> the updates, the system is not in full control of the
> >>>> >>>>>>>>>>>>>>> persisted data. However, the system is still in full
> >>>> >>>>>>>>>>>>>>> control of the job that powers the refresh. So if the
> >>>> >>>>>>>>>>>>>>> system manages all updating pipelines, it could still
> >>>> >>>>>>>>>>>>>>> leverage automatic materialization rewriting but without
> >>>> >>>>>>>>>>>>>>> leveraging the data at rest (only the data in flight).
> >>>> >>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>> we are considering another candidate, Derived
> >>>> >>>>>>>>>>>>>>>>> Table, the term 'derive' suggests a query, and 'table'
> >>>> >>>>>>>>>>>>>>>>> retains modifiability. This approach would not disrupt
> >>>> >>>>>>>>>>>>>>>>> our current concept of a dynamic table
> >>>> >>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>> I did some research on this term. The SQL standard
> >>>> >>>>>>>>>>>>>>> uses the term "derived table" extensively (defined in
> >>>> >>>>>>>>>>>>>>> section 4.17.3). Thus, a lot of vendors adopt this for
> >>>> >>>>>>>>>>>>>>> simply referring to a table within a subclause:
> >>>> >>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>> https://dev.mysql.com/doc/refman/8.0/en/derived-tables.html
> >>>> >>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>> https://infocenter.sybase.com/help/topic/com.sybase.infocenter.dc32300.1600/doc/html/san1390612291252.html
> >>>> >>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>> https://www.c-sharpcorner.com/article/derived-tables-vs-common-table-expressions/
> >>>> >>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>> https://stackoverflow.com/questions/26529804/what-are-the-derived-tables-in-my-explain-statement
> >>>> >>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>> https://www.sqlservercentral.com/articles/sql-derived-tables
> >>>> >>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>> Esp. the latter example is interesting, SQL Server
> >>>> >>>>>>>>>>>>>>> allows things like this on derived tables:
> >>>> >>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>> UPDATE T SET Name='Timo' FROM (SELECT * FROM Product) AS T
> >>>> >>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>> SELECT * FROM Product;
> >>>> >>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>> Btw, Snowflake also states this about dynamic tables:
> >>>> >>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>> Because the content of a dynamic table is
> >>>> >>>>>>>>>>>>>>>>> fully determined by the given query, the content cannot
> >>>> >>>>>>>>>>>>>>>>> be changed by using DML. You don’t insert, update, or
> >>>> >>>>>>>>>>>>>>>>> delete the rows in a dynamic table.
> >>>> >>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>> So a new term makes a lot of sense.
> >>>> >>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>> How about using `UPDATING`?
> >>>> >>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>> CREATE UPDATING TABLE
> >>>> >>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>> This reflects that modifications can be made and
> >>>> >>>>>>>>>>>>>>> from an English-language perspective you can PAUSE or
> >>>> >>>>>>>>>>>>>>> RESUME the UPDATING.
> >>>> >>>>>>>>>>>>>>> Thus, a user can define UPDATING interval and mode?
> >>>> >>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>> Looking forward to your thoughts.
> >>>> >>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>> Regards,
> >>>> >>>>>>>>>>>>>>> Timo
> >>>> >>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>> On 25.03.24 07:09, Ron liu wrote:
> >>>> >>>>>>>>>>>>>>>>> Hi, Ahmed
> >>>> >>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>> Thanks for your feedback.
> >>>> >>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>> Regarding your question:
> >>>> >>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>> I want to iterate on Timo's comments
> >>>> >>>>>>>>>>>>>>>>> regarding the confusion between "Dynamic Table" and
> >>>> >>>>>>>>>>>>>>>>> current Flink "Table". Should the refactoring of the
> >>>> >>>>>>>>>>>>>>>>> system happen in 2.0, should we rename it in this Flip
> >>>> >>>>>>>>>>>>>>>>> ( as the suggestions in the thread ) and address the
> >>>> >>>>>>>>>>>>>>>>> holistic changes in a separate Flip for 2.0?
> >>>> >>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>> Lincoln proposed a new concept in reply to
> >>>> >>>>>>>>>>>>>>>>> Timo: Derived Table, which is a combination of Dynamic
> >>>> >>>>>>>>>>>>>>>>> Table + Continuous Query. The use of Derived Table will
> >>>> >>>>>>>>>>>>>>>>> not conflict with existing concepts. What do you think?
> >>>> >>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>> I feel confused with how it is further with
> >>>> >>>>>>>>>>>>>>>>> other components, the examples provided feel like a
> >>>> >>>>>>>>>>>>>>>>> standalone ETL job, could you provide in the FLIP an
> >>>> >>>>>>>>>>>>>>>>> example where the table is further used in subsequent
> >>>> >>>>>>>>>>>>>>>>> queries (specially in batch mode).
> >>>> >>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>> Thanks for your suggestion. I added how to use
> >>>> >>>>>>>>>>>>>>>>> Dynamic Table in the FLIP user story section: a Dynamic
> >>>> >>>>>>>>>>>>>>>>> Table can be referenced by a downstream Dynamic Table
> >>>> >>>>>>>>>>>>>>>>> and can also support OLAP queries.
> >>>> >>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>> Best,
> >>>> >>>>>>>>>>>>>>>>> Ron
> >>>> >>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>> On Sat, Mar 23, 2024 at 10:35, Ron liu <ron9....@gmail.com> wrote:
> >>>> >>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>> Hi, Feng
> >>>> >>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>> Thanks for your feedback.
> >>>> >>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>> Although currently we restrict users from
> >>>> >>>>>>>>>>>>>>>>>>> modifying the query, I wonder if we can provide a
> >>>> >>>>>>>>>>>>>>>>>>> better way to help users rebuild it without affecting
> >>>> >>>>>>>>>>>>>>>>>>> downstream OLAP queries.
> >>>> >>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>> Considering the problem of data consistency,
> >>>> >>>>>>>>>>>>>>>>>>> in the first step we strictly limit the semantics and
> >>>> >>>>>>>>>>>>>>>>>>> do not support modifying the query. This is really a
> >>>> >>>>>>>>>>>>>>>>>>> good question; one of my ideas is to introduce a
> >>>> >>>>>>>>>>>>>>>>>>> syntax similar to SWAP [1], which supports exchanging
> >>>> >>>>>>>>>>>>>>>>>>> two Dynamic Tables.
> >>>> >>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>> From the documentation, the definitions
> >>>> >>>>>>>>>>>>>>>>>>> SQL and job information are stored in the Catalog.
> >>>> >>>>>>>>>>>>>>>>>>> Does this mean that if a system needs to adapt to
> >>>> >>>>>>>>>>>>>>>>>>> Dynamic Tables, it also needs to store Flink's job
> >>>> >>>>>>>>>>>>>>>>>>> information in the corresponding system?
> >>>> >>>>>>>>>>>>>>>>>>> For example, does MySQL's Catalog need to store flink
> >>>> >>>>>>>>>>>>>>>>>>> job information as well?
> >>>> >>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>> Yes, currently we need to rely on Catalog to
> >>>> >>>>>>>>>>>>>>>>>>> store refresh job information.
> >>>> >>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>> Users still need to consider how much
> >>>> >>>>>>>>>>>>>>>>>>> memory is being used, how large the concurrency is,
> >>>> >>>>>>>>>>>>>>>>>>> which type of state backend is being used, and may
> >>>> >>>>>>>>>>>>>>>>>>> need to set TTL expiration.
> >>>> >>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>> Similar to the current practice, job
> >>>> >>> parameters can be set via
> >>>> >>>>> the
> >>>> >>>>>>>>>>>>> Flink
> >>>> >>>>>>>>>>>>>>>>>>> conf or SET commands
> >>>> >>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>> When we submit a refresh command, can we
> >>>> >>> help users detect if
> >>>> >>>>>>>>> there
> >>>> >>>>>>>>>>>>> are
> >>>> >>>>>>>>>>>>>>>>>>> any
> >>>> >>>>>>>>>>>>>>>>>>> running jobs and automatically stop them
> >>>> >>> before executing the
> >>>> >>>>>>>>> refresh
> >>>> >>>>>>>>>>>>>>>>>>> command? Then wait for it to complete before
> >>>> >>> restarting the
> >>>> >>>>>>>>>>> background
> >>>> >>>>>>>>>>>>>>>>>>> streaming job?
> >>>> >>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>> Purely from a technical implementation point of view, your proposal
> >>>> >>>>>>>>>>>>>>>>>>> is doable, but it would be more costly. Also, I think data consistency
> >>>> >>>>>>>>>>>>>>>>>>> is the responsibility of the user, just as it already is for a Regular
> >>>> >>>>>>>>>>>>>>>>>>> Table, so this is consistent with existing behavior and no additional
> >>>> >>>>>>>>>>>>>>>>>>> guarantees are made at the engine level.
> >>>> >>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>> Best,
> >>>> >>>>>>>>>>>>>>>>>>> Ron
> >>>> >>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>> Ahmed Hamdy <hamdy10...@gmail.com> wrote on Fri, Mar 22, 2024 at 23:50:
> >>>> >>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>> Hi Ron,
> >>>> >>>>>>>>>>>>>>>>>>>>> Sorry for joining the discussion late,
> >>>> >>> thanks for the effort.
> >>>> >>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>> I think the base idea is great, however I have a couple of comments:
> >>>> >>>>>>>>>>>>>>>>>>>>> - I want to iterate on Timo's comments regarding the confusion between
> >>>> >>>>>>>>>>>>>>>>>>>>>   "Dynamic Table" and the current Flink "Table". Should the refactoring
> >>>> >>>>>>>>>>>>>>>>>>>>>   of the system happen in 2.0, should we rename it in this FLIP (as
> >>>> >>>>>>>>>>>>>>>>>>>>>   suggested in the thread) and address the holistic changes in a
> >>>> >>>>>>>>>>>>>>>>>>>>>   separate FLIP for 2.0?
> >>>> >>>>>>>>>>>>>>>>>>>>> - I am confused about how it is further used with other components.
> >>>> >>>>>>>>>>>>>>>>>>>>>   The examples provided feel like a standalone ETL job. Could you
> >>>> >>>>>>>>>>>>>>>>>>>>>   provide in the FLIP an example where the table is further used in
> >>>> >>>>>>>>>>>>>>>>>>>>>   subsequent queries (especially in batch mode)?
> >>>> >>>>>>>>>>>>>>>>>>>>> - I really like the standard of keeping the unified batch and
> >>>> >>>>>>>>>>>>>>>>>>>>>   streaming approach.
> >>>> >>>>>>>>>>>>>>>>>>>>> Best Regards
> >>>> >>>>>>>>>>>>>>>>>>>>> Ahmed Hamdy
> >>>> >>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>> On Fri, 22 Mar 2024 at 12:07, Lincoln Lee <lincoln.8...@gmail.com> wrote:
> >>>> >>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>> Hi Timo,
> >>>> >>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>> Thanks for your thoughtful inputs!
> >>>> >>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>> Yes, expanding the MATERIALIZED
> >>>> >>> VIEW(MV) could achieve the
> >>>> >>>>> same
> >>>> >>>>>>>>>>>>>>>>>>>>> function,
> >>>> >>>>>>>>>>>>>>>>>>>>>>> but our primary concern is that by
> >>>> >>> using a view, we might
> >>>> >>>>> limit
> >>>> >>>>>>>>>>>>> future
> >>>> >>>>>>>>>>>>>>>>>>>>>>> opportunities
> >>>> >>>>>>>>>>>>>>>>>>>>>>> to optimize queries through automatic
> >>>> >>> materialization
> >>>> >>>>> rewriting
> >>>> >>>>>>>>>>> [1],
> >>>> >>>>>>>>>>>>>>>>>>>>>>> leveraging
> >>>> >>>>>>>>>>>>>>>>>>>>>>> the support for MV by physical
> >>>> >>> storage. This is because we
> >>>> >>>>>>>>> would be
> >>>> >>>>>>>>>>>>>>>>>>>>>>> breaking
> >>>> >>>>>>>>>>>>>>>>>>>>>>> the intuitive semantics of a
> >>>> >>> materialized view (a materialized
> >>>> >>>>>>>>> view
> >>>> >>>>>>>>>>>>>>>>>>>>>>> represents
> >>>> >>>>>>>>>>>>>>>>>>>>>>> the result of a query) by allowing
> >>>> >>> data modifications, thus
> >>>> >>>>>>>>> losing
> >>>> >>>>>>>>>>>>> the
> >>>> >>>>>>>>>>>>>>>>>>>>>>> potential
> >>>> >>>>>>>>>>>>>>>>>>>>>>> for such optimizations.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>> With these considerations in mind, we
> >>>> >>> were inspired by Google
> >>>> >>>>>>>>>>>>> Looker's
> >>>> >>>>>>>>>>>>>>>>>>>>>>> Persistent
> >>>> >>>>>>>>>>>>>>>>>>>>>>> Derived Table [2]. PDT is designed for
> >>>> >>> building Looker's
> >>>> >>>>>>>>> automated
> >>>> >>>>>>>>>>>>>>>>>>>>>>> modeling,
> >>>> >>>>>>>>>>>>>>>>>>>>>>> aligning with our purpose for the
> >>>> >>> stream-batch automatic
> >>>> >>>>>>>>> pipeline.
> >>>> >>>>>>>>>>>>>>>>>>>>>>> Therefore,
> >>>> >>>>>>>>>>>>>>>>>>>>>>> we are considering another candidate,
> >>>> >>> Derived Table, the term
> >>>> >>>>>>>>>>>>> 'derive'
> >>>> >>>>>>>>>>>>>>>>>>>>>>> suggests a
> >>>> >>>>>>>>>>>>>>>>>>>>>>> query, and 'table' retains
> >>>> >>> modifiability. This approach would
> >>>> >>>>>>>>> not
> >>>> >>>>>>>>>>>>>>>>>>>>> disrupt
> >>>> >>>>>>>>>>>>>>>>>>>>>>> our current
> >>>> >>>>>>>>>>>>>>>>>>>>>>> concept of a dynamic table, preserving
> >>>> >>> the future utility of
> >>>> >>>>>>>>> MVs.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>> Conceptually, a Derived Table is a Dynamic Table + Continuous Query.
> >>>> >>>>>>>>>>>>>>>>>>>>>>> Introducing the new concept of a Derived Table in this FLIP lets all
> >>>> >>>>>>>>>>>>>>>>>>>>>>> concepts play together nicely.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>> What do you think about this?
> >>>> >>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>> [1] https://calcite.apache.org/docs/materialized_views.html
> >>>> >>>>>>>>>>>>>>>>>>>>>>> [2] https://cloud.google.com/looker/docs/derived-tables#persistent_derived_tables
> >>>> >>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>> Best,
> >>>> >>>>>>>>>>>>>>>>>>>>>>> Lincoln Lee
> >>>> >>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>> Timo Walther <twal...@apache.org> wrote on Fri, Mar 22, 2024 at 17:54:
> >>>> >>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> Hi Ron,
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> thanks for the detailed answer.
> >>>> >>> Sorry, for my late reply, we
> >>>> >>>>>>>>> had a
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> conference that kept me busy.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> In the current concept[1], it actually includes: Dynamic Tables &
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Continuous Query. Dynamic Table is just an abstract logical concept
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> This explanation makes sense to me.
> >>>> >>> But the docs also say "A
> >>>> >>>>>>>>>>>>>>>>>>>>> continuous
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> query is evaluated on the dynamic
> >>>> >>> table yielding a new
> >>>> >>>>> dynamic
> >>>> >>>>>>>>>>>>>>>>>>>>> table.".
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> So even our regular CREATE TABLEs
> >>>> >>> are considered dynamic
> >>>> >>>>>>>>> tables.
> >>>> >>>>>>>>>>>>> This
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> can also be seen in the diagram
> >>>> >>> "Dynamic Table -> Continuous
> >>>> >>>>>>>>> Query
> >>>> >>>>>>>>>>>>> ->
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> Dynamic Table". Currently, Flink
> >>>> >>> queries can only be executed
> >>>> >>>>>>>>> on
> >>>> >>>>>>>>>>>>>>>>>>>>> Dynamic
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> Tables.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> In essence, a materialized view
> >>>> >>> represents the result of
> >>>> >>>>> a
> >>>> >>>>>>>>>>>>> query.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> Isn't that what your proposal does
> >>>> >>> as well?
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> the object of the suspend
> >>>> >>> operation is the refresh task
> >>>> >>>>> of
> >>>> >>>>>>>>> the
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> dynamic table
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> I understand that Snowflake uses the term [1] to merge their concepts
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> of STREAM, TASK, and TABLE into one concept. But Flink has no concept
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> of a "refresh task". Also, they already introduced MATERIALIZED VIEW.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> Flink is in the convenient position that the concept of materialized
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> views is not taken (reserved maybe for exactly this use case?). And
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> the SQL standard concept could be "slightly adapted" to our needs.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> Looking at other vendors like Postgres[2], they also use `REFRESH`
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> commands, so why not add additional commands such as DELETE or
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> UPDATE? Oracle supports "ON PREBUILT TABLE clause tells the database
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> to use an existing table segment"[3], which comes closer to what we
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> want as well.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> it is not intended to support
> >>>> >>> data modification
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> This is an argument that I understand. But we as Flink could allow
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> data modifications. This way we are only extending the standard and
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> don't introduce new concepts.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> If we can't agree on using the MATERIALIZED VIEW concept, we should
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> fix our syntax in a Flink 2.0 effort, making regular tables bounded
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> and dynamic tables unbounded. We would be closer to the SQL standard
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> with this and pave the way for the future. I would actually support
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> this if all concepts play together nicely.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> In the future, we can consider
> >>>> >>> extending the statement
> >>>> >>>>> set
> >>>> >>>>>>>>>>>>> syntax
> >>>> >>>>>>>>>>>>>>>>>>>>> to
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> support the creation of multiple
> >>>> >>> dynamic tables.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> It's good that we called the concept STATEMENT SET. This allows us to
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> define CREATE TABLE within, even if it might look a bit confusing.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> Regards,
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> Timo
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> [1] https://docs.snowflake.com/en/user-guide/dynamic-tables-about
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> [2] https://www.postgresql.org/docs/current/sql-creatematerializedview.html
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> [3] https://oracle-base.com/articles/misc/materialized-views
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> On 21.03.24 04:14, Feng Jin wrote:
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Ron and Lincoln
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for driving this
> >>>> >>> discussion. I believe it will
> >>>> >>>>> greatly
> >>>> >>>>>>>>>>>>>>>>>>>>> improve
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> the
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> convenience of managing user
> >>>> >>> real-time pipelines.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> I have some questions.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> *Regarding Limitations of
> >>>> >>> Dynamic Table:*
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Does not support modifying
> >>>> >>> the select statement after the
> >>>> >>>>>>>>>>> dynamic
> >>>> >>>>>>>>>>>>>>>>>>>>>>> table
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> is created.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Although currently we restrict
> >>>> >>> users from modifying the
> >>>> >>>>>>>>> query, I
> >>>> >>>>>>>>>>>>>>>>>>>>> wonder
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> if
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> we can provide a better way to
> >>>> >>> help users rebuild it without
> >>>> >>>>>>>>>>>>>>>>>>>>> affecting
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> downstream OLAP queries.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> *Regarding the management of
> >>>> >>> background jobs:*
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> 1. From the documentation, the
> >>>> >>> definitions SQL and job
> >>>> >>>>>>>>>>> information
> >>>> >>>>>>>>>>>>>>>>>>>>> are
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> stored in the Catalog. Does this
> >>>> >>> mean that if a system needs
> >>>> >>>>>>>>> to
> >>>> >>>>>>>>>>>>>>>>>>>>> adapt
> >>>> >>>>>>>>>>>>>>>>>>>>>>> to
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Dynamic Tables, it also needs to
> >>>> >>> store Flink's job
> >>>> >>>>>>>>> information in
> >>>> >>>>>>>>>>>>>>>>>>>>> the
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> corresponding system?
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> For example, does MySQL's
> >>>> >>> Catalog need to store flink job
> >>>> >>>>>>>>>>>>>>>>>>>>> information
> >>>> >>>>>>>>>>>>>>>>>>>>>>> as
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> well?
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> 2. Users still need to consider
> >>>> >>> how much memory is being
> >>>> >>>>> used,
> >>>> >>>>>>>>>>> how
> >>>> >>>>>>>>>>>>>>>>>>>>>>> large
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> the concurrency is, which type
> >>>> >>> of state backend is being
> >>>> >>>>> used,
> >>>> >>>>>>>>>>> and
> >>>> >>>>>>>>>>>>>>>>>>>>> may
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> need
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> to set TTL expiration.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> *Regarding the Refresh Part:*
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> If the refresh mode is
> >>>> >>> continuous and a background job is
> >>>> >>>>>>>>>>> running,
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> caution should be taken with the
> >>>> >>> refresh command as it can
> >>>> >>>>>>>>> lead
> >>>> >>>>>>>>>>> to
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> inconsistent data.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> When we submit a refresh
> >>>> >>> command, can we help users detect
> >>>> >>>>> if
> >>>> >>>>>>>>>>> there
> >>>> >>>>>>>>>>>>>>>>>>>>> are
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> any
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> running jobs and automatically
> >>>> >>> stop them before executing
> >>>> >>>>> the
> >>>> >>>>>>>>>>>>>>>>>>>>> refresh
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> command? Then wait for it to
> >>>> >>> complete before restarting the
> >>>> >>>>>>>>>>>>>>>>>>>>> background
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> streaming job?
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Best,
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Feng
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Mar 19, 2024 at 9:40 PM Lincoln Lee <lincoln.8...@gmail.com> wrote:
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Yun,
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you very much for your
> >>>> >>> valuable input!
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Incremental mode is indeed an attractive idea, and we have also
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> discussed it, but in the current design we first provide two refresh
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> modes: CONTINUOUS and FULL. Incremental mode can be introduced once
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> the execution layer has the capability.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> My answer for the two
> >>>> >>> questions:
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Yes, cascading is a good question. The current proposal provides a
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> freshness that defines a dynamic table's lag relative to the base
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> table. If users need to consider the end-to-end freshness of multiple
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> cascaded dynamic tables, they can manually split them for now. Of
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> course, how to let multiple cascaded or dependent dynamic tables
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> complete the freshness definition in a simpler way is something I
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> think can be extended in the future.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Cascading refresh is also a part we focused on in our discussions. In
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> this FLIP, we hope to focus as much as possible on the core features
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> (as it already involves a lot of things), so we did not directly
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> introduce related syntax. However, based on the current design,
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> combined with the catalog and lineage, users can theoretically also
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> perform a cascading refresh.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best,
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Lincoln Lee
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Yun Tang <myas...@live.com> wrote on Tue, Mar 19, 2024 at 13:45:
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Lincoln,
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for driving this
> >>>> >>> discussion, and I am so excited to
> >>>> >>>>>>>>> see
> >>>> >>>>>>>>>>>>>>>>>>>>> this
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> topic
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> being discussed in the
> >>>> >>> Flink community!
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  From my point of view,
> >>>> >>> instead of the work of unifying
> >>>> >>>>>>>>>>>>> streaming
> >>>> >>>>>>>>>>>>>>>>>>>>> and
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> batch
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in DataStream API [1],
> >>>> >>> this FLIP actually could make users
> >>>> >>>>>>>>>>>>> benefit
> >>>> >>>>>>>>>>>>>>>>>>>>>>> from
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> one
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> engine to rule batch &
> >>>> >>> streaming.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If we treat this FLIP as
> >>>> >>> an open-source implementation of
> >>>> >>>>>>>>>>>>>>>>>>>>> Snowflake's
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> dynamic tables [2], we
> >>>> >>> still lack an incremental refresh
> >>>> >>>>>>>>> mode
> >>>> >>>>>>>>>>> to
> >>>> >>>>>>>>>>>>>>>>>>>>> make
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> the
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ETL near real-time with a
> >>>> >>> much cheaper computation cost.
> >>>> >>>>>>>>>>> However,
> >>>> >>>>>>>>>>>>>>>>>>>>> I
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> think
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this could be done under
> >>>> >>> the current design by introducing
> >>>> >>>>>>>>>>>>> another
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> refresh
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> mode in the future.
> >>>> >>> Although the extra work of incremental
> >>>> >>>>>>>>> view
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> maintenance
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> would be much larger.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> For the FLIP itself, I
> >>>> >>> have several questions below:
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1. It seems this FLIP does
> >>>> >>> not consider the lag of
> >>>> >>>>> refreshes
> >>>> >>>>>>>>>>>>>>>>>>>>> across
> >>>> >>>>>>>>>>>>>>>>>>>>>>> ETL
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> layers from ODS ---> DWD
> >>>> >>> ---> APP [3]. We currently only
> >>>> >>>>>>>>>>> consider
> >>>> >>>>>>>>>>>>>>>>>>>>> the
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> scheduler interval, which
> >>>> >>> means we cannot use lag to
> >>>> >>>>>>>>>>>>> automatically
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> schedule
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the upfront micro-batch
> >>>> >>> jobs to do the work.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2. To support the
> >>>> >>> automagical refreshes, we should
> >>>> >>>>> consider
> >>>> >>>>>>>>> the
> >>>> >>>>>>>>>>>>>>>>>>>>>>> lineage
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> in
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the catalog or somewhere
> >>>> >>> else.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-134%3A+Batch+execution+for+the+DataStream+API
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [2] https://docs.snowflake.com/en/user-guide/dynamic-tables-about
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [3] https://docs.snowflake.com/en/user-guide/dynamic-tables-refresh
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Yun Tang
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ________________________________
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> From: Lincoln Lee <lincoln.8...@gmail.com>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Sent: Thursday, March 14, 2024 14:35
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> To: dev@flink.apache.org <dev@flink.apache.org>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Subject: Re: [DISCUSS] FLIP-435: Introduce a New Dynamic Table for
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Simplifying Data Pipelines
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Jing,
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for your attention
> >>>> >>> to this flip! I'll try to answer
> >>>> >>>>>>>>> the
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> following
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> questions.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1. How to define query
> >>>> >>> of dynamic table?
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Use flink sql or
> >>>> >>> introducing new syntax?
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If use flink sql, how
> >>>> >>> to handle the difference in SQL
> >>>> >>>>>>>>> between
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>> streaming
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> batch processing?
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> For example, a query
> >>>> >>> including window aggregate based on
> >>>> >>>>>>>>>>>>>>>>>>>>> processing
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> time?
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> or a query including
> >>>> >>> global order by?
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Similar to `CREATE TABLE AS query`, here the `query` also uses Flink
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> SQL and doesn't introduce a totally new syntax. We will not change
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the status quo with respect to the differences in Flink SQL
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> functionality between streaming and batch. For example, the proctime
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> window agg on streaming and the global sort on batch that you
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> mentioned do not, in fact, work properly in the other mode, so when
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the user changes a dynamic table to a refresh mode that is not
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> supported, we will throw an exception.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2. Whether modify the
> >>>> >>> query of dynamic table is allowed?
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Or we could only
> >>>> >>> refresh a dynamic table based on the
> >>>> >>>>>>>>> initial
> >>>> >>>>>>>>>>>>>>>>>>>>> query?
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Yes, in the current design, the query definition of the dynamic table
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> is not allowed to be modified, and you can only refresh the data based
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> on the initial definition.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 3. How to use dynamic
> >>>> >>> table?
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The dynamic table seems
> >>>> >>> to be similar to the materialized
> >>>> >>>>>>>>>>> view.
> >>>> >>>>>>>>>>>>>>>>>>>>>>> Will
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> we
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> do
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> something like
> >>>> >>> materialized view rewriting during the
> >>>> >>>>>>>>>>>>>>>>>>>>> optimization?
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It's true that dynamic table and materialized view are similar in some
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ways, but as Ron explains, there are differences. In terms of
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> optimization, automated materialization discovery similar to that
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> supported by Calcite is also a potential possibility, perhaps with the
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> addition of automated rewriting in the future.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best,
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Lincoln Lee
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ron liu <
> >>>> >>> ron9....@gmail.com> wrote on Thu, Mar 14, 2024 at 14:01:
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi, Timo
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Sorry for the late
> >>>> >>> response, thanks for your feedback.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Regarding your
> >>>> >>> questions:
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Flink has introduced
> >>>> >>> the concept of Dynamic Tables many
> >>>> >>>>>>>>> years
> >>>> >>>>>>>>>>>>>>>>>>>>> ago.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> How
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> does the term "Dynamic
> >>>> >>> Table" fit into Flink's regular
> >>>> >>>>>>>>> tables
> >>>> >>>>>>>>>>>>> and
> >>>> >>>>>>>>>>>>>>>>>>>>>>> also
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> how does it relate to
> >>>> >>> Table API?
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I fear that adding
> >>>> >>> the DYNAMIC TABLE keyword could cause
> >>>> >>>>>>>>>>>>>>>>>>>>> confusion
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> for
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> users, because a
> >>>> >>> term for regular CREATE TABLE (that can
> >>>> >>>>>>>>> be
> >>>> >>>>>>>>>>>>>>>>>>>>> "kind
> >>>> >>>>>>>>>>>>>>>>>>>>>>> of
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> dynamic" as well and
> >>>> >>> is backed by a changelog) is then
> >>>> >>>>>>>>>>> missing.
> >>>> >>>>>>>>>>>>>>>>>>>>>>> Also
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> given that we call
> >>>> >>> our connectors for those tables,
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> DynamicTableSource
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and DynamicTableSink.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> In general, I find
> >>>> >>> it contradicting that a TABLE can be
> >>>> >>>>>>>>>>>>>>>>>>>>> "paused" or
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> "resumed". From an
> >>>> >>> English language perspective, this
> >>>> >>>>> does
> >>>> >>>>>>>>>>>>> sound
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> incorrect. In my
> >>>> >>> opinion (without much research yet), a
> >>>> >>>>>>>>>>>>>>>>>>>>> continuous
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> updating trigger
> >>>> >>> should rather be modelled as a CREATE
> >>>> >>>>>>>>>>>>>>>>>>>>> MATERIALIZED
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> VIEW
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (which users are
> >>>> >>> familiar with?) or a new concept such
> >>>> >>>>> as
> >>>> >>>>>>>>> a
> >>>> >>>>>>>>>>>>>>>>>>>>> CREATE
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> TASK
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (that can be paused
> >>>> >>> and resumed?).
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> In the current
> >>>> >>> concept[1], it actually includes: Dynamic
> >>>> >>>>>>>>>>> Tables
> >>>> >>>>>>>>>>>>> &
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Continuous Query.
> >>>> >>> Dynamic Table is just an abstract
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> logical concept,
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> which in its physical
> >>>> >>> form represents either a table
> >>>> >>>>> or a
> >>>> >>>>>>>>>>>>>>>>>>>>>>> changelog
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> stream. It requires the
> >>>> >>> combination with Continuous Query
> >>>> >>>>>>>>> to
> >>>> >>>>>>>>>>>>>>>>>>>>> achieve
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> dynamic updates of the
> >>>> >>> target table similar to a
> >>>> >>>>> database’s
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Materialized View.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> We hope to upgrade the
> >>>> >>> Dynamic Table to a real entity
> >>>> >>>>> that
> >>>> >>>>>>>>>>> users
> >>>> >>>>>>>>>>>>>>>>>>>>> can
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> operate, which combines
> >>>> >>> the logical concepts of Dynamic
> >>>> >>>>>>>>>>> Tables +
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Continuous Query. By
> >>>> >>> integrating the definition of tables
> >>>> >>>>>>>>> and
> >>>> >>>>>>>>>>>>>>>>>>>>>>> queries,
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> it can achieve
> >>>> >>> functions similar to Materialized Views,
> >>>> >>>>>>>>>>>>>>>>>>>>> simplifying
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> users' data processing
> >>>> >>> pipelines.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> So, the object of the
> >>>> >>> suspend operation is the refresh
> >>>> >>>>>>>>> task of
> >>>> >>>>>>>>>>>>>>>>>>>>> the
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> dynamic table. The
> >>>> >>> command `ALTER DYNAMIC TABLE
> >>>> >>>>> table_name
> >>>> >>>>>>>>>>>>>>>>>>>>> SUSPEND
> >>>> >>>>>>>>>>>>>>>>>>>>>>> `
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> is actually a shorthand
> >>>> >>> for `ALTER DYNAMIC TABLE
> >>>> >>>>> table_name
> >>>> >>>>>>>>>>>>>>>>>>>>> SUSPEND
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> REFRESH` (if written in
> >>>> >>> full for clarity, we can also
> >>>> >>>>>>>>> modify
> >>>> >>>>>>>>>>>>> it).
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2. Initially, we also
> >>>> >>> considered Materialized Views,
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> but ultimately
> >>>> >>> decided against them. Materialized views
> >>>> >>>>>>>>> are
> >>>> >>>>>>>>>>>>>>>>>>>>>>> designed
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to enhance query
> >>>> >>> performance for workloads that consist
> >>>> >>>>> of
> >>>> >>>>>>>>>>>>>>>>>>>>> common,
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> repetitive query
> >>>> >>> patterns. In essence, a materialized
> >>>> >>>>> view
> >>>> >>>>>>>>>>>>>>>>>>>>>>> represents
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the result of a query.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> However, it is not
> >>>> >>> intended to support data modification.
> >>>> >>>>>>>>> For
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Lakehouse scenarios,
> >>>> >>> where the ability to delete or
> >>>> >>>>> update
> >>>> >>>>>>>>>>> data
> >>>> >>>>>>>>>>>>>>>>>>>>> is
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> crucial (such as
> >>>> >>> compliance with GDPR, FLIP-2),
> >>>> >>>>>>>>> materialized
> >>>> >>>>>>>>>>>>>>>>>>>>> views
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> fall short.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 3.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Compared to CREATE
> >>>> >>> (regular) TABLE, CREATE DYNAMIC TABLE
> >>>> >>>>>>>>> not
> >>>> >>>>>>>>>>>>> only
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> defines metadata in the
> >>>> >>> catalog but also automatically
> >>>> >>>>>>>>>>> initiates
> >>>> >>>>>>>>>>>>>>>>>>>>> a
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> data refresh task based
> >>>> >>> on the query specified during
> >>>> >>>>> table
> >>>> >>>>>>>>>>>>>>>>>>>>>>> creation.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It dynamically executes
> >>>> >>> data updates. Users can focus on
> >>>> >>>>>>>>> data
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> dependencies and data
> >>>> >>> generation logic.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 4.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The new dynamic table
> >>>> >>> does not conflict with the existing
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DynamicTableSource and
> >>>> >>> DynamicTableSink interfaces. For
> >>>> >>>>> the
> >>>> >>>>>>>>>>>>>>>>>>>>>>> developer,
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> all that needs to be
> >>>> >>> implemented is the new
> >>>> >>>>>>>>>>> CatalogDynamicTable,
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> without changing the
> >>>> >>> implementation of source and sink.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 5. For now, the FLIP
> >>>> >>> does not consider supporting Table
> >>>> >>>>> API
> >>>> >>>>>>>>>>>>>>>>>>>>>>> operations
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> on
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Dynamic Table.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> However, once the SQL
> >>>> >>> syntax is finalized, we can
> >>>> >>>>> discuss
> >>>> >>>>>>>>>>> this
> >>>> >>>>>>>>>>>>>>>>>>>>> in
> >>>> >>>>>>>>>>>>>>>>>>>>>>> a
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> separate FLIP.
> >>>> >>> Currently, I have a rough idea: the Table
> >>>> >>>>>>>>> API
> >>>> >>>>>>>>>>>>>>>>>>>>> should
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> also introduce
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DynamicTable operation
> >>>> >>> interfaces
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> corresponding to the
> >>>> >>> existing Table interfaces.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The TableEnvironment
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> will provide relevant
> >>>> >>> methods to support various
> >>>> >>>>> dynamic
> >>>> >>>>>>>>>>>>> table
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> operations. The goal
> >>>> >>> for the new Dynamic Table is to
> >>>> >>>>> offer
> >>>> >>>>>>>>>>> users
> >>>> >>>>>>>>>>>>>>>>>>>>> an
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> experience similar to
> >>>> >>> using a database, which is why we
> >>>> >>>>>>>>>>>>>>>>>>>>> prioritize
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> SQL-based approaches
> >>>> >>> initially.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> How do you envision
> >>>> >>> re-adding the functionality of a
> >>>> >>>>>>>>>>> statement
> >>>> >>>>>>>>>>>>>>>>>>>>> set,
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> fans out to multiple
> >>>> >>> tables? This is a very important
> >>>> >>>>> use
> >>>> >>>>>>>>>>> case
> >>>> >>>>>>>>>>>>>>>>>>>>> for
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> data
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> pipelines.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Fan-out to multiple tables is indeed
> >>>> >>> a very important user scenario. In
> >>>> >>>>>>>>> the
> >>>> >>>>>>>>>>>>>>>>>>>>>>> future,
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> we can consider
> >>>> >>> extending the statement set syntax to
> >>>> >>>>>>>>> support
> >>>> >>>>>>>>>>>>> the
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> creation of multiple
> >>>> >>> dynamic tables.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Since the early
> >>>> >>> days of Flink SQL, we were discussing
> >>>> >>>>>>>>>>> `SELECT
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> STREAM
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> *
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> FROM T EMIT 5
> >>>> >>> MINUTES`. Your proposal seems to rephrase
> >>>> >>>>>>>>>>> STREAM
> >>>> >>>>>>>>>>>>>>>>>>>>> and
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> EMIT,
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> into other keywords
> >>>> >>> DYNAMIC TABLE and FRESHNESS. But the
> >>>> >>>>>>>>> core
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> functionality is
> >>>> >>> still there. I'm wondering if we should
> >>>> >>>>>>>>>>> widen
> >>>> >>>>>>>>>>>>>>>>>>>>> the
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> scope
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (maybe not part of
> >>>> >>> this FLIP but a new FLIP) to follow
> >>>> >>>>> the
> >>>> >>>>>>>>>>>>>>>>>>>>> standard
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> more
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> closely. Making
> >>>> >>> `SELECT * FROM t` bounded by default and
> >>>> >>>>>>>>> use
> >>>> >>>>>>>>>>>>> new
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> syntax
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for the dynamic
> >>>> >>> behavior. Flink 2.0 would be the perfect
> >>>> >>>>>>>>> time
> >>>> >>>>>>>>>>>>>>>>>>>>> for
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> this,
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> however, it would
> >>>> >>> require careful discussions. What do
> >>>> >>>>> you
> >>>> >>>>>>>>>>>>>>>>>>>>> think?
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The query part indeed
> >>>> >>> requires a separate FLIP
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for discussion, as it
> >>>> >>> involves changes to the default
> >>>> >>>>>>>>>>> behavior.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/concepts/dynamic_tables
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best,
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ron
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jing Zhang <
> >>>> >>> beyond1...@gmail.com> wrote on Wed, Mar 13, 2024 at 15:19:
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi, Lincoln & Ron,
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the
> >>>> >>> proposal.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I agree with the
> >>>> >>> question raised by Timo.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Besides, I have some
> >>>> >>> other questions.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1. How is the query of a dynamic table defined?
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> With Flink SQL, or by introducing new syntax?
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If Flink SQL is used, how do we handle the differences in SQL between
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> streaming and batch processing?
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> For example, a query with a window aggregate based on processing time,
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> or a query with a global ORDER BY?
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2. Is modifying the query of a dynamic table allowed?
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Or can we only refresh a dynamic table based on its initial query?
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 3. How is a dynamic table used?
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The dynamic table seems similar to a materialized view. Will we do
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> something like materialized view rewriting during optimization?
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best,
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jing Zhang
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Timo Walther <
> >>>> >>> twal...@apache.org> wrote on Wed, Mar 13, 2024 at 01:24:
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Lincoln & Ron,
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> thanks for
> >>>> >>> proposing this FLIP. I think a design
> >>>> >>>>> similar
> >>>> >>>>>>>>> to
> >>>> >>>>>>>>>>>>>>>>>>>>> what
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> you
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> propose has been
> >>>> >>> in the heads of many people, however,
> >>>> >>>>>>>>> I'm
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> wondering
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> how
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this will fit
> >>>> >>> into the bigger picture.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I haven't deeply
> >>>> >>> reviewed the FLIP yet, but would like
> >>>> >>>>> to
> >>>> >>>>>>>>>>> ask
> >>>> >>>>>>>>>>>>>>>>>>>>> some
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> initial questions:
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Flink has
> >>>> >>> introduced the concept of Dynamic Tables many
> >>>> >>>>>>>>>>> years
> >>>> >>>>>>>>>>>>>>>>>>>>> ago.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> How
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> does the term
> >>>> >>> "Dynamic Table" fit into Flink's regular
> >>>> >>>>>>>>>>> tables
> >>>> >>>>>>>>>>>>>>>>>>>>> and
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> also
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> how does it
> >>>> >>> relate to Table API?
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I fear that
> >>>> >>> adding the DYNAMIC TABLE keyword could
> >>>> >>>>> cause
> >>>> >>>>>>>>>>>>>>>>>>>>> confusion
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> users, because a
> >>>> >>> term for regular CREATE TABLE (that
> >>>> >>>>> can
> >>>> >>>>>>>>> be
> >>>> >>>>>>>>>>>>>>>>>>>>> "kind
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> of
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> dynamic" as well
> >>>> >>> and is backed by a changelog) is then
> >>>> >>>>>>>>>>>>> missing.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Also
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> given that we
> >>>> >>> call our connectors for those tables,
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DynamicTableSource
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
> >>>> >>> DynamicTableSink.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> In general, I
> >>>> >>> find it contradicting that a TABLE can be
> >>>> >>>>>>>>>>>>>>>>>>>>> "paused"
> >>>> >>>>>>>>>>>>>>>>>>>>>>> or
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> "resumed". From
> >>>> >>> an English language perspective, this
> >>>> >>>>>>>>> does
> >>>> >>>>>>>>>>>>>>>>>>>>> sound
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> incorrect. In my
> >>>> >>> opinion (without much research yet), a
> >>>> >>>>>>>>>>>>>>>>>>>>> continuous
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> updating trigger
> >>>> >>> should rather be modelled as a CREATE
> >>>> >>>>>>>>>>>>>>>>>>>>>>> MATERIALIZED
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> VIEW
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (which users are
> >>>> >>> familiar with?) or a new concept such
> >>>> >>>>>>>>> as a
> >>>> >>>>>>>>>>>>>>>>>>>>> CREATE
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> TASK
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (that can be
> >>>> >>> paused and resumed?).
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> How do you
> >>>> >>> envision re-adding the functionality of a
> >>>> >>>>>>>>>>> statement
> >>>> >>>>>>>>>>>>>>>>>>>>>>> set,
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> fans out to
> >>>> >>> multiple tables? This is a very important
> >>>> >>>>> use
> >>>> >>>>>>>>>>> case
> >>>> >>>>>>>>>>>>>>>>>>>>> for
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> data
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> pipelines.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Since the early
> >>>> >>> days of Flink SQL, we were discussing
> >>>> >>>>>>>>>>> `SELECT
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> STREAM
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> *
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> FROM T EMIT 5
> >>>> >>> MINUTES`. Your proposal seems to rephrase
> >>>> >>>>>>>>>>> STREAM
> >>>> >>>>>>>>>>>>>>>>>>>>> and
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> EMIT,
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> into other
> >>>> >>> keywords DYNAMIC TABLE and FRESHNESS. But
> >>>> >>>>> the
> >>>> >>>>>>>>>>> core
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> functionality is
> >>>> >>> still there. I'm wondering if we
> >>>> >>>>> should
> >>>> >>>>>>>>>>> widen
> >>>> >>>>>>>>>>>>>>>>>>>>> the
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> scope
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (maybe not part
> >>>> >>> of this FLIP but a new FLIP) to follow
> >>>> >>>>>>>>> the
> >>>> >>>>>>>>>>>>>>>>>>>>>>> standard
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> more
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> closely. Making
> >>>> >>> `SELECT * FROM t` bounded by default
> >>>> >>>>> and
> >>>> >>>>>>>>> use
> >>>> >>>>>>>>>>>>>>>>>>>>> new
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> syntax
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for the dynamic
> >>>> >>> behavior. Flink 2.0 would be the
> >>>> >>>>> perfect
> >>>> >>>>>>>>>>> time
> >>>> >>>>>>>>>>>>>>>>>>>>> for
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this,
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> however, it would
> >>>> >>> require careful discussions. What do
> >>>> >>>>>>>>> you
> >>>> >>>>>>>>>>>>>>>>>>>>> think?
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Regards,
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Timo
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On 11.03.24
> >>>> >>> 08:23, Ron liu wrote:
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi, Dev
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Lincoln Lee
> >>>> >>> and I would like to start a discussion
> >>>> >>>>> about
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> FLIP-435:
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Introduce a
> >>>> >>> New Dynamic Table for Simplifying Data
> >>>> >>>>>>>>>>>>> Pipelines.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This FLIP is
> >>>> >>> designed to simplify the development of
> >>>> >>>>>>>>> data
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> processing
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> pipelines.
> >>>> >>> With Dynamic Tables with uniform SQL
> >>>> >>>>>>>>> statements
> >>>> >>>>>>>>>>>>> and
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> freshness,
> >>>> >>> users can define batch and streaming
> >>>> >>>>>>>>>>>>>>>>>>>>> transformations
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> to
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> data in the
> >>>> >>> same way, accelerate ETL pipeline
> >>>> >>>>>>>>> development,
> >>>> >>>>>>>>>>>>> and
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> manage
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> task
> >>>> >>> scheduling automatically.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> For more
> >>>> >>> details, see FLIP-435 [1]. Looking forward to
> >>>> >>>>>>>>> your
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> feedback.
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [1]
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best,
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Lincoln & Ron
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>>
> >>>> >>>>>>>>>>>
> >>>> >>>>>>>>>
> >>>> >>>>>>>
> >>>> >>>>>
> >>>> >>>
> >>>> >>
> >>>> >
> >>>>
> >>>>
>
