Hi Ron

Thanks for your reply.

> Is it only possible to add columns at the end, and not anywhere in the table
> schema? Some databases have this limitation; does lake storage such as
> Iceberg/Paimon have it too?


For now, we can restrict added columns to the end of the schema. Although
both Paimon and Iceberg already support adding columns at any position,
some other systems do not. I will add this restriction to the FLIP.
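To illustrate the restriction, here is a minimal sketch based on the proposed ALTER MATERIALIZED TABLE ... AS syntax. The table my_mt, the source source_table and the columns are hypothetical:

```sql
-- Hypothetical materialized table with two columns.
CREATE MATERIALIZED TABLE my_mt
FRESHNESS = INTERVAL '1' MINUTE
AS SELECT id, name FROM source_table;

-- Allowed under the restriction: the new column `amount` is appended at the end.
ALTER MATERIALIZED TABLE my_mt
AS SELECT id, name, amount FROM source_table;

-- Would be rejected under the restriction: `amount` sits between existing columns.
-- ALTER MATERIALIZED TABLE my_mt
-- AS SELECT id, amount, name FROM source_table;
```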


> In the Refresh Task Behavior section you mention partition hints; could you
> clarify in the FLIP what they are?


I have added the relevant details.


> Are you able to articulate the default behavior?


I have updated the FLIP with a more detailed explanation of the default behavior.


> How can users determine whether states are compatible?


Currently, users have to rely on their own experience when modifying the
query: the Flink framework does not guarantee that changes to the SQL logic
preserve state compatibility.

That said, some simple modifications can indeed be state-compatible, so I
think we can add guidance on such cases to the user documentation in the
future.

For now, the responsibility is left to the users.


Even if recovery fails, users can still roll back to the original query, or
start consuming from a new offset by unsetting the recovery parameters.
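As a rough sketch, assuming the recovery parameter is set at the session level with SET in the SQL client or SQL Gateway before the ALTER statement is submitted (this mechanism, the savepoint path and the table are assumptions for illustration):

```sql
-- Hypothetical savepoint path; set it only if state compatibility is confirmed.
SET 'execution.state-recovery.path' = 'hdfs:///flink/savepoints/savepoint-abc123';

ALTER MATERIALIZED TABLE my_mt
AS SELECT id, name, amount FROM source_table;

-- If recovery fails: unset the recovery parameter so the new job starts
-- from the position given by the source parameters instead of the savepoint.
RESET 'execution.state-recovery.path';

-- Alternatively, roll back to the original (hypothetical) query.
ALTER MATERIALIZED TABLE my_mt
AS SELECT id, name FROM source_table;
```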




Best,
Feng


On Tue, Dec 17, 2024 at 10:37 AM Ron Liu <ron9....@gmail.com> wrote:

> Hi Feng
>
> Thanks for initiating this FLIP. In a lakehouse, schema evolution of tables
> driven by changes in business logic is a very common scenario, so support
> for modifying the query of a Materialized Table can greatly improve
> flexibility and usability. We have also seen that other similar products in
> the industry support this capability.
>
> I have read this FLIP and the overall design looks good, +1.
> However, I have a few questions:
>
> 1. When adding columns via the `ALTER MATERIALIZED TABLE ... AS SELECT`
> statement, is it only possible to add them at the end of the table schema,
> and not at an arbitrary position? Some databases have this limitation; do
> lake storage formats such as Iceberg/Paimon have it too?
> 2. In the Refresh Task Behavior section you mention partition hints; could
> you clarify in the FLIP what they are?
>
> >>> *CONTINUOUS Mode:* Stops the old job and starts a new one with the
> updated query.
>
>    - The initial position of the new job is controlled by the source
>    parameters.
>    - For compatible logic changes, recovery parameters
>    (execution.state-recovery.path) can be manually set if state compatibility
>    is confirmed.
>
>
> 4. Are you able to articulate the default behavior?
> 5. How can users determine whether states are compatible?
>
> Best,
> Ron
>
> On Mon, Dec 16, 2024 at 10:49 Feng Jin <jinfeng1...@gmail.com> wrote:
>
>> Hi, everyone,
>>
>> I’d like to initiate a discussion on FLIP-492: Support Query
>> Modifications for Materialized Tables[1].
>>
>> In FLIP-435[2], we introduced *MATERIALIZED TABLES*. By defining query
>> logic and specifying data freshness requirements, users can efficiently
>> build data pipelines, greatly improving development productivity.
>> FLIP-492 builds on this by addressing a common need: the ability to
>> modify the query logic of an existing MATERIALIZED TABLE. Two approaches
>> are proposed:
>>
>>
>> *1. Modifying the Query Logic: ALTER MATERIALIZED TABLE AS <query>*
>> Retain historical data while modifying the query logic:
>>
>> ```
>> ALTER MATERIALIZED TABLE [catalog_name.][db_name.]table_name AS <query>
>> ```
>>
>>
>> *2. Replacing the Table: CREATE OR REPLACE MATERIALIZED TABLE*
>> Reconstruct the table with a new query, discarding all historical data:
>>
>> ```
>> CREATE [OR REPLACE] MATERIALIZED TABLE
>> [catalog_name.][db_name.]table_name
>> [ ([<table_constraint>]) ]
>> [COMMENT table_comment]
>> [PARTITIONED BY (partition_column_name1, partition_column_name2, ...)]
>> [WITH (key1=val1, key2=val2, ...)]
>> FRESHNESS = INTERVAL '<num>' { SECOND | MINUTE | HOUR | DAY }
>> [REFRESH_MODE = { CONTINUOUS | FULL }]
>> AS <select_statement>
>> ```
>>
>> For a more detailed explanation of this proposal, please refer to
>> FLIP-492 [1]. Your feedback and suggestions to help refine this proposal
>> further are highly appreciated.
>>
>> Lastly, I’d like to thank Ron and Lincoln (cc’d) for their valuable input
>> and suggestions during the drafting process.
>>
>> Looking forward to hearing your thoughts!
>>
>>
>> Best,
>> Feng Jin
>>
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-492%3A+Support+Query+Modifications+for+Materialized+Tables
>> [2]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-435%3A+Introduce+a+New+Materialized+Table+for+Simplifying+Data+Pipelines
>>
>
