lsyldliu commented on code in PR #26064:
URL: https://github.com/apache/flink/pull/26064#discussion_r1929664176

##########
docs/content/docs/dev/table/materialized-table/statements.md:
##########
@@ -326,6 +328,69 @@ ALTER MATERIALIZED TABLE my_materialized_table REFRESH PARTITION (ds='2024-06-28
 <span class="label label-danger">Note</span>
 - The REFRESH operation will start a Flink batch job to refresh the materialized table data.
+## AS <select_statement>
+
+```sql
+ALTER MATERIALIZED TABLE [catalog_name.][db_name.]table_name AS <select_statement>
+```
+
+The `AS <select_statement>` clause allows you to modify the query definition of a materialized table. It updates the query used by the materialized table refresh job and infers a new schema based on the updated query to adjust the table’s schema. However, this operation does not directly affect existing data.
+
+The modification process depends on the refresh mode of the materialized table:
+
+**Full mode:**

Review Comment:
```
**Full Mode:**

1. Update the `schema` and `query definition` of the materialized table.
2. The table is refreshed using the new query definition when the next refresh job is triggered:
- If it is a partitioned table and [partition.fields.#.date-formatter]({{< ref "docs/dev/table/config" >}}#partition-fields-date-formatter) is correctly set, only the latest partition will be refreshed.
- Otherwise, the table will be overwritten entirely.
```

##########
docs/content.zh/docs/dev/table/materialized-table/statements.md:
##########
@@ -326,6 +328,67 @@ ALTER MATERIALIZED TABLE my_materialized_table REFRESH PARTITION (ds='2024-06-28
 <span class="label label-danger">注意</span>
 - REFRESH 操作会启动批作业来刷新表的数据。
+## AS <select_statement>
+```sql
+ALTER MATERIALIZED TABLE [catalog_name.][db_name.]table_name AS <select_statement>
+```
+
+`AS <select_statement>` 用于修改物化表的查询定义。它会更新物化表刷新任务中使用的查询,并基于更新后的查询推导出新的 `schema` ,从而调整表的 `schema` 。但该操作不会直接影响现有数据。
+
+具体修改流程取决于物化表的刷新模式:
+
+**全量模式:**
+
+1. 更新物化表的 `schema` 和查询定义。
+2.
下次刷新任务启动时,将使用新的查询刷新:
+- 如果修改的物化表是分区表,且[partition.fields.#.date-formatter]({{< ref "docs/dev/table/config" >}}#partition-fields-date-formatter) 配置正确,则仅刷新最新分区。
+- 否则,将刷新整个表的数据。
+
+**持续模式:**
+
+1. 暂停当前的实时刷新任务。
+2. 更新物化表的 `schema` 和查询定义。
+3. 启动新的实时任务以刷新物化表:
+- 新的刷新任务会从头开始,而不是从之前的状态继续。
+- 数据源的起始消费位置会由到连接器的默认实现或查询中设置的 `option hint` 决定)

Review Comment:
`起始消费位置` -> `起始位点`。It's a commonly used term.

##########
docs/content.zh/docs/dev/table/materialized-table/statements.md:
##########
@@ -326,6 +328,67 @@ ALTER MATERIALIZED TABLE my_materialized_table REFRESH PARTITION (ds='2024-06-28
 <span class="label label-danger">注意</span>
 - REFRESH 操作会启动批作业来刷新表的数据。
+## AS <select_statement>
+```sql
+ALTER MATERIALIZED TABLE [catalog_name.][db_name.]table_name AS <select_statement>
+```
+
+`AS <select_statement>` 用于修改物化表的查询定义。它会更新物化表刷新任务中使用的查询,并基于更新后的查询推导出新的 `schema` ,从而调整表的 `schema` 。但该操作不会直接影响现有数据。
+
+具体修改流程取决于物化表的刷新模式:
+
+**全量模式:**
+
+1. 更新物化表的 `schema` 和查询定义。
+2. 下次刷新任务启动时,将使用新的查询刷新:
+- 如果修改的物化表是分区表,且[partition.fields.#.date-formatter]({{< ref "docs/dev/table/config" >}}#partition-fields-date-formatter) 配置正确,则仅刷新最新分区。
+- 否则,将刷新整个表的数据。
+
+**持续模式:**
+
+1. 暂停当前的实时刷新任务。
+2. 更新物化表的 `schema` 和查询定义。
+3. 启动新的实时任务以刷新物化表:

Review Comment:
`实时任务` -> `流式任务`

##########
docs/content/docs/dev/table/materialized-table/statements.md:
##########
@@ -326,6 +328,69 @@ ALTER MATERIALIZED TABLE my_materialized_table REFRESH PARTITION (ds='2024-06-28
 <span class="label label-danger">Note</span>
 - The REFRESH operation will start a Flink batch job to refresh the materialized table data.
+## AS <select_statement>
+
+```sql
+ALTER MATERIALIZED TABLE [catalog_name.][db_name.]table_name AS <select_statement>
+```
+
+The `AS <select_statement>` clause allows you to modify the query definition of a materialized table. It updates the query used by the materialized table refresh job and infers a new schema based on the updated query to adjust the table’s schema.
However, this operation does not directly affect existing data.

Review Comment:
```suggestion
The `AS <select_statement>` clause allows you to modify the query definition for refreshing materialized table. It will first evolve the table's schema using the schema derived from the new query and then use the new query to refresh the table data. It is important to emphasize that, by default, this does not impact historical data.
```

##########
docs/content/docs/dev/table/materialized-table/statements.md:
##########
@@ -326,6 +328,69 @@ ALTER MATERIALIZED TABLE my_materialized_table REFRESH PARTITION (ds='2024-06-28
 <span class="label label-danger">Note</span>
 - The REFRESH operation will start a Flink batch job to refresh the materialized table data.
+## AS <select_statement>
+
+```sql
+ALTER MATERIALIZED TABLE [catalog_name.][db_name.]table_name AS <select_statement>
+```
+
+The `AS <select_statement>` clause allows you to modify the query definition of a materialized table. It updates the query used by the materialized table refresh job and infers a new schema based on the updated query to adjust the table’s schema. However, this operation does not directly affect existing data.
+
+The modification process depends on the refresh mode of the materialized table:
+
+**Full mode:**
+
+1. Update the schema and query definition of the materialized table.
+2. During the next refresh job, the table is refreshed using the new query definition:
+- If the table is a partitioned table and [partition.fields.#.date-formatter]({{< ref "docs/dev/table/config" >}}#partition-fields-date-formatter) is correctly set, only the latest partition will be refreshed.
+- Otherwise, the entire table will be refreshed.
+
+**Continuous mode:**

Review Comment:
```
**Continuous Mode:**

1. Pause the current running refresh job.
2. Update the `schema` and `query definition` of the materialized table.
3.
Start a new refresh job to refresh the materialized table:
- The new refresh job starts from the beginning and does not restore from the previous state.
- The starting offset of the data source is determined by the connector’s default implementation or the option hint specified in the query.
```

##########
docs/content.zh/docs/dev/table/materialized-table/statements.md:
##########
@@ -326,6 +328,67 @@ ALTER MATERIALIZED TABLE my_materialized_table REFRESH PARTITION (ds='2024-06-28
 <span class="label label-danger">注意</span>
 - REFRESH 操作会启动批作业来刷新表的数据。
+## AS <select_statement>
+```sql
+ALTER MATERIALIZED TABLE [catalog_name.][db_name.]table_name AS <select_statement>
+```
+
+`AS <select_statement>` 用于修改物化表的查询定义。它会更新物化表刷新任务中使用的查询,并基于更新后的查询推导出新的 `schema` ,从而调整表的 `schema` 。但该操作不会直接影响现有数据。

Review Comment:
Please update the Chinese translation according to the latest English comments

##########
docs/content/docs/dev/table/materialized-table/statements.md:
##########
@@ -326,6 +328,69 @@ ALTER MATERIALIZED TABLE my_materialized_table REFRESH PARTITION (ds='2024-06-28
 <span class="label label-danger">Note</span>
 - The REFRESH operation will start a Flink batch job to refresh the materialized table data.
+## AS <select_statement>
+
+```sql
+ALTER MATERIALIZED TABLE [catalog_name.][db_name.]table_name AS <select_statement>
+```
+
+The `AS <select_statement>` clause allows you to modify the query definition of a materialized table. It updates the query used by the materialized table refresh job and infers a new schema based on the updated query to adjust the table’s schema. However, this operation does not directly affect existing data.
+
+The modification process depends on the refresh mode of the materialized table:
+
+**Full mode:**
+
+1. Update the schema and query definition of the materialized table.
+2.
During the next refresh job, the table is refreshed using the new query definition:
+- If the table is a partitioned table and [partition.fields.#.date-formatter]({{< ref "docs/dev/table/config" >}}#partition-fields-date-formatter) is correctly set, only the latest partition will be refreshed.
+- Otherwise, the entire table will be refreshed.
+
+**Continuous mode:**
+
+1. Pause the current continuous refresh jobs.
+2. Update the `schema` and `query definition` of the materialized table.
+3. Start a new continuous refresh job to refresh the materialized table:
+- The new refresh job starts from the beginning and does not restore from the previous state.
+- The starting consumption position of the data source is determined by the connector’s default implementation or the option hint specified in the query.
+
+**Example:**
+
+```sql
+-- Definition of origin materialized table
+CREATE MATERIALIZED TABLE my_materialized_table
+    FRESHNESS = INTERVAL '10' SECOND
+    AS
+    SELECT
+        user_id,
+        COUNT(*) AS event_count,
+        SUM(amount) AS total_amount
+    FROM
+        kafka_catalog.db1.events
+    WHERE
+        event_type = 'purchase'
+    GROUP BY
+        user_id;
+
+-- Modify the query definition of materialized table
+ALTER MATERIALIZED TABLE my_materialized_table
+    AS
+    SELECT
+        user_id,
+        COUNT(*) AS event_count,
+        SUM(amount) AS total_amount,
+        AVG(amount) AS avg_amount -- Add a new nullable column at the end
+    FROM
+        kafka_catalog.db1.events
+    WHERE
+        event_type = 'purchase'
+    GROUP BY
+        user_id;
+```
+
+<span class="label label-danger">Note</span>
+- Schema modification only supports adding `nullable` columns at the end of the original table's schema.

Review Comment:
```suggestion
- Schema evolution currently only supports adding `nullable` columns to the end of the original table's schema.
```

##########
docs/content.zh/docs/dev/table/materialized-table/statements.md:
##########
@@ -326,6 +328,67 @@ ALTER MATERIALIZED TABLE my_materialized_table REFRESH PARTITION (ds='2024-06-28
 <span class="label label-danger">注意</span>
 - REFRESH 操作会启动批作业来刷新表的数据。
+## AS <select_statement>
+```sql
+ALTER MATERIALIZED TABLE [catalog_name.][db_name.]table_name AS <select_statement>
+```
+
+`AS <select_statement>` 用于修改物化表的查询定义。它会更新物化表刷新任务中使用的查询,并基于更新后的查询推导出新的 `schema` ,从而调整表的 `schema` 。但该操作不会直接影响现有数据。
+
+具体修改流程取决于物化表的刷新模式:
+
+**全量模式:**
+
+1. 更新物化表的 `schema` 和查询定义。
+2. 下次刷新任务启动时,将使用新的查询刷新:

Review Comment:
```suggestion
2. 在刷新任务下次触发执行时,将使用新的查询定义刷新数据:
```

##########
docs/content.zh/docs/dev/table/materialized-table/statements.md:
##########
@@ -326,6 +328,67 @@ ALTER MATERIALIZED TABLE my_materialized_table REFRESH PARTITION (ds='2024-06-28
 <span class="label label-danger">注意</span>
 - REFRESH 操作会启动批作业来刷新表的数据。
+## AS <select_statement>
+```sql
+ALTER MATERIALIZED TABLE [catalog_name.][db_name.]table_name AS <select_statement>
+```
+
+`AS <select_statement>` 用于修改物化表的查询定义。它会更新物化表刷新任务中使用的查询,并基于更新后的查询推导出新的 `schema` ,从而调整表的 `schema` 。但该操作不会直接影响现有数据。
+
+具体修改流程取决于物化表的刷新模式:
+
+**全量模式:**
+
+1. 更新物化表的 `schema` 和查询定义。
+2. 下次刷新任务启动时,将使用新的查询刷新:
+- 如果修改的物化表是分区表,且[partition.fields.#.date-formatter]({{< ref "docs/dev/table/config" >}}#partition-fields-date-formatter) 配置正确,则仅刷新最新分区。
+- 否则,将刷新整个表的数据。
+
+**持续模式:**
+
+1. 暂停当前的实时刷新任务。
+2. 更新物化表的 `schema` 和查询定义。
+3.
启动新的实时任务以刷新物化表:
+- 新的刷新任务会从头开始,而不是从之前的状态继续。
+- 数据源的起始消费位置会由到连接器的默认实现或查询中设置的 `option hint` 决定)
+
+**示例:**
+
+```sql
+-- 原始物化表定义
+CREATE MATERIALIZED TABLE my_materialized_table
+    FRESHNESS = INTERVAL '10' SECOND
+    AS
+    SELECT
+        user_id,
+        COUNT(*) AS event_count,
+        SUM(amount) AS total_amount
+    FROM
+        kafka_catalog.db1.events
+    WHERE
+        event_type = 'purchase'
+    GROUP BY
+        user_id;
+
+-- 修改现有物化表的查询
+ALTER MATERIALIZED TABLE my_materialized_table
+AS SELECT
+    user_id,
+    COUNT(*) AS event_count,
+    SUM(amount) AS total_amount,
+    AVG(amount) AS avg_amount -- 在末尾添加新的可为空列
+FROM
+    kafka_catalog.db1.events
+WHERE
+    event_type = 'purchase'
+GROUP BY
+    user_id;
+```
+
+<span class="label label-danger">注意</span>
+- Schema 修改仅支持在原表 schema 末尾添加 `nullable` 列。

Review Comment:
```suggestion
- Schema 演进当前仅支持在原表 schema 尾部添加`可空列`。
```
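
On the Continuous Mode point that the starting offset comes from the connector's default or from an option hint in the query, a short illustration might help readers of the docs. The following is only a sketch under assumptions that are not part of this PR: it assumes `kafka_catalog.db1.events` is backed by the Kafka connector, and it uses that connector's `scan.startup.mode` option through Flink's `OPTIONS` dynamic table hint. The value `earliest-offset` is chosen purely for illustration.

```sql
-- Sketch only: the hint and its value are assumptions, not taken from the PR.
-- Without the hint, the new streaming job would start from the connector's default position.
ALTER MATERIALIZED TABLE my_materialized_table
    AS
    SELECT
        user_id,
        COUNT(*) AS event_count,
        SUM(amount) AS total_amount,
        AVG(amount) AS avg_amount -- new nullable column appended at the end
    FROM
        -- Dynamic table options hint: ask the Kafka source to read from the earliest offset
        kafka_catalog.db1.events /*+ OPTIONS('scan.startup.mode' = 'earliest-offset') */
    WHERE
        event_type = 'purchase'
    GROUP BY
        user_id;
```

Because the new job does not restore from the previous state, the chosen offset decides how much source history is re-read into the aggregates, so it may be worth calling that out next to the example.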