Hi, Mang. +1 for completing the support for atomicity of CTAS. This is very useful in batch scenarios and when integrating with data lakes that support transactions.
I have just one question. IIUC, the DynamicTableSink will need to know whether it is running in the normal case or in the atomic CTAS case, and it will also need the necessary context. Take the JDBC catalog as an example: if it is CTAS with atomicity support, the JDBC DynamicTableSink will write to the temporary table defined in the TwoPhaseCatalogTable, which is different from the normal case. How can the DynamicTableSink get this information? Could you add some explanation or an example to this FLIP?

Best regards,
Yuxia

----- Original Message -----
From: "zhangmang1" <zhangma...@163.com>
To: "dev" <dev@flink.apache.org>, "ron9 liu" <ron9....@gmail.com>, "lincoln 86xy" <lincoln.8...@gmail.com>
Sent: Friday, April 14, 2023, 2:50:40 PM
Subject: Re:Re: [DISCUSS] FLIP-305: Support atomic for CREATE TABLE AS SELECT(CTAS) statement

Hi, Lincoln and Ron

Thank you for your replies.

The naming suggestions look good to me; they will keep future extensions with new features more uniform. I have updated the FLIP.

About atomic CTAS support for Hive: Hive has many usage scenarios, which can be divided into three:
1. writing Hive tables
2. writing Hive tables with speculative execution
3. writing Hive tables with small file merging

The main purpose of FLIP-305 is to implement support for CTAS atomicity in the Flink framework, so my PoC only verifies the first scenario, writing to a Hive table. We can later split out sub-tasks to support the other two scenarios.

--

Best regards,
Mang Zhang

At 2023-04-13 12:27:24, "Lincoln Lee" <lincoln.8...@gmail.com> wrote:
>Hi, Mang
>
>+1 for completing the support for atomicity of CTAS, this is very useful in
>batch scenarios.
>
>I have two questions:
>1. naming wise:
> a) can we rename the `Catalog#getTwoPhaseCommitCreateTable` to
>`Catalog#twoPhaseCreateTable` (and we may add
>twoPhaseReplaceTable/twoPhaseCreateOrReplaceTable later)
> b) for the `TwoPhaseCommitCatalogTable`, may it be better using
>`TwoPhaseCatalogTable`?
> c) `TwoPhaseCommitCatalogTable#beginTransaction`, the word 'transaction'
>in the method name, which may remind users of the relevance of transaction
>support (however, it is not strictly so), so I suggest changing it to
>`begin`
>2. Has this design been validated by any relevant PoC on Hive or other
>catalogs?
>
>Best,
>Lincoln Lee
>
>
>liu ron <ron9....@gmail.com> wrote on Thu, Apr 13, 2023 at 10:17:
>
>> Hi, Mang
>> Atomicity is very important for CTAS, especially for batch jobs. This FLIP
>> is a continuation of FLIP-218, which is valuable for CTAS.
>> I just have one question, in the Motivation part of FLIP-218, we mentioned
>> three levels of atomicity semantics, can this current design do the same as
>> Spark's DataSource V2, which can guarantee both atomicity and isolation,
>> for example, can it be done by writing to Hive tables using CTAS?
>>
>> Best,
>> Ron
>>
>> Mang Zhang <zhangma...@163.com> wrote on Mon, Apr 10, 2023 at 11:03:
>>
>> > Hi, everyone
>> >
>> > I'd like to start a discussion about FLIP-305: Support atomic for CREATE
>> > TABLE AS SELECT(CTAS) statement [1].
>> >
>> > The CREATE TABLE AS SELECT(CTAS) statement is already supported, but it's
>> > not atomic. It creates the table before the job runs. If the job
>> > execution fails, or is cancelled, the table will not be dropped.
>> >
>> > So I want Flink to support atomic CTAS, where the table is created only
>> > when the job succeeds. This improves the user experience.
>> >
>> > Looking forward to your feedback.
>> >
>> > [1]
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-305%3A+Support+atomic+for+CREATE+TABLE+AS+SELECT%28CTAS%29+statement
>> >
>> > --
>> >
>> > Best regards,
>> > Mang Zhang
>>