Hi, Lincoln and Ron
Thank you for your replies.

On naming: I agree with the suggestions. They will keep the API more uniform when new features are added later, and I have updated the FLIP accordingly.

On atomic CTAS support for Hive: Hive has a rich set of usage scenarios, which can be divided into three cases:
1. writing Hive tables
2. writing Hive tables with speculative execution
3. writing Hive tables with small-file merging

The main purpose of FLIP-305 is to support CTAS atomicity in the Flink framework, so my PoC only verifies the first scenario, writing to a Hive table. We can split out sub-tasks later to support the other two scenarios.

--

Best regards,
Mang Zhang

At 2023-04-13 12:27:24, "Lincoln Lee" <lincoln.8...@gmail.com> wrote:
>Hi, Mang
>
>+1 for completing the support for atomicity of CTAS, this is very useful in
>batch scenarios.
>
>I have two questions:
>1. naming wise:
> a) can we rename the `Catalog#getTwoPhaseCommitCreateTable` to
>`Catalog#twoPhaseCreateTable` (and we may add
>twoPhaseReplaceTable/twoPhaseCreateOrReplaceTable later)
> b) for the `TwoPhaseCommitCatalogTable`, may it be better using
>`TwoPhaseCatalogTable`?
> c) `TwoPhaseCommitCatalogTable#beginTransaction`, the word 'transaction'
>in the method name, which may remind users of the relevance of transaction
>support (however, it is not strictly so), so I suggest changing it to
>`begin`
>2. Has this design been validated by any relevant Poc on hive or other
>catalogs?
>
>Best,
>Lincoln Lee
>
>
>liu ron <ron9....@gmail.com> wrote on Thu, Apr 13, 2023 at 10:17:
>
>> Hi, Mang
>> Atomicity is very important for CTAS, especially for batch jobs. This FLIP
>> is a continuation of FLIP-218, which is valuable for CTAS.
>> I just have one question, in the Motivation part of FLIP-218, we mentioned
>> three levels of atomicity semantics, can this current design do the same as
>> Spark's DataSource V2, which can guarantee both atomicity and isolation,
>> for example, can it be done by writing to Hive tables using CTAS?
>>
>> Best,
>> Ron
>>
>> Mang Zhang <zhangma...@163.com> wrote on Mon, Apr 10, 2023 at 11:03:
>>
>> > Hi, everyone
>> >
>> > I'd like to start a discussion about FLIP-305: Support atomic for CREATE
>> > TABLE AS SELECT(CTAS) statement [1].
>> >
>> > CREATE TABLE AS SELECT(CTAS) statement has been support, but it's not
>> > atomic. It will create the table first before job running. If the job
>> > execution fails, or is cancelled, the table will not be dropped.
>> >
>> > So I want Flink to support atomic CTAS, where only the table is created
>> > when the Job succeeds. Improve user experience.
>> >
>> > Looking forward to your feedback.
>> >
>> > [1]
>> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-305%3A+Support+atomic+for+CREATE+TABLE+AS+SELECT%28CTAS%29+statement
>> >
>> > --
>> >
>> > Best regards,
>> > Mang Zhang
>>
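
For readers following the thread, the atomic CTAS behavior under discussion (the table becomes visible only if the job succeeds) can be illustrated with a small toy model. This is only a sketch under stated assumptions: `InMemoryCatalog`, `AtomicCtasSketch`, `runCtas`, and the `commit`/`abort` methods are hypothetical names invented for illustration, and the `twoPhaseCreateTable` signature is assumed. Only the names `TwoPhaseCatalogTable`, `twoPhaseCreateTable`, and `begin` come from the naming suggestions quoted above. It is not the FLIP's actual interface and does not use Flink.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the `TwoPhaseCatalogTable` named in the thread.
// `begin` follows the suggested rename; `commit` and `abort` are assumed.
interface TwoPhaseCatalogTable {
    void begin();   // called before the job starts
    void commit();  // called when the job succeeds
    void abort();   // called when the job fails or is cancelled
}

// Toy in-memory catalog (table name -> rows). Not Flink's Catalog interface.
class InMemoryCatalog {
    final Map<String, List<String>> tables = new LinkedHashMap<>();

    // Mirrors the suggested `Catalog#twoPhaseCreateTable` name; the signature is assumed.
    TwoPhaseCatalogTable twoPhaseCreateTable(String name, List<String> staged) {
        return new TwoPhaseCatalogTable() {
            @Override public void begin() { staged.clear(); }
            // The table is published to the catalog only at commit time.
            @Override public void commit() { tables.put(name, new ArrayList<>(staged)); }
            // On abort the staged rows are discarded and no table is created.
            @Override public void abort() { staged.clear(); }
        };
    }
}

class AtomicCtasSketch {
    // Simulates a CTAS job: stage rows, then commit on success or abort on failure.
    static boolean runCtas(InMemoryCatalog catalog, String name,
                           List<String> rows, boolean failJob) {
        List<String> staged = new ArrayList<>();
        TwoPhaseCatalogTable table = catalog.twoPhaseCreateTable(name, staged);
        table.begin();
        try {
            staged.addAll(rows);
            if (failJob) {
                throw new RuntimeException("simulated job failure");
            }
            table.commit();
            return true;
        } catch (RuntimeException e) {
            table.abort();
            return false;
        }
    }

    public static void main(String[] args) {
        InMemoryCatalog catalog = new InMemoryCatalog();
        runCtas(catalog, "ok_table", Arrays.asList("a", "b"), false);
        runCtas(catalog, "failed_table", Arrays.asList("c"), true);
        // Only the table from the successful job is visible.
        System.out.println(catalog.tables.keySet()); // prints [ok_table]
    }
}
```

In this toy model the failed job leaves no table behind, which is the behavior the non-atomic CTAS described in the original proposal lacks. Isolation, speculative execution, and small-file merging are outside its scope.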