Hi, Yuxia

Thank you for your reply.

A DynamicTableFactory/DynamicTableSink can tell whether a CatalogTable supports atomic CTAS by checking its type, like the following:

    boolean isAtomicCtas = context.getCatalogTable().getOrigin() instanceof TwoPhaseCatalogTable;

I've also updated the FLIP. This is my PoC commit:
https://github.com/Tartarus0zm/flink/commit/ca82b6a816491df5a251b410f4c614436402d2dc

Looking forward to more feedback.
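To make the type check above concrete, here is a minimal, self-contained sketch. The nested types are simplified stand-ins that only mirror the names used in this mail (they are not Flink's real classes), it assumes TwoPhaseCatalogTable is a subtype of CatalogTable, and describeSink is a hypothetical helper added purely for illustration.

```java
// Sketch only: simplified stand-ins for the types named in this mail,
// NOT the real Flink classes.
class AtomicCtasSketch {

    /** Stand-in for Flink's CatalogTable. */
    interface CatalogTable {}

    /** Stand-in for the TwoPhaseCatalogTable proposed in FLIP-305 (assumed subtype). */
    interface TwoPhaseCatalogTable extends CatalogTable {}

    /** Stand-in for the resolved table that wraps the original CatalogTable. */
    static final class ResolvedCatalogTable {
        private final CatalogTable origin;

        ResolvedCatalogTable(CatalogTable origin) {
            this.origin = origin;
        }

        CatalogTable getOrigin() {
            return origin;
        }
    }

    /** Stand-in for the context handed to a DynamicTableFactory. */
    interface Context {
        ResolvedCatalogTable getCatalogTable();
    }

    /** The check from the mail: atomic CTAS iff the origin is a TwoPhaseCatalogTable. */
    static boolean isAtomicCtas(Context context) {
        return context.getCatalogTable().getOrigin() instanceof TwoPhaseCatalogTable;
    }

    /** Hypothetical helper: a factory could branch on the check to pick a sink. */
    static String describeSink(Context context) {
        return isAtomicCtas(context) ? "atomic CTAS sink" : "regular sink";
    }

    public static void main(String[] args) {
        Context normal = () -> new ResolvedCatalogTable(new CatalogTable() {});
        Context atomic = () -> new ResolvedCatalogTable(new TwoPhaseCatalogTable() {});
        System.out.println(describeSink(normal));
        System.out.println(describeSink(atomic));
    }
}
```

In a real connector the branch would presumably select the write target (for example, the temp table in the JDBC case raised below) rather than return a label.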
--
Best regards,
Mang Zhang

At 2023-04-14 19:46:08, "yuxia" <luoyu...@alumni.sjtu.edu.cn> wrote:
>Hi, Mang.
>+1 for completing the support for atomicity of CTAS. This is very useful in
>batch scenarios and for integrating with data lakes that support transactions.
>
>I just have one question. IIUC, the DynamicTableSink will need to know whether
>it is handling the normal case or CTAS with atomicity, as well as the necessary
>context. Take the JDBC catalog as an example: if it is CTAS with atomicity
>support, the JDBC DynamicTableSink will write to the temp table defined in the
>TwoPhaseCatalogTable, which is different from the normal case.
>
>How can the DynamicTableSink get this information? Could you give some
>explanation or an example in this FLIP?
>
>Best regards,
>Yuxia
>
>----- Original Message -----
>From: "zhangmang1" <zhangma...@163.com>
>To: "dev" <dev@flink.apache.org>, "ron9 liu" <ron9....@gmail.com>, "lincoln 86xy" <lincoln.8...@gmail.com>
>Sent: Friday, April 14, 2023, 2:50:40 PM
>Subject: Re:Re: [DISCUSS] FLIP-305: Support atomic for CREATE TABLE AS SELECT(CTAS) statement
>
>Hi, Lincoln and Ron
>
>Thank you for your reply.
>On the naming, I agree; it keeps future extensions more uniform. I have
>updated the FLIP.
>
>About Hive support for atomic CTAS: Hive is rich in usage scenarios, which can
>be divided into three:
>1. writing Hive tables
>2. writing Hive tables with speculative execution
>3. writing Hive tables with small file merge
>
>The main purpose of FLIP-305 is to implement support for CTAS atomicity in the
>Flink framework, so my PoC only verifies the first scenario, writing to a Hive
>table. We can subsequently split out sub-tasks to support the other two
>scenarios.
>
>--
>Best regards,
>Mang Zhang
>
>At 2023-04-13 12:27:24, "Lincoln Lee" <lincoln.8...@gmail.com> wrote:
>>Hi, Mang
>>
>>+1 for completing the support for atomicity of CTAS, this is very useful in
>>batch scenarios.
>>
>>I have two questions:
>>1. naming wise:
>>  a) can we rename `Catalog#getTwoPhaseCommitCreateTable` to
>>`Catalog#twoPhaseCreateTable` (and we may add
>>twoPhaseReplaceTable/twoPhaseCreateOrReplaceTable later)
>>  b) for `TwoPhaseCommitCatalogTable`, might it be better to use
>>`TwoPhaseCatalogTable`?
>>  c) `TwoPhaseCommitCatalogTable#beginTransaction`: the word 'transaction'
>>in the method name may suggest to users that this is tied to transaction
>>support (however, it is not strictly so), so I suggest changing it to
>>`begin`
>>2. Has this design been validated by any relevant PoC on Hive or other
>>catalogs?
>>
>>Best,
>>Lincoln Lee
>>
>>
>>liu ron <ron9....@gmail.com> wrote on Thu, Apr 13, 2023 at 10:17:
>>
>>> Hi, Mang
>>> Atomicity is very important for CTAS, especially for batch jobs. This FLIP
>>> is a continuation of FLIP-218, which is valuable for CTAS.
>>> I just have one question: in the Motivation part of FLIP-218, we mentioned
>>> three levels of atomicity semantics. Can this current design do the same as
>>> Spark's DataSource V2, which can guarantee both atomicity and isolation?
>>> For example, can it be done by writing to Hive tables using CTAS?
>>>
>>> Best,
>>> Ron
>>>
>>> Mang Zhang <zhangma...@163.com> wrote on Mon, Apr 10, 2023 at 11:03:
>>>
>>> > Hi, everyone
>>> >
>>> > I'd like to start a discussion about FLIP-305: Support atomic for CREATE
>>> > TABLE AS SELECT(CTAS) statement [1].
>>> >
>>> > The CREATE TABLE AS SELECT (CTAS) statement is already supported, but it
>>> > is not atomic. The table is created before the job runs. If the job
>>> > execution fails or is cancelled, the table is not dropped.
>>> >
>>> > So I want Flink to support atomic CTAS, where the table is created only
>>> > when the job succeeds. This improves the user experience.
>>> >
>>> > Looking forward to your feedback.
>>> >
>>> > [1]
>>> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-305%3A+Support+atomic+for+CREATE+TABLE+AS+SELECT%28CTAS%29+statement
>>> >
>>> > --
>>> > Best regards,
>>> > Mang Zhang