Re: [DISCUSS] FLIP-305: Support atomic for CREATE TABLE AS SELECT(CTAS) statement

yuxia Fri, 14 Apr 2023 04:54:51 -0700

Hi, Mang.
+1 for completing the support for atomicity of CTAS. This is very useful in 
batch scenarios and for integrating with data lakes that support transactions.


I just have one question. IIUC, the DynamicTableSink needs to know whether it 
is being used in the normal case or for atomic CTAS, along with the necessary context.
Take the JDBC catalog as an example: for CTAS with atomicity support, the JDBC 
DynamicTableSink will write to the temporary table defined in the TwoPhaseCatalogTable, 
which is different from the normal case.

How does the DynamicTableSink get this information? Could you give some explanation or 
an example in this FLIP?


Best regards,
Yuxia
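
A minimal sketch of the case described above, assuming a hypothetical `SinkContext` that tells a JDBC-style sink whether it is part of an atomic CTAS and, if so, which temporary table to write to. `SinkContext`, `JdbcLikeSink` and `physicalTable()` are illustrative names only, not Flink APIs; the thread does not say how this information would actually reach a real `DynamicTableSink`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

class SinkContextDemo {
    public static void main(String[] args) {
        // Normal path: the context carries no temporary table.
        JdbcLikeSink normal = new JdbcLikeSink(new SinkContext("orders", null));
        // Atomic CTAS path: the (hypothetical) context carries the temp table name.
        JdbcLikeSink atomic = new JdbcLikeSink(new SinkContext("orders", "orders_tmp_42"));
        System.out.println(normal.physicalTable()); // prints orders
        System.out.println(atomic.physicalTable()); // prints orders_tmp_42
    }
}

// Hypothetical context handed to the sink; NOT a Flink API.
final class SinkContext {
    private final String targetTable;
    private final String stagingTable; // null when the write is not part of an atomic CTAS

    SinkContext(String targetTable, String stagingTable) {
        this.targetTable = targetTable;
        this.stagingTable = stagingTable;
    }

    String targetTable() { return targetTable; }

    Optional<String> stagingTable() { return Optional.ofNullable(stagingTable); }
}

// Hypothetical JDBC-style sink that chooses its physical write target from the context.
final class JdbcLikeSink {
    private final SinkContext context;
    final List<String> writtenTo = new ArrayList<>();

    JdbcLikeSink(SinkContext context) { this.context = context; }

    // Atomic CTAS: write into the temporary table; normal case: write into the target table.
    String physicalTable() {
        return context.stagingTable().orElse(context.targetTable());
    }

    // Records "table:row" so the chosen destination is observable.
    void write(String row) {
        writtenTo.add(physicalTable() + ":" + row);
    }
}
```

The point of the sketch is only that the sink cannot infer the temporary table on its own; something in its context has to carry it.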

----- Original Message -----
From: "zhangmang1" <zhangma...@163.com>
To: "dev" <dev@flink.apache.org>, "ron9 liu" <ron9....@gmail.com>, "lincoln 86xy" <lincoln.8...@gmail.com>
Sent: Friday, April 14, 2023 2:50:40 PM
Subject: Re:Re: [DISCUSS] FLIP-305: Support atomic for CREATE TABLE AS SELECT(CTAS) statement

Hi, Lincoln and Ron


Thank you for your replies.
The naming suggestions look good to me; they will keep future extensions with new 
features more uniform. I have updated the FLIP.


Regarding atomic CTAS support for Hive: Hive has rich usage scenarios, which can be 
divided into three:
1. writing Hive tables
2. writing Hive tables with speculative execution
3. writing Hive tables with small-file merging


The main purpose of FLIP-305 is to implement support for CTAS atomicity in the 
Flink framework, so my PoC only verifies the first scenario (writing to a Hive table). 
We can subsequently split out sub-tasks to support the other two scenarios.

--

Best regards,
Mang Zhang

At 2023-04-13 12:27:24, "Lincoln Lee" <lincoln.8...@gmail.com> wrote:
>Hi, Mang
>
>+1 for completing the support for atomicity of CTAS; this is very useful in
>batch scenarios.
>
>I have two questions:
>1. Naming:
>  a) Can we rename `Catalog#getTwoPhaseCommitCreateTable` to
>`Catalog#twoPhaseCreateTable`? (We may add
>twoPhaseReplaceTable/twoPhaseCreateOrReplaceTable later.)
>  b) For `TwoPhaseCommitCatalogTable`, would `TwoPhaseCatalogTable` be a
>better name?
>  c) For `TwoPhaseCommitCatalogTable#beginTransaction`, the word 'transaction'
>in the method name may suggest to users that full transaction support is
>provided (which is not strictly the case), so I suggest renaming it to
>`begin`.
>2. Has this design been validated by a PoC on Hive or other
>catalogs?
>
>Best,
>Lincoln Lee
>
>
>liu ron <ron9....@gmail.com> wrote on Thu, Apr 13, 2023 at 10:17:
>
>> Hi, Mang
>> Atomicity is very important for CTAS, especially for batch jobs. This FLIP
>> is a continuation of FLIP-218, which is valuable for CTAS.
>> I just have one question. The Motivation section of FLIP-218 mentions
>> three levels of atomicity semantics. Can the current design provide the same
>> guarantees as Spark's DataSource V2, i.e. both atomicity and isolation?
>> For example, can this be achieved when writing to Hive tables using CTAS?
>>
>> Best,
>> Ron
>>
>> Mang Zhang <zhangma...@163.com> wrote on Mon, Apr 10, 2023 at 11:03:
>>
>> > Hi, everyone
>> >
>> > I'd like to start a discussion about FLIP-305: Support atomic for CREATE
>> > TABLE AS SELECT(CTAS) statement [1].
>> >
>> > The CREATE TABLE AS SELECT (CTAS) statement is already supported, but it is
>> > not atomic: the table is created before the job runs, and if the job
>> > fails or is cancelled, the table is not dropped.
>> >
>> > So I want Flink to support atomic CTAS, where the table is created only
>> > when the job succeeds. This improves the user experience.
>> >
>> > Looking forward to your feedback.
>> >
>> > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-305%3A+Support+atomic+for+CREATE+TABLE+AS+SELECT%28CTAS%29+statement
>> >
>> > --
>> >
>> > Best regards,
>> > Mang Zhang
>>
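
A minimal sketch of the atomic CTAS behavior proposed above (the table becomes visible only when the job succeeds), written against the renamed API from Lincoln's message. Only `twoPhaseCreateTable`, `TwoPhaseCatalogTable` and `begin` come from the thread; `commit`, `abort`, all signatures, `InMemoryCatalog` and the job stub are hypothetical stand-ins, not Flink's actual interfaces.

```java
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;
import java.util.function.BooleanSupplier;

class AtomicCtasDemo {
    public static void main(String[] args) {
        InMemoryCatalog catalog = new InMemoryCatalog();
        runCtas(catalog, "t_ok", () -> true);      // job succeeds
        runCtas(catalog, "t_failed", () -> false); // job fails
        System.out.println(catalog.tables());      // prints [t_ok]
    }

    // Atomic CTAS flow: begin, run the job, then commit on success or abort otherwise.
    static boolean runCtas(InMemoryCatalog catalog, String name, BooleanSupplier job) {
        TwoPhaseCatalogTable table = catalog.twoPhaseCreateTable(name)
                .orElseThrow(() -> new IllegalStateException("catalog does not support atomic CTAS"));
        table.begin();
        boolean succeeded;
        try {
            succeeded = job.getAsBoolean();
        } catch (RuntimeException e) {
            succeeded = false; // a job that throws is treated like a failed job
        }
        if (succeeded) {
            table.commit();
        } else {
            table.abort();
        }
        return succeeded;
    }
}

// Name and begin() follow the thread; commit()/abort() are hypothetical.
interface TwoPhaseCatalogTable {
    void begin();
    void commit();
    void abort();
}

// Hypothetical toy catalog: a table name becomes visible only on commit().
class InMemoryCatalog {
    private final Set<String> tables = new HashSet<>();

    Set<String> tables() { return tables; }

    // An empty Optional would signal "no atomic CTAS support"; this toy catalog always supports it.
    Optional<TwoPhaseCatalogTable> twoPhaseCreateTable(String name) {
        return Optional.of(new TwoPhaseCatalogTable() {
            public void begin() { /* e.g. prepare a staging location; nothing is visible yet */ }
            public void commit() { tables.add(name); }
            public void abort() { /* discard staged data; the table never appears */ }
        });
    }
}
```

Compared with the current non-atomic behavior, where the table is created before the job runs and left in place if the job fails or is cancelled, the failed job here leaves no table behind.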
