Hi, Lincoln and Ron

Thank you for your reply.
On naming, I agree: the suggested names will make future extensions with new 
features more uniform. I have updated the FLIP.

Regarding atomic CTAS support for Hive: Hive has many usage scenarios, which 
can be divided into three cases:
1. writing Hive tables
2. writing Hive tables with speculative execution
3. writing Hive tables with small-file merging

The main purpose of FLIP-305 is to support CTAS atomicity in the Flink 
framework, so my PoC only verifies the first scenario, writing to a Hive table. 
We can later split out sub-tasks to support the other two scenarios.
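
To make the intended semantics concrete, here is a minimal, self-contained Java 
sketch. It is not the FLIP-305 implementation. It borrows the names 
`twoPhaseCreateTable`, `TwoPhaseCatalogTable` and `begin` from the naming 
discussion in this thread; the `commit` and `abort` methods, the in-memory 
catalog and the `runCtas` helper are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AtomicCtasSketch {

    /** Hypothetical two-phase table handle; commit/abort are assumed names. */
    interface TwoPhaseCatalogTable {
        void begin();   // called before the job starts
        void commit();  // called when the job succeeds
        void abort();   // called when the job fails or is cancelled
    }

    /** Toy in-memory catalog (an assumption): table name -> rows. */
    static class InMemoryCatalog {
        final Map<String, List<String>> tables = new HashMap<>();

        // Returns a handle that publishes the staged rows only on commit.
        TwoPhaseCatalogTable twoPhaseCreateTable(String name, List<String> staged) {
            return new TwoPhaseCatalogTable() {
                @Override public void begin() { staged.clear(); }
                @Override public void commit() { tables.put(name, new ArrayList<>(staged)); }
                @Override public void abort() { staged.clear(); }
            };
        }
    }

    /** Simulates a CTAS job; returns true if the table was committed. */
    static boolean runCtas(InMemoryCatalog catalog, String name,
                           List<String> rows, boolean failJob) {
        List<String> staged = new ArrayList<>();
        TwoPhaseCatalogTable table = catalog.twoPhaseCreateTable(name, staged);
        table.begin();
        try {
            staged.addAll(rows); // the "job" writes into the staging area
            if (failJob) {
                throw new RuntimeException("simulated job failure");
            }
            table.commit();      // the table becomes visible only here
            return true;
        } catch (RuntimeException e) {
            table.abort();       // nothing is left behind in the catalog
            return false;
        }
    }

    public static void main(String[] args) {
        InMemoryCatalog catalog = new InMemoryCatalog();
        runCtas(catalog, "ok_table", Arrays.asList("a", "b"), false);
        runCtas(catalog, "failed_table", Arrays.asList("c"), true);
        // Only the table from the successful job is visible.
        System.out.println(catalog.tables.keySet());
    }
}
```

The point of the sketch is that the catalog is touched only in `commit`, so a 
failed or cancelled job leaves no table behind.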

--

Best regards,
Mang Zhang

At 2023-04-13 12:27:24, "Lincoln Lee" <lincoln.8...@gmail.com> wrote:
>Hi, Mang
>
>+1 for completing the support for atomicity of CTAS, this is very useful in
>batch scenarios.
>
>I have two questions:
>1. naming wise:
>  a) can we rename the `Catalog#getTwoPhaseCommitCreateTable` to
>`Catalog#twoPhaseCreateTable` (and we may add
>twoPhaseReplaceTable/twoPhaseCreateOrReplaceTable later)
>  b) for the `TwoPhaseCommitCatalogTable`, may it be better using
>`TwoPhaseCatalogTable`?
>  c) `TwoPhaseCommitCatalogTable#beginTransaction`, the word 'transaction'
>in the method name, which may remind users of the relevance of transaction
>support (however, it is not strictly so), so I suggest changing it to
>`begin`
>2. Has this design been validated by any relevant PoC on Hive or other
>catalogs?
>
>Best,
>Lincoln Lee
>
>
>liu ron <ron9....@gmail.com> wrote on Thu, Apr 13, 2023 at 10:17:
>
>> Hi, Mang
>> Atomicity is very important for CTAS, especially for batch jobs. This FLIP
>> is a continuation of FLIP-218, which is valuable for CTAS.
>> I have just one question. The Motivation part of FLIP-218 mentions
>> three levels of atomicity semantics. Can the current design do the same as
>> Spark's DataSource V2, which can guarantee both atomicity and isolation?
>> For example, can this be achieved when writing to Hive tables using CTAS?
>>
>> Best,
>> Ron
>>
>> Mang Zhang <zhangma...@163.com> wrote on Mon, Apr 10, 2023 at 11:03:
>>
>> > Hi, everyone
>> >
>> > I'd like to start a discussion about FLIP-305: Support atomic for CREATE
>> > TABLE AS SELECT(CTAS) statement [1].
>> >
>> > The CREATE TABLE AS SELECT (CTAS) statement is already supported, but it
>> > is not atomic. The table is created before the job runs, so if the job
>> > execution fails or is cancelled, the table is not dropped.
>> >
>> > So I want Flink to support atomic CTAS, where the table is created only
>> > when the job succeeds. This will improve the user experience.
>> >
>> > Looking forward to your feedback.
>> >
>> > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-305%3A+Support+atomic+for+CREATE+TABLE+AS+SELECT%28CTAS%29+statement
>> >
>> > --
>> >
>> > Best regards,
>> > Mang Zhang
>>
