PR-2303 defines how the batch job does the compaction work, and issue 2308
decides what the behavior is when a compaction txn and a row-delta txn
commit at the same time. They shouldn't block each other, but we will
need to resolve both of them.

On Tue, May 18, 2021 at 9:36 AM Huadong Liu <huadong...@gmail.com> wrote:

> Thanks. Compaction is https://github.com/apache/iceberg/pull/2303 and it
> is currently blocked by https://github.com/apache/iceberg/issues/2308?
>
> On Mon, May 17, 2021 at 6:17 PM OpenInx <open...@gmail.com> wrote:
>
>> Hi Huadong
>>
>> From the perspective of Iceberg developers, we don't expose format v2
>> to end users because we think there is still other work that needs to be
>> done. As you can see, there are still some unfinished issues in your link.
>> As for whether v2 will cause data loss, from my perspective as a
>> designer, semantics and correctness should be handled very rigorously as
>> long as we don't do any compaction. Once we introduce the compaction action,
>> we will encounter this issue: https://github.com/apache/iceberg/issues/2308.
>> We've proposed a solution but have still not reached an agreement in the
>> community. I would suggest using v2 in production only after we resolve
>> this issue, at least.
>>
>> On Sat, May 15, 2021 at 8:01 AM Huadong Liu <huadong...@gmail.com> wrote:
>>
>>> Hi iceberg-dev,
>>>
>>> I tried v2 row-level deletion by committing equality delete files after
>>> *upgradeToFormatVersion(2)*. It worked well. I know that Spark actions
>>> to compact delete files and data files
>>> <https://github.com/apache/iceberg/milestone/4> etc. are in progress. I
>>> currently use the Java API to update, query and do maintenance ops. I am
>>> not using Flink at the moment and I will definitely pick up Spark actions
>>> when they are completed. Deletions can be scheduled in batches (e.g.
>>> weekly) to control the volume of delete files. I want to get a sense of the
>>> risk level of losing data at some point because of v2 Spec/API changes if I
>>> start to use v2 format now. It is not an easy question. Any input is
>>> appreciated.
>>>
>>> --
>>> Huadong
>>>
>>
