I just commented on #2303. I think we should get that fixed fairly soon --
at least an interim fix to ensure that compaction correctly catches the
problem and fails. The plan for the long-term fix looks good to me as well.

On Mon, May 17, 2021 at 7:17 PM OpenInx <open...@gmail.com> wrote:

> PR-2303 defines how the batch job does the compaction work, and PR-2308
> decides what the behavior should be when a compaction txn and a row-delta
> txn commit at the same time. They shouldn't block each other, but we will
> need to resolve both of them.
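>
> To make the race concrete, here is a minimal sketch of the two
> transactions committing against the same base snapshot (illustrative
> only; table, oldDataFile, compactedFile, and equalityDeleteFile are
> assumed to exist and are not taken from either PR; uses the
> org.apache.iceberg.Table API plus java.util.Collections):
>
> // Writer A: compaction, based on snapshot S, replaces a data file.
> table.newRewrite()
>     .rewriteFiles(Collections.singleton(oldDataFile),
>                   Collections.singleton(compactedFile))
>     .commit();
>
> // Writer B: a concurrent row delta, also based on snapshot S, commits
> // equality deletes targeting rows that live in oldDataFile. If the
> // rewrite then commits without validating against this delta, the
> // compacted file gets a newer sequence number than the delete file,
> // the equality deletes no longer apply to it, and the deleted rows
> // resurface.
> table.newRowDelta()
>     .addDeletes(equalityDeleteFile)
>     .commit();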
>
> On Tue, May 18, 2021 at 9:36 AM Huadong Liu <huadong...@gmail.com> wrote:
>
>> Thanks. So compaction is https://github.com/apache/iceberg/pull/2303, and
>> it is currently blocked by https://github.com/apache/iceberg/issues/2308?
>>
>> On Mon, May 17, 2021 at 6:17 PM OpenInx <open...@gmail.com> wrote:
>>
>>> Hi Huadong
>>>
>>> From the perspective of the Iceberg developers, we don't expose format
>>> v2 to end users because we think there is still other work that needs to
>>> be done; as you can see, there are still some unfinished issues in your
>>> link. As for whether v2 will cause data loss: from my perspective as a
>>> designer, the semantics and correctness are handled very rigorously as
>>> long as we don't do any compaction. Once we introduce the compaction
>>> action, we run into this issue:
>>> https://github.com/apache/iceberg/issues/2308. We've proposed a
>>> solution, but the community has not reached agreement yet. I would
>>> suggest using v2 in production only after we resolve at least this issue.
>>>
>>> On Sat, May 15, 2021 at 8:01 AM Huadong Liu <huadong...@gmail.com>
>>> wrote:
>>>
>>>> Hi iceberg-dev,
>>>>
>>>> I tried v2 row-level deletion by committing equality delete files after
>>>> *upgradeToFormatVersion(2)*, and it worked well. I know that the Spark
>>>> actions to compact delete files and data files
>>>> <https://github.com/apache/iceberg/milestone/4> etc. are in progress. I
>>>> currently use the Java API to update, query, and do maintenance ops. I
>>>> am not using Flink at the moment, and I will definitely pick up the
>>>> Spark actions when they are completed. Deletions can be scheduled in
>>>> batches (e.g. weekly) to control the volume of delete files. I want to
>>>> get a sense of the risk of losing data at some point because of v2
>>>> spec/API changes if I start using the v2 format now. It is not an easy
>>>> question. Any input is appreciated.
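>>>>
>>>> For context, a minimal sketch of the Java API calls involved
>>>> (illustrative only; catalog, the table identifier, and deleteFile are
>>>> placeholders, not my actual setup):
>>>>
>>>> Table table = catalog.loadTable(TableIdentifier.of("db", "events"));
>>>>
>>>> // Rewrite the table metadata with the format version bumped to 2.
>>>> TableOperations ops = ((HasTableOperations) table).operations();
>>>> TableMetadata base = ops.current();
>>>> ops.commit(base, base.upgradeToFormatVersion(2));
>>>>
>>>> // Commit an equality delete file (built elsewhere, e.g. with
>>>> // FileMetadata.deleteFileBuilder(spec).ofEqualityDeletes(...)) as a
>>>> // row-delta transaction.
>>>> table.newRowDelta()
>>>>     .addDeletes(deleteFile)
>>>>     .commit();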
>>>>
>>>> --
>>>> Huadong
>>>>
>>>

-- 
Ryan Blue
