Re: [Proposal] Add support for Flink Maintenance in Iceberg

Péter Váry Tue, 07 May 2024 09:13:32 -0700

Thanks Dan for your support.

There was a longer discussion around what is more important/useful:


   - Some of the commenters were concerned about the resource usage of the
   maintenance tasks and their effect on the Flink job's checkpointing. These
   users prefer the `Separate Maintenance Job` solution, which is better at
   isolating resource usage and keeping concerns separate.
   - Some of the commenters were planning to use it for well-maintained
   tables where the expected resource usage is less of a concern. These users
   prefer the `Post Commit Maintenance` approach, with which they could reuse
   resources from the job doing the actual writes.

Since both solutions use the same building blocks, I have incorporated both
approaches into the document and plan to implement both of them.
So I think we have a consensus here.

In Flink, a vote is required even if consensus was reached during the
discussion. I am not sure what the Iceberg approach is here, but I think it
is important to have a final validation of the framework before we start
adding code.

Thanks,
Peter

Daniel Weeks <dwe...@apache.org> wrote (on Tue, May 7, 2024, 17:36):

> +1 for supporting more maintenance support in Flink
>
> Peter, just wondering if there is really any known opposition/dissenting
> opinions or if you're just looking for general agreement on the path
> forward?
>
> I would also agree with the single pipeline / post commit approach as
> having to configure multiple jobs or scheduling is a lot of additional
> infrastructure work to set up, so single feels like it provides the most
> immediate value for the larger community.
>
> -Dan
>
> On Tue, May 7, 2024 at 6:32 AM Zhu Zhu <reed...@gmail.com> wrote:
>
>> +1
>>
>> Thanks,
>> Zhu
>>
>> Jean-Baptiste Onofré <j...@nanthrax.net> wrote on Tue, May 7, 2024 at 16:17:
>>
>>> +1
>>>
>>> Regards
>>> JB
>>>
>>> On Fri, May 3, 2024 at 8:30 PM Péter Váry <peter.vary.apa...@gmail.com>
>>> wrote:
>>> >
>>> > Hi everyone,
>>> >
>>> > I would like to make a proposal [1] to support Flink Table Maintenance
>>> in Iceberg. The main goal is to have a solution where Flink can execute the
>>> Maintenance Tasks as part of the streaming job. Especially Rewrite Data
>>> Files, Rewrite Manifest Files and Expire Snapshots.
>>> > The secondary goal is to provide building blocks for Flink batch jobs
>>> to execute the Maintenance Tasks independently, where the scheduling is
>>> done outside of Flink.
>>> >
>>> > This proposal is the outcome of extensive community discussions on the
>>> mailing list [2, 3].
>>> >
>>> > Please respond with your recommendation:
>>> > +1 if you support moving forward with the two separate objects model.
>>> > 0 if you are neutral.
>>> > -1 if you disagree with the two separate objects model.
>>> >
>>> > Thanks,
>>> > Peter
>>> >
>>> > [1] https://github.com/apache/iceberg/issues/10264
>>> > [2] https://lists.apache.org/thread/yjcwbf1037jdq4prty6rtrrqmjzc71o0
>>> > [3] https://lists.apache.org/thread/10mdf9zo6pn0dfq791nf4w1m7jh9k3sl
>>>
>>
