Thanks, Dan, for your support. There was a longer discussion about which approach is more important/useful:
- Some of the commenters were concerned about resource usage and the effect of the maintenance tasks on Flink job checkpointing. These users prefer the `Separate Maintenance Job` solution, which better isolates resource usage and separates concerns.
- Some of the commenters were planning to use it for well-maintained tables, where the expected resource usage is less of a concern. These users prefer the `Post Commit Maintenance` approach, with which they could reuse resources from the job doing the actual writes.

Since both solutions use the same building blocks, I have incorporated both approaches into the document, and plan to implement both of them. So I think we have a consensus here.

In Flink, even if we have consensus during the discussion, it is required to start a vote. I am not sure what the Iceberg approach is here, but I think it is important to have a final validation for the framework before we start adding code.

Thanks,
Peter

Daniel Weeks <dwe...@apache.org> wrote (on Tue, May 7, 2024, 17:36):

> +1 for supporting more maintenance support in Flink
>
> Peter, just wondering if there is really any known opposition/dissenting
> opinions or if you're just looking for general agreement on the path
> forward?
>
> I would also agree with the single pipeline / post commit approach as
> having to configure multiple jobs or scheduling is a lot of additional
> infrastructure work to set up, so single feels like it provides the most
> immediate value for the larger community.
>
> -Dan
>
> On Tue, May 7, 2024 at 6:32 AM Zhu Zhu <reed...@gmail.com> wrote:
>
>> +1
>>
>> Thanks,
>> Zhu
>>
>> Jean-Baptiste Onofré <j...@nanthrax.net> wrote on Tue, May 7, 2024 at 16:17:
>>
>>> +1
>>>
>>> Regards
>>> JB
>>>
>>> On Fri, May 3, 2024 at 8:30 PM Péter Váry <peter.vary.apa...@gmail.com>
>>> wrote:
>>> >
>>> > Hi everyone,
>>> >
>>> > I would like to make a proposal [1] to support Flink Table Maintenance
>>> > in Iceberg.
>>> > The main goal is to have a solution where Flink can execute the
>>> > Maintenance Tasks as part of the streaming job. Especially Rewrite Data
>>> > Files, Rewrite Manifest Files and Expire Snapshots.
>>> > The secondary goal is to provide building blocks for Flink batch jobs
>>> > to execute the Maintenance Tasks independently, where the scheduling is
>>> > done outside of Flink.
>>> >
>>> > This proposal is the outcome of extensive community discussions on the
>>> > mailing list [2, 3].
>>> >
>>> > Please respond with your recommendation:
>>> > +1 if you support moving forward with the two separate objects model.
>>> > 0 if you are neutral.
>>> > -1 if you disagree with the two separate objects model.
>>> >
>>> > Thanks,
>>> > Peter
>>> >
>>> > [1] https://github.com/apache/iceberg/issues/10264
>>> > [2] https://lists.apache.org/thread/yjcwbf1037jdq4prty6rtrrqmjzc71o0
>>> > [3] https://lists.apache.org/thread/10mdf9zo6pn0dfq791nf4w1m7jh9k3sl