+1 (binding).

As for David's concern about smaller buffers after recovery, I previously drafted a design [1] to address this issue. Please take a look and leave comments if you still have concerns. :)
[1] https://docs.google.com/document/d/16_MOQymzxrKvUHXh6QFr2AAXIKt_2vPUf8vzKy4H_tU/edit

Best,
Zhijiang

------------------------------------------------------------------
From:Piotr Nowojski <pi...@ververica.com>
Send Time:2020 Mar. 11 (Wed.) 21:19
To:dev <dev@flink.apache.org>
Subject:Re: [VOTE] [FLIP-76] Unaligned checkpoints

+1 (binding).

Piotrek

> On 11 Mar 2020, at 09:19, David Anderson <da...@ververica.com> wrote:
>
> +1 I like where this is headed.
>
> One question: during restore, it could happen that a new task manager is
> configured with fewer or smaller buffers than was previously the case. How
> will this be handled?
>
> David
>
>
> On Wed, Mar 11, 2020 at 8:31 AM Arvid Heise <ar...@ververica.com> wrote:
>
>> Hi Thomas,
>>
>> it's like you said. The first version will not support rescaling and mostly
>> addresses the concerns about making little to no progress because of
>> frequent crashes.
>>
>> The main reason is that we cannot guarantee the ordering of non-keyed data
>> (and even keyed data in some weird cases) when rescaling currently. We have
>> a general concept to address that, which would also enable dynamic
>> rescaling in the future, but that would make the changes even bigger and we
>> would not have any version ready for 1.11.
>>
>> The current plan, of course, is to continue improving unaligned checkpoints
>> immediately after release, such that we have the full feature set for 1.12.
>> Potentially, unaligned checkpoints (with timeouts) would even become the
>> default option.
>>
>> On Tue, Mar 10, 2020 at 11:14 PM Thomas Weise <t...@apache.org> wrote:
>>
>>> +1
>>>
>>> Thanks for putting this together, looking forward to the experimental
>>> support in the next release.
>>>
>>> One clarification: since the MVP won't support rescaling, does it imply
>>> that savepoints will always use aligned checkpointing? If so, this would
>>> still block the user from taking a savepoint and resume with increased
>>> parallelism to resolve a prolonged/permanent backpressure condition?
>>>
>>> Thanks,
>>> Thomas
>>>
>>>
>>> On Tue, Mar 10, 2020 at 6:33 AM Arvid Heise <ar...@ververica.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I would like to start the vote for FLIP-76 [1], which is discussed and
>>>> reached a consensus in the discussion thread [2].
>>>>
>>>> The vote will be open until March. 13th (72h), unless there is an objection
>>>> or not enough votes.
>>>>
>>>> Thanks,
>>>> Arvid
>>>>
>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-76%3A+Unaligned+Checkpoints
>>>> [2] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-76-Unaligned-checkpoints-td33651.html