Hi Piotr,

Thanks for driving this! Generally I support enabling the alignment timeout for aligned checkpoints. And I second Rui's opinion: 30s seems a reasonable value.
However, I'm worried that some operators may not support unaligned CP, which could cause data accuracy problems (such as the one described in the doc [1]). How about providing a mechanism for users to declare support for unaligned CP, and checking it before enabling unaligned CP automatically?

[1] https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/ops/state/checkpointing_under_backpressure/#interplay-with-watermarks

Best,
Zakelly

On Mon, Jan 8, 2024 at 3:02 PM Rui Fan <1996fan...@gmail.com> wrote:

> Thanks to Piotr for driving this proposal!
>
> Enabling unaligned checkpoints with an aligned checkpoints timeout
> is fine for me. I'm not sure whether aligned checkpoints timeout = 5s is
> too aggressive. If unaligned checkpoints are enabled by default
> for all jobs, I recommend that the aligned checkpoints timeout be
> at least 30s.
>
> If 30s is too long for some Flink jobs, Flink users can turn
> it down themselves.
>
> To David, Ken and Zhanghao:
>
> Unaligned checkpoints indeed have more limitations than aligned checkpoints,
> but if we set aligned checkpoints timeout = 30s or 60s, it means that
> when a job's checkpoint can be completed within 30s or 60s, the job still
> uses the aligned checkpoint (it doesn't introduce any extra overhead).
> When the checkpoint cannot be completed within the aligned checkpoints
> timeout, the aligned checkpoint will be switched to an unaligned checkpoint.
> The unaligned checkpoint can be completed even when backpressure is severe.
>
> In brief, when backpressure is low, enabling them comes without any extra
> effort; when backpressure is high, enabling them has some benefits.
>
> So I think there isn't much risk when the aligned checkpoints timeout
> is set to 30s or above. WDYT?
>
> Best,
> Rui
>
> On Mon, Jan 8, 2024 at 12:57 PM Zhanghao Chen <zhanghao.c...@outlook.com>
> wrote:
>
> > Hi Piotr,
> >
> > As a platform administrator who runs thousands of Flink jobs, I'd be
> > against the idea of enabling unaligned CP by default for our jobs. It may
> > help a significant portion of the users, but the subtle issues around
> > unaligned CP for a few jobs will probably raise a lot more on-calls and
> > incidents. From my point of view, we'd better not enable it by default
> > before removing all the limitations listed in
> > https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/ops/state/checkpointing_under_backpressure/#limitations
> > .
> >
> > Best,
> > Zhanghao Chen
> > ________________________________
> > From: Piotr Nowojski <pnowoj...@apache.org>
> > Sent: Friday, January 5, 2024 21:41
> > To: dev <dev@flink.apache.org>
> > Subject: FLIP-413: Enable unaligned checkpoints by default
> >
> > Hi!
> >
> > I would like to propose enabling unaligned checkpoints by default and
> > simultaneously increasing the aligned checkpoints timeout from 0ms to 5s.
> > I think this change is the right one for the majority of Flink users.
> >
> > For more rationale, please take a look at the short FLIP-413 [1].
> >
> > What do you all think?
> >
> > Best,
> > Piotrek
> >
> > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-413%3A+Enable+unaligned+checkpoints+by+default
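
For readers following the thread, the fallback behaviour Rui describes can be sketched as a small decision function. This is only an illustrative model, not Flink code: the function name `checkpoint_mode` and its parameters are hypothetical, and it assumes that a timeout of 0 means checkpoints are unaligned from the start whenever unaligned checkpoints are enabled.

```python
from datetime import timedelta


def checkpoint_mode(alignment_time: timedelta,
                    aligned_timeout: timedelta,
                    unaligned_enabled: bool = True) -> str:
    """Hypothetical model of the aligned-to-unaligned fallback.

    alignment_time:   how long barrier alignment takes for this checkpoint
    aligned_timeout:  the configured aligned checkpoints timeout
    unaligned_enabled: whether unaligned checkpoints are switched on at all
    """
    if not unaligned_enabled:
        # Without unaligned checkpoints there is nothing to fall back to.
        return "aligned"
    if aligned_timeout == timedelta(0):
        # Assumption: a zero timeout means unaligned from the start.
        return "unaligned"
    if alignment_time > aligned_timeout:
        # Alignment took too long (e.g. heavy backpressure): switch over.
        return "unaligned"
    # Alignment finished in time: the checkpoint stays aligned.
    return "aligned"


if __name__ == "__main__":
    timeout = timedelta(seconds=30)
    for seconds in (2, 30, 45):
        mode = checkpoint_mode(timedelta(seconds=seconds), timeout)
        print(f"alignment took {seconds}s -> {mode}")
```

Under this model, a job with low backpressure never leaves the aligned path with a 30s timeout, which is the point made above about the change carrying little risk for such jobs.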