Thanks to Piotr for driving this proposal!

Enabling unaligned checkpoints together with an aligned checkpoint timeout is fine with me. However, I'm not sure whether an aligned checkpoint timeout of 5s is too aggressive. If unaligned checkpoints are enabled by default for all jobs, I recommend that the aligned checkpoint timeout be at least 30s.
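
As a rough sketch, the setting described above could look like this in the Flink configuration. It assumes the option names given in the Flink 1.18 documentation (other releases may name these keys differently), and the 30s value is only the suggestion from this mail, not a tested recommendation:

```yaml
# Assumed option names from the Flink 1.18 docs; verify against your Flink version.
# Allow checkpoints to fall back to unaligned mode.
execution.checkpointing.unaligned: true
# Start each checkpoint as aligned and switch to unaligned only if alignment
# takes longer than this. The FLIP proposes 5s; 30s is the value suggested here.
execution.checkpointing.aligned-checkpoint-timeout: 30s
```
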
If 30s is too long for some Flink jobs, Flink users can lower it themselves.

To David, Ken and Zhanghao:

Unaligned checkpoints do indeed have some limitations compared with aligned checkpoints. But if we set the aligned checkpoint timeout to 30s or 60s, then whenever a job's checkpoint can be completed within 30s or 60s, the job still uses aligned checkpoints, so there is no extra overhead. When a checkpoint cannot be completed within the aligned checkpoint timeout, it is switched from aligned to unaligned. An unaligned checkpoint can still be completed when backpressure is severe.

In brief: when backpressure is low, enabling unaligned checkpoints costs nothing extra; when backpressure is high, enabling them brings some benefits.

So I don't think there is much risk when the aligned checkpoint timeout is set to 30s or above.

WDYT?

Best,
Rui

On Mon, Jan 8, 2024 at 12:57 PM Zhanghao Chen <zhanghao.c...@outlook.com> wrote:

> Hi Piotr,
>
> As a platform administrator who runs thousands of Flink jobs, I'd be against the
> idea to enable unaligned cp by default for our jobs. It may help a
> significant portion of the users, but the subtle issues around unaligned CP
> for a few jobs will probably raise a lot more on-calls and incidents. From
> my point of view, we'd better not enable it by default before removing all
> the limitations listed in
> https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/ops/state/checkpointing_under_backpressure/#limitations
> .
>
> Best,
> Zhanghao Chen
> ________________________________
> From: Piotr Nowojski <pnowoj...@apache.org>
> Sent: Friday, January 5, 2024 21:41
> To: dev <dev@flink.apache.org>
> Subject: FLIP-413: Enable unaligned checkpoints by default
>
> Hi!
>
> I would like to propose by default to enable unaligned checkpoints and also
> simultaneously increase the aligned checkpoints timeout from 0ms to 5s. I
> think this change is the right one to do for the majority of Flink users.
>
> For more rationale please take a look into the short FLIP-413 [1].
>
> What do you all think?
>
> Best,
> Piotrek
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-413%3A+Enable+unaligned+checkpoints+by+default