Hi Piotr,

Thanks for driving this! Generally, I support enabling the alignment timeout
for aligned checkpoints. I also second Rui's opinion: 30s seems a reasonable
value.
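To make the switching behaviour concrete, here is a minimal Java sketch. It is a hypothetical, simplified model and not Flink's actual implementation: the names `AlignmentTimeoutSketch`, `Mode` and `decide` are made up, and it assumes the timeout is measured from the moment the checkpoint started. In a real job, the settings themselves would be applied through `CheckpointConfig#enableUnalignedCheckpoints()` and `CheckpointConfig#setAlignedCheckpointTimeout(Duration)`.

```java
import java.time.Duration;

public class AlignmentTimeoutSketch {
    enum Mode { ALIGNED, UNALIGNED }

    /**
     * Hypothetical model (not Flink code): decides whether a checkpoint that
     * has been running for `elapsed` should still be processed aligned.
     */
    static Mode decide(Duration elapsed, Duration alignedTimeout) {
        // A zero timeout means checkpoints start unaligned right away.
        if (alignedTimeout.isZero()) {
            return Mode.UNALIGNED;
        }
        // Stay aligned until the timeout is exceeded, then switch to unaligned.
        return elapsed.compareTo(alignedTimeout) > 0 ? Mode.UNALIGNED : Mode.ALIGNED;
    }

    public static void main(String[] args) {
        Duration timeout = Duration.ofSeconds(30);
        System.out.println(decide(Duration.ofSeconds(2), timeout));       // ALIGNED
        System.out.println(decide(Duration.ofSeconds(45), timeout));      // UNALIGNED
        System.out.println(decide(Duration.ofSeconds(2), Duration.ZERO)); // UNALIGNED
    }
}
```

With a 30s timeout, a checkpoint that finishes quickly under low backpressure never leaves aligned mode, which is why the larger value looks safer to me than 5s.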

However, I'm worried that some operators may not support unaligned CP, which
could cause data correctness problems (such as the one described in the
docs [1]). How about providing a mechanism for users to declare support for
unaligned CP, and checking it before enabling unaligned CP automatically?
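A rough sketch of what such an opt-in check could look like, again in Java and entirely hypothetical: `UnalignedSupportCheckSketch`, `OperatorInfo`, its `declaresUnalignedSupport` flag and `mayAutoEnableUnaligned` are invented names, not existing Flink APIs. The idea is that unaligned CP would only be switched on automatically if every operator in the job has explicitly declared support.

```java
import java.util.ArrayList;
import java.util.List;

public class UnalignedSupportCheckSketch {
    /** Hypothetical description of an operator in a job (not a Flink class). */
    static final class OperatorInfo {
        final String name;
        final boolean declaresUnalignedSupport; // opt-in flag set by the operator author

        OperatorInfo(String name, boolean declaresUnalignedSupport) {
            this.name = name;
            this.declaresUnalignedSupport = declaresUnalignedSupport;
        }
    }

    /** Returns the names of operators that have NOT declared support. */
    static List<String> unsupportedOperators(List<OperatorInfo> operators) {
        List<String> missing = new ArrayList<>();
        for (OperatorInfo op : operators) {
            if (!op.declaresUnalignedSupport) {
                missing.add(op.name);
            }
        }
        return missing;
    }

    /** Auto-enable unaligned CP only if every operator opted in. */
    static boolean mayAutoEnableUnaligned(List<OperatorInfo> operators) {
        return unsupportedOperators(operators).isEmpty();
    }

    public static void main(String[] args) {
        List<OperatorInfo> job = List.of(
                new OperatorInfo("source", true),
                new OperatorInfo("custom-watermark-operator", false),
                new OperatorInfo("sink", true));
        System.out.println(mayAutoEnableUnaligned(job)); // false
        System.out.println(unsupportedOperators(job));   // [custom-watermark-operator]
    }
}
```

Treating "not declared" as "not supported" keeps the automatic enabling conservative: an operator nobody has reviewed blocks the switch instead of silently risking incorrect results.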


[1]
https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/ops/state/checkpointing_under_backpressure/#interplay-with-watermarks

Best,
Zakelly

On Mon, Jan 8, 2024 at 3:02 PM Rui Fan <1996fan...@gmail.com> wrote:

> Thanks to Piotr for driving this proposal!
>
> Enabling unaligned checkpoints together with an aligned checkpoint timeout
> is fine with me. However, I'm not sure whether an aligned checkpoint
> timeout of 5s is too aggressive. If unaligned checkpoints are enabled by
> default for all jobs, I recommend setting the aligned checkpoint timeout to
> at least 30s.
>
> If 30s is too long for some Flink jobs, users can lower it themselves.
>
> To David, Ken and Zhanghao:
>
> Unaligned checkpoints do have some limitations compared to aligned
> checkpoints, but if we set the aligned checkpoint timeout to 30s or 60s,
> a checkpoint that can complete within 30s or 60s is still an aligned
> checkpoint (so this introduces no extra overhead). Only when a checkpoint
> cannot complete within the aligned checkpoint timeout does it switch to an
> unaligned checkpoint, and the unaligned checkpoint can still complete when
> backpressure is severe.
>
> In brief, when backpressure is low, enabling them costs nothing; when
> backpressure is high, enabling them brings benefits.
>
> So I don't think there are many risks as long as the aligned checkpoint
> timeout is set to 30s or above. WDYT?
>
> Best,
> Rui
>
> On Mon, Jan 8, 2024 at 12:57 PM Zhanghao Chen <zhanghao.c...@outlook.com> wrote:
>
> > Hi Piotr,
> >
> > As a platform administrator who runs thousands of Flink jobs, I'd be
> > against enabling unaligned CP by default for our jobs. It may help a
> > significant portion of users, but the subtle issues around unaligned CP
> > in a few jobs would probably cause many more on-calls and incidents. From
> > my point of view, we'd better not enable it by default until all the
> > limitations listed on the following page have been removed:
> > https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/ops/state/checkpointing_under_backpressure/#limitations
> >
> > Best,
> > Zhanghao Chen
> > ________________________________
> > From: Piotr Nowojski <pnowoj...@apache.org>
> > Sent: Friday, January 5, 2024 21:41
> > To: dev <dev@flink.apache.org>
> > Subject: FLIP-413: Enable unaligned checkpoints by default
> >
> > Hi!
> >
> > I would like to propose enabling unaligned checkpoints by default and, at
> > the same time, increasing the aligned checkpoint timeout from 0ms to 5s. I
> > think this change is the right one for the majority of Flink users.
> >
> > For more rationale, please take a look at the short FLIP-413 [1].
> >
> > What do you all think?
> >
> > Best,
> > Piotrek
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-413%3A+Enable+unaligned+checkpoints+by+default
> >
>
