Thanks to Piotr for driving this proposal!

Enabling unaligned checkpoints together with an aligned checkpoint timeout
is fine for me. However, I'm not sure whether an aligned checkpoint timeout
of 5s is too aggressive. If unaligned checkpoints are enabled by default
for all jobs, I recommend that the aligned checkpoint timeout be
at least 30s.

If 30s is too long for some Flink jobs, Flink users can lower it
themselves.

To David, Ken and Zhanghao:

Unaligned checkpoints do have some limitations compared with aligned checkpoints,
but if we set the aligned checkpoint timeout to 30s or 60s, then whenever
a checkpoint can be completed within 30s or 60s, the job still uses the
aligned checkpoint (so it adds no extra overhead).
When a checkpoint cannot be completed within the aligned checkpoint timeout,
the aligned checkpoint is switched to an unaligned checkpoint.
The unaligned checkpoint can still complete when backpressure is severe.

In brief: when backpressure is low, enabling unaligned checkpoints costs nothing;
when backpressure is high, enabling them brings some benefits.
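
To make the fallback concrete, here is a toy model of the decision. It is
not Flink code: the function name and its parameters are made up for
illustration, and it collapses the real per-task alignment tracking into a
single number of seconds. It only shows which mode a checkpoint would end up
in for a given aligned checkpoint timeout.

```python
def checkpoint_mode(alignment_seconds: float, aligned_timeout_seconds: float) -> str:
    """Toy model (hypothetical helper, not a Flink API).

    alignment_seconds: how long the checkpoint would need in aligned mode.
    aligned_timeout_seconds: the configured aligned checkpoint timeout.
    Returns "aligned" if the checkpoint finishes within the timeout,
    otherwise "unaligned" (the checkpoint falls back after the timeout).
    """
    if alignment_seconds < 0 or aligned_timeout_seconds < 0:
        raise ValueError("durations must be non-negative")
    if alignment_seconds <= aligned_timeout_seconds:
        # Low backpressure: the checkpoint never leaves aligned mode.
        return "aligned"
    # High backpressure: the timeout expires and the checkpoint switches.
    return "unaligned"


if __name__ == "__main__":
    # Compare the proposed 5s default with the 30s suggested in this mail.
    for needed in (2, 10, 45):
        print(needed, checkpoint_mode(needed, 5), checkpoint_mode(needed, 30))
```

In this model a checkpoint that needs 10s of alignment stays aligned with a
30s timeout but switches to unaligned with a 5s timeout, which is the case
I am worried about. In a real job the timeout is set via
execution.checkpointing.aligned-checkpoint-timeout.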

So I think the risk is small when the aligned checkpoint timeout
is set to 30s or above. WDYT?

Best,
Rui

On Mon, Jan 8, 2024 at 12:57 PM Zhanghao Chen <zhanghao.c...@outlook.com>
wrote:

> Hi Piotr,
>
> As a platform administrator who runs thousands of Flink jobs, I'd be against the
> idea of enabling unaligned CP by default for our jobs. It may help a
> significant portion of the users, but the subtle issues around unaligned CP
> for a few jobs will probably raise a lot more on-calls and incidents. From
> my point of view, we'd better not enable it by default before removing all
> the limitations listed in
> https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/ops/state/checkpointing_under_backpressure/#limitations
> .
>
> Best,
> Zhanghao Chen
> ________________________________
> From: Piotr Nowojski <pnowoj...@apache.org>
> Sent: Friday, January 5, 2024 21:41
> To: dev <dev@flink.apache.org>
> Subject: FLIP-413: Enable unaligned checkpoints by default
>
> Hi!
>
> I would like to propose enabling unaligned checkpoints by default and
> simultaneously increasing the aligned checkpoints timeout from 0ms to 5s. I
> think this change is the right one to make for the majority of Flink users.
>
> For more rationale please take a look into the short FLIP-413 [1].
>
> What do you all think?
>
> Best,
> Piotrek
>
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-413%3A+Enable+unaligned+checkpoints+by+default
>
