Hi, thanks for the responses,

And thanks for pointing out the job upgrade issue. Indeed, that had slipped
my mind. I was mistakenly thinking that we support upgrades only via
savepoints. Anyway, maybe in that case we should guide users towards exactly
that, i.e. using savepoints for upgrades? That would be even easier for
users to understand:
- use unaligned checkpoints for checkpoints
- use savepoints for any job changes or version upgrades

There is one downside: savepoints are always full, while aligned
checkpoints can be incremental.

WDYT?

Regarding the value for the timeout, I would also be fine with 30s. Indeed,
that's a safer default.
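
To make the effect of the timeout concrete, here is a minimal Python sketch of
the switching rule as discussed in this thread. It is a simplified model, not
Flink code: the function name `checkpoint_mode` and its parameters are
hypothetical, and it assumes a single "alignment duration" per checkpoint,
whereas the real decision is made per subtask.

```python
def checkpoint_mode(alignment_seconds: float,
                    aligned_timeout_seconds: float,
                    unaligned_enabled: bool = True) -> str:
    """Return which mode a checkpoint ends up in under this simplified model.

    alignment_seconds: how long barrier alignment takes (hypothetical input).
    aligned_timeout_seconds: the aligned checkpoint timeout (0 means no
        aligned phase at all when unaligned checkpoints are enabled).
    """
    if not unaligned_enabled:
        # Without unaligned checkpoints the timeout has no effect.
        return "aligned"
    if aligned_timeout_seconds <= 0:
        # Timeout of 0: the checkpoint starts unaligned right away.
        return "unaligned"
    # Otherwise start aligned and switch only if alignment exceeds the timeout.
    if alignment_seconds <= aligned_timeout_seconds:
        return "aligned"
    return "unaligned"


if __name__ == "__main__":
    # Compare the two proposed defaults for a few alignment durations.
    for timeout in (5, 30):
        for alignment in (1, 10, 45):
            mode = checkpoint_mode(alignment, timeout)
            print(f"timeout={timeout}s alignment={alignment}s -> {mode}")
```

Under this model, a job whose alignment takes 10s would switch to unaligned
with a 5s timeout but stay aligned with 30s, which is why 30s is the more
conservative choice.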

> On a separate point, in the sentence below it seems to me it would be
> clearer to say that in the unlikely scenario you've described, the change
> would "significantly increase checkpoint sizes" -- assuming I understand
> things correctly.

I've reworded that paragraph.

Best,
Piotrek



On Mon, 8 Jan 2024 at 08:02, Rui Fan <1996fan...@gmail.com> wrote:

> Thanks to Piotr for driving this proposal!
>
> Enabling unaligned checkpoints together with an aligned checkpoint timeout
> is fine with me. However, I wonder whether an aligned checkpoint timeout of
> 5s is too aggressive. If unaligned checkpoints are enabled by default
> for all jobs, I recommend that the aligned checkpoint timeout be
> at least 30s.
>
> If 30s is too long for some Flink jobs, users can lower it
> themselves.
>
> To David, Ken and Zhanghao:
>
> Unaligned checkpoints do have some limitations compared to aligned
> checkpoints, but if we set the aligned checkpoint timeout to 30s or 60s,
> then whenever a job's checkpoint can be completed within 30s or 60s, the job
> still uses aligned checkpoints (so it doesn't introduce any extra overhead).
> When the checkpoint cannot be completed within the aligned checkpoint
> timeout, the aligned checkpoint is switched to an unaligned checkpoint.
> The unaligned checkpoint can then complete even when backpressure is severe.
>
> In brief: when backpressure is low, enabling them costs nothing;
> when backpressure is high, enabling them brings benefits.
>
> So I think there is little risk when the aligned checkpoint timeout
> is set to 30s or above. WDYT?
>
> Best,
> Rui
>
> On Mon, Jan 8, 2024 at 12:57 PM Zhanghao Chen <zhanghao.c...@outlook.com>
> wrote:
>
> > Hi Piotr,
> >
> > As a platform administrator who runs thousands of Flink jobs, I'd be
> > against enabling unaligned checkpoints by default for our jobs. It may
> > help a significant portion of users, but the subtle issues around
> > unaligned checkpoints for a few jobs will probably cause a lot more
> > on-calls and incidents. From my point of view, we'd better not enable it
> > by default before removing all the limitations listed at:
> > https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/ops/state/checkpointing_under_backpressure/#limitations
> >
> > Best,
> > Zhanghao Chen
> > ________________________________
> > From: Piotr Nowojski <pnowoj...@apache.org>
> > Sent: Friday, January 5, 2024 21:41
> > To: dev <dev@flink.apache.org>
> > Subject: FLIP-413: Enable unaligned checkpoints by default
> >
> > Hi!
> >
> > I would like to propose enabling unaligned checkpoints by default and
> > simultaneously increasing the aligned checkpoint timeout from 0ms to 5s.
> > I think this is the right change for the majority of Flink users.
> >
> > For more of the rationale, please take a look at the short FLIP-413 [1].
> >
> > What do you all think?
> >
> > Best,
> > Piotrek
> >
> >
> >
> > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-413%3A+Enable+unaligned+checkpoints+by+default
> >
>
