Unsubscribe
At 2024-01-08 17:45:01, "Piotr Nowojski" <pnowoj...@apache.org> wrote:
>Hi thanks for the responses,
>
>And thanks for pointing out the jobs upgrade issue. Indeed that has
>slipped my mind. I was mistakenly thinking that we are supporting all
>upgrades only via savepoint. Anyway, maybe in that case we should
>guide users towards that? Using savepoints for upgrades? That would be even
>easier to understand for the users:
>- use unaligned checkpoints for checkpoints
>- use savepoints for any changes in the job/version upgrades
>
>There is a downside, that savepoints are always full, while aligned
>checkpoints can be incremental.
>
>WDYT?
>
>Regarding the value for the timeout, I would also be fine with 30s. Indeed
>that's a safer default.
>
>> On a separate point, in the sentence below it seems to me it would be
>> clearer to say that in the unlikely scenario you've described, the change
>> would "significantly increase checkpoint sizes" -- assuming I understand
>> things correctly.
>
>I've reworded that paragraph.
>
>Best,
>Piotrek
>
>
>
>On Mon, Jan 8, 2024 at 08:02, Rui Fan <1996fan...@gmail.com> wrote:
>
>> Thanks to Piotr for driving this proposal!
>>
>> Enabling unaligned checkpoint with aligned checkpoints timeout
>> is fine for me. I'm not sure if aligned checkpoints timeout = 5s is
>> too aggressive. If the unaligned checkpoint is enabled by default
>> for all jobs, I recommend that the aligned checkpoints timeout be
>> at least 30s.
>>
>> If the 30s is too big for some of the flink jobs, flink users can turn
>> it down by themselves.
>>
>> To David, Ken and Zhanghao:
>>
>> Unaligned checkpoint indeed has some limitations compared to aligned
>> checkpoint, but if we set aligned checkpoints timeout = 30s or 60s, it means
>> when a job can be completed within 30s or 60s, this job still uses the
>> aligned checkpoint (it doesn't introduce any extra effort).
>> When the checkpoint cannot be completed within aligned checkpoints timeout,
>> the aligned checkpoint will be switched to the unaligned checkpoint.
>> The unaligned checkpoint can be completed when backpressure is severe.
>>
>> In brief, when backpressure is low, enabling them costs no extra effort;
>> when backpressure is high, enabling them has some benefits.
>>
>> So I think it doesn't have too many risks when aligned checkpoints timeout
>> is set to 30s or above. WDYT?
>>
>> Best,
>> Rui
>>
>> On Mon, Jan 8, 2024 at 12:57 PM Zhanghao Chen <zhanghao.c...@outlook.com>
>> wrote:
>>
>> > Hi Piotr,
>> >
>> > As a platform administrator who runs thousands of Flink jobs, I'd be
>> > against the idea to enable unaligned cp by default for our jobs. It may
>> > help a significant portion of the users, but the subtle issues around
>> > unaligned CP for a few jobs will probably raise a lot more on-calls and
>> > incidents. From my point of view, we'd better not enable it by default
>> > before removing all the limitations listed in
>> > https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/ops/state/checkpointing_under_backpressure/#limitations
>> > .
>> >
>> > Best,
>> > Zhanghao Chen
>> > ________________________________
>> > From: Piotr Nowojski <pnowoj...@apache.org>
>> > Sent: Friday, January 5, 2024 21:41
>> > To: dev <dev@flink.apache.org>
>> > Subject: FLIP-413: Enable unaligned checkpoints by default
>> >
>> > Hi!
>> >
>> > I would like to propose by default to enable unaligned checkpoints and
>> > also simultaneously increase the aligned checkpoints timeout from 0ms to
>> > 5s. I think this change is the right one to do for the majority of Flink
>> > users.
>> >
>> > For more rationale please take a look into the short FLIP-413 [1].
>> >
>> > What do you all think?
>> >
>> > Best,
>> > Piotrek
>> >
>> >
>> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-413%3A+Enable+unaligned+checkpoints+by+default
>> >
>>
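
For readers following the thread, the behaviour Rui describes (a checkpoint stays aligned if alignment finishes within the aligned checkpoints timeout, and otherwise switches to unaligned) can be sketched as a toy decision function. This is only an illustrative model, not Flink code: the names `CheckpointPlan` and `plan_checkpoint` are hypothetical, and treating a timeout of 0 as "always start unaligned" is an assumption based on my understanding of the current default discussed in the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckpointPlan:
    mode: str                  # "aligned" or "unaligned"
    switched_on_timeout: bool  # True if it started aligned and then timed out


def plan_checkpoint(alignment_seconds: float,
                    aligned_timeout_seconds: float,
                    unaligned_enabled: bool = True) -> CheckpointPlan:
    """Toy model (hypothetical, not a Flink API): how one checkpoint finishes."""
    if not unaligned_enabled:
        # Unaligned checkpoints disabled: always aligned, however long it takes.
        return CheckpointPlan("aligned", False)
    if aligned_timeout_seconds == 0:
        # Assumption: a zero timeout means the checkpoint starts unaligned.
        return CheckpointPlan("unaligned", False)
    if alignment_seconds <= aligned_timeout_seconds:
        # Low backpressure: alignment finishes in time, nothing changes.
        return CheckpointPlan("aligned", False)
    # High backpressure: alignment timed out, switch to unaligned.
    return CheckpointPlan("unaligned", True)


if __name__ == "__main__":
    for delay in (2.0, 45.0):
        print(delay, plan_checkpoint(delay, aligned_timeout_seconds=30.0))
```

With a 30s timeout, the 2s case stays aligned while the 45s case switches to unaligned, which is the trade-off the proposed default is meant to capture.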