Re: [DISCUSS] FLIP-193: Snapshots ownership

2022-03-29 Thread Dawid Wysakowicz
for the states. +1 for the overall changes since it makes the behavior clear and provide users a determined method to finally cleanup savepoints / retained checkpoints. Regarding the changes to the public interface, it seems currently the changes are all bound to the savepoint, but from the FLI

Re: [DISCUSS] FLIP-193: Snapshots ownership

2022-03-29 Thread Hangxiang Yu
pload it). > > Under the hood, it can work like this: > - for the checkpoint Flink recovers from, remember all shared state > handles it is adding > - when unregistering shared state handles, remove them from the set above > - when the set becomes empty the 1st checkpoint can be de

Re: [DISCUSS] FLIP-193: Snapshots ownership

2021-11-26 Thread Dawid Wysakowicz
far builds on the assumption we could in most cases use a cheap >> duplicate API instead of re-upload. I could see this as a follow-up if it >> becomes a bottleneck. It would be a bit invasive though, as we would have to >> somehow keep track which files should not be reused on TMs

Re: [DISCUSS] FLIP-193: Snapshots ownership

2021-11-26 Thread Till Rohrmann
er all shared state > handles it is adding > - when unregistering shared state handles, remove them from the set above > - when the set becomes empty the 1st checkpoint can be deleted externally > > Besides not requiring re-upload, it seems much simpler and less invasive. > On the downside, state deletion can be delayed; but I think this is a > reasonable trade-
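The bookkeeping described in the quoted steps above can be sketched roughly as follows. This is only an illustration in Python, not Flink code: the class name `RestoredCheckpointTracker`, its methods, and the string handle identifiers are all hypothetical, and the sketch assumes a handle is unregistered once no later checkpoint references it.

```python
class RestoredCheckpointTracker:
    """Hypothetical sketch: tracks which shared state handles of the
    restored (1st) checkpoint are still referenced by newer checkpoints."""

    def __init__(self, restored_handles):
        # Remember all shared state handles added for the restored checkpoint.
        self._in_use = set(restored_handles)

    def unregister(self, handle):
        # When a shared state handle is unregistered, remove it from the set.
        # Handles that never belonged to the restored checkpoint are ignored.
        self._in_use.discard(handle)

    def is_still_in_use(self):
        # Once the set is empty, the restored checkpoint can be deleted externally.
        return bool(self._in_use)


tracker = RestoredCheckpointTracker({"sst-1", "sst-2"})
tracker.unregister("sst-1")
print(tracker.is_still_in_use())  # True: "sst-2" is still referenced
tracker.unregister("sst-2")
print(tracker.is_still_in_use())  # False: safe to delete externally
```

The trade-off mentioned in the thread shows up directly here: nothing is re-uploaded, but deletion of the restored checkpoint has to wait until the last handle is unregistered.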

Re: [DISCUSS] FLIP-193: Snapshots ownership

2021-11-26 Thread Dawid Wysakowicz
d an externalized checkpoint already. >>> I wanted to voice that concern. Nevertheless I am fine with changing it to >>> execution.restore-mode, if there are no other comments on that matter, I >>> will change it. >>> >>> @Roman: >>> >>> Re 1. Co

Re: [DISCUSS] FLIP-193: Snapshots ownership

2021-11-26 Thread Roman Khachatryan
3 Neither of the counter proposals work well for taking incremental > > savepoints. We were thinking of building incremental savepoints on the same > > concept. I think delaying the completion of an independent savepoint to a > > closer undefined future is not a nice property

Re: [DISCUSS] FLIP-193: Snapshots ownership

2021-11-26 Thread Konstantin Knauf
quiring re-upload, it seems much simpler and less invasive. > On the downside, state deletion can be delayed; but I think this is a > reasonable trade-off. > > 3. Alternatively, re-upload not necessarily on 1st checkpoint, but > after a configured number of checkpoints? > There is a high chance that after some more checkp

Re: [DISCUSS] FLIP-193: Snapshots ownership

2021-11-26 Thread Dawid Wysakowicz
the first completed checkpoint has the >> independent/full checkpoint property rather than just the first triggered. >> >> Re. 5 & 6 I need a bit more time to look into it. >> >> Best, >> >> Dawid >> >> On 22/11/2021 11:40, Roman Khachatrya

Re: [DISCUSS] FLIP-193: Snapshots ownership

2021-11-23 Thread Roman Khachatryan
here is a high chance that after some more checkpoints, initial state > will not be used (because of compaction), > so backends won't have to re-upload anything (or small part). > > 4. Re-uploaded artifacts must not be deleted on checkpoint abortion > This should be addressed in https://issues.apache.org/jira/browse/FLINK-2

Re: [DISCUSS] FLIP-193: Snapshots ownership

2021-11-23 Thread Dawid Wysakowicz
ather than just the first >>> triggered. >>> >>> Re. 5 & 6 I need a bit more time to look into it. >>> >>> Best, >>> >>> Dawid >>> >>> On 22/11/2021 11:40, Roman Khachatryan wrote: >>> >>> Hi, >>> >>> Thanks for the p

Re: [DISCUSS] FLIP-193: Snapshots ownership

2021-11-23 Thread Khachatryan Roman
> > > Besides not requiring re-upload, it seems much simpler and less invasive. > > On the downside, state deletion can be delayed; but I think this is a > > reasonable trade-off. > > > > 3. Alternatively, re-upload not necessarily on 1st checkpoint, but > > after a configured number of c

Re: [DISCUSS] FLIP-193: Snapshots ownership

2021-11-23 Thread Piotr Nowojski
(so we'll have to analyze "raw" file usage I think). > > 6. Enforcing re-upload by a single task and skew > If we use some greedy logic like subtask 0 always re-uploads then it > might be overloaded. > So we'll have to obtain a full list of subtasks first (then probab
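To illustrate the skew concern in point 6, here is one possible way to avoid always choosing subtask 0. This is an assumption for illustration only and is not what the thread proposes: the function `pick_uploader` and the file names are hypothetical. It assumes the full list of subtasks sharing a file is already known, and it picks the uploader from a stable hash of the file name so that different files tend to land on different subtasks.

```python
import hashlib


def pick_uploader(file_name, subtasks):
    """Hypothetical sketch: choose which subtask re-uploads a shared file.

    The choice depends only on the file name and the set of subtasks,
    so every subtask computes the same answer independently.
    """
    candidates = sorted(subtasks)
    if not candidates:
        raise ValueError("at least one subtask must reference the file")
    # A stable hash (unlike Python's built-in hash()) gives the same
    # result in every process.
    digest = hashlib.sha256(file_name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(candidates)
    return candidates[index]


# Each shared file is assigned to exactly one of the subtasks that use it.
for name in ["000012.sst", "000013.sst", "000014.sst"]:
    print(name, "->", pick_uploader(name, [0, 1, 2, 3]))
```

A greedy rule such as "subtask 0 always re-uploads" corresponds to returning `candidates[0]` unconditionally, which concentrates all uploads on one subtask.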

Re: [DISCUSS] FLIP-193: Snapshots ownership

2021-11-23 Thread Dawid Wysakowicz
>>>>>> Instead, we can provide an API which tells whether the 1st checkpoint >>>>>> is still in use (and not force re-upload it). >>>>>> >>>>>> Under the hood, it can work like this: >>>>>> - for the checkpoint Flink

Re: [DISCUSS] FLIP-193: Snapshots ownership

2021-11-22 Thread Yun Tang
rily on 1st checkpoint, but > >>>> after a configured number of checkpoints? > >>>> There is a high chance that after some more checkpoints, initial state > >>>> will not be used (because of compaction), > >>>> so backends won't have to re-upload anything (or small part). > >>>> > >>>> 4. Re-uploaded ar

Re: [DISCUSS] FLIP-193: Snapshots ownership

2021-11-22 Thread Dawid Wysakowicz
ators. >>>> Therefore, getIntersection() is irrelevant here, because operators >>>> might not be sharing any key groups. >>>> (so we'll have to analyze "raw" file usage I think). >>>> >>>> 6. Enforcing re-upload by a single task and sk

Re: [DISCUSS] FLIP-193: Snapshots ownership

2021-11-22 Thread Roman Khachatryan
, which is doable but > >> not trivial (which I think supports "reverse API option"). > >> > >> 7. I think it would be helpful to list file systems / object stores > >> that support "fast" copy (ideally with latency numbers). > >> > >

Re: [DISCUSS] FLIP-193: Snapshots ownership

2021-11-22 Thread Dawid Wysakowicz
n, Nov 22, 2021 at 9:24 AM Yun Gao wrote: >> >> Hi, >> >> Very thanks Dawid for proposing the FLIP to clarify the ownership for the >> states. +1 for the overall changes since it makes the behavior clear and >> provide users a determined method to finally cleanup savepoints / retained >> checkpoints. >> >> Regarding the change

Re: [DISCUSS] FLIP-193: Snapshots ownership

2021-11-22 Thread Dawid Wysakowicz
>> Very thanks Dawid for proposing the FLIP to clarify the ownership for the >> states. +1 for the overall changes since it makes the behavior clear and >> provide users a determined method to finally cleanup savepoints / retained >> checkpoints. >> >> Rega

Re: [DISCUSS] FLIP-193: Snapshots ownership

2021-11-22 Thread Roman Khachatryan
ration > for retained checkpoints like in the cli side[1] ? If so, then might it be > better to change the option name > from `execution.savepoint.restore-mode` to something like > `execution.restore-mode`? > > Best, > Yun > > > [1] > https://nightlies.apache.org/fl

Re: [DISCUSS] FLIP-193: Snapshots ownership

2021-11-22 Thread Dawid Wysakowicz
from `execution.savepoint.restore-mode` to something like >> `execution.restore-mode`? >> >> Best, >> Yun >> >> >> [1] >> https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/checkpoints/#resuming-from-a-retained-checkpoint >> >>

Re: [DISCUSS] FLIP-193: Snapshots ownership

2021-11-22 Thread Roman Khachatryan
-------------------- > From:Konstantin Knauf > Send Time:2021 Nov. 19 (Fri.) 16:00 > To:dev > Subject:Re: [DISCUSS] FLIP-193: Snapshots ownership > > Hi Dawid, > > Thanks for working on this FLIP. Clarifying the differences and > guara

Re: [DISCUSS] FLIP-193: Snapshots ownership

2021-11-22 Thread Yun Gao
:00 To:dev Subject:Re: [DISCUSS] FLIP-193: Snapshots ownership Hi Dawid, Thanks for working on this FLIP. Clarifying the differences and guarantees around savepoints and checkpoints will make it easier and safer for users and downstream projects and platforms to work with them. +1 to the changing

Re: [DISCUSS] FLIP-193: Snapshots ownership

2021-11-18 Thread Konstantin Knauf
Hi Dawid, Thanks for working on this FLIP. Clarifying the differences and guarantees around savepoints and checkpoints will make it easier and safer for users and downstream projects and platforms to work with them. +1 to the changing the current (undefined) behavior when recovering from retained

[DISCUSS] FLIP-193: Snapshots ownership

2021-11-18 Thread Dawid Wysakowicz
Hi devs, I'd like to bring up for a discussion a proposal to clean up ownership of snapshots, both checkpoints and savepoints. The goal here is to make it clear who is responsible for deleting checkpoints/savepoints files and when can that be done in a safe manner. Looking forward for your feedb