I'm running Flink in Application Mode and set jobId explicitly.

________________________________
From: Khachatryan Roman <khachatryan.ro...@gmail.com>
Sent: Monday, August 30, 2021 7:16 AM
To: Alexey Trenikhun <yen...@msn.com>
Cc: Matthias Pohl <matth...@ververica.com>; Flink User Mail List <user@flink.apache.org>; sjwies...@gmail.com <sjwies...@gmail.com>
Subject: Re: checkpoints/.../shared cleanup

Hi,

I think the documentation is correct. Once the job is stopped with a
savepoint, all of its "regular" checkpoints are discarded; as a result, any
shared state becomes unreferenced and is also discarded. Savepoints currently
do not have shared state.

Furthermore, the new job should have a new ID and therefore a new folder.
Are you referring to the old folders?

However, the removal process is asynchronous, and the client doesn't wait for
all the artifacts to be removed. The cluster will then wait for removal to
complete before termination. Are you running Flink in session mode?

Regards,
Roman

On Fri, Aug 27, 2021 at 8:05 AM Alexey Trenikhun <yen...@msn.com> wrote:
>
> "the shared subfolder still grows" - while upgrading the job, we cancel it
> with a savepoint. My expectation is that Flink will clean up the
> checkpoints, including the shared directory, since checkpoints are not
> retained. Then we start the upgraded job from the savepoint; however, when
> I look into the shared folder I see older files from the previous version
> of the job. This upgrade process is repeated, and as a result the shared
> subfolder grows and grows.
>
> Thanks,
> Alexey
> ________________________________
> From: Alexey Trenikhun <yen...@msn.com>
> Sent: Thursday, August 26, 2021 6:37:27 PM
> To: Matthias Pohl <matth...@ververica.com>
> Cc: Flink User Mail List <user@flink.apache.org>; sjwies...@gmail.com
> <sjwies...@gmail.com>
> Subject: Re: checkpoints/.../shared cleanup
>
> Hi Matthias,
>
> I don't use externalized checkpoints (the Flink UI shows "Persist
> Checkpoints Externally: Disabled"), so why do you think checkpoint(s)
> should be retained? It kind of contradicts the documentation [1]:
> "Checkpoints are by default not retained and are only used to resume a job
> from failures."
>
> [1] https://ci.apache.org/projects/flink/flink-docs-master/docs/ops/state/checkpoints/#retained-checkpoints
>
> Thanks,
> Alexey
> ________________________________
> From: Matthias Pohl <matth...@ververica.com>
> Sent: Thursday, August 26, 2021 5:42 AM
> To: Alexey Trenikhun <yen...@msn.com>
> Cc: Flink User Mail List <user@flink.apache.org>; sjwies...@gmail.com
> <sjwies...@gmail.com>
> Subject: Re: checkpoints/.../shared cleanup
>
> Hi Alexey,
> thanks for reaching out to the community. I have a question: What do you
> mean by "the shared subfolder still grows"? As far as I understand, the
> shared folder contains the state of incremental checkpoints. If you cancel
> the corresponding job and start a new job from one of the retained
> incremental checkpoints, the shared folder of the previous job must still
> be around, since it contains the state. The new job would then create its
> own shared subfolder. Any new incremental checkpoints will write their
> state into the new job's shared subfolder while still relying on the shared
> state of the previous job for older data. The RocksDB Backend is in charge
> of consolidating the incremental state.
>
> Hence, you should be careful with removing the shared folder in case you're
> planning to restart the job later on.
>
> I'm adding Seth to this thread. He might have more insights and/or correct
> my limited knowledge of the incremental checkpoint process.
>
> Best,
> Matthias
>
> On Wed, Aug 25, 2021 at 1:39 AM Alexey Trenikhun <yen...@msn.com> wrote:
>
> Hello,
> I use incremental checkpoints, not externalized. Should the content of
> checkpoint/.../shared be removed when I cancel the job (or cancel with a
> savepoint)? It looks like in our case shared continues to grow...
>
> Thanks,
> Alexey
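
To see which job's shared folder is actually growing between upgrades, it can help to measure each job's shared directory before and after a cancel-with-savepoint. The following is only a minimal sketch, not anything from Flink itself: it assumes the checkpoint directory is on a local or mounted filesystem and follows the `<checkpoint-dir>/<job-id>/shared` layout, and the function name `shared_usage` is hypothetical.

```python
import sys
from pathlib import Path


def shared_usage(checkpoint_root):
    """Return {job_id: (file_count, total_bytes)} for every
    <checkpoint_root>/<job-id>/shared directory that exists.

    Assumption: checkpoint_root is a local or mounted path; object stores
    such as S3 would need their own client instead of pathlib.
    """
    usage = {}
    for job_dir in sorted(Path(checkpoint_root).iterdir()):
        shared = job_dir / "shared"
        if not shared.is_dir():
            # Not a job folder, or the job has no shared state.
            continue
        files = [p for p in shared.rglob("*") if p.is_file()]
        usage[job_dir.name] = (len(files), sum(p.stat().st_size for p in files))
    return usage


if __name__ == "__main__":
    # Usage: python shared_usage.py /path/to/checkpoint-dir
    for job_id, (count, size) in shared_usage(sys.argv[1]).items():
        print(f"{job_id}: {count} files, {size} bytes")
```

Running this before and after an upgrade shows whether files accumulate under a single job ID (as could happen when the jobId is set explicitly and reused) or under old, separate job folders.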