Yes, I understand how to restore a savepoint into a DataStream job.
What I couldn't figure out is how to create a NewSavepoint from a DataStream.
As far as I understand, a NewSavepoint only accepts a BootstrapTransformation,
which is built from DataSet transformations.
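
For context, the bootstrap pattern in question looks roughly like the sketch
below. It is written against the Flink 1.13 State Processor API that the thread
links to. The Reading POJO, the state name "lastValue", the operator uid
"my-operator-uid", the max parallelism of 128, and the S3 path are all
placeholders made up for illustration. It needs the flink-state-processor-api
dependency and is not a definitive implementation.

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.state.memory.MemoryStateBackend;
import org.apache.flink.state.api.BootstrapTransformation;
import org.apache.flink.state.api.OperatorTransformation;
import org.apache.flink.state.api.Savepoint;
import org.apache.flink.state.api.functions.KeyedStateBootstrapFunction;

public class BootstrapSketch {

    // Hypothetical POJO, used only for this illustration.
    public static class Reading {
        public String sensorId;
        public double value;

        public Reading() {}

        public Reading(String sensorId, double value) {
            this.sensorId = sensorId;
            this.value = value;
        }
    }

    // Writes one ValueState entry per key; the state name is an assumption.
    public static class ReadingBootstrapper
            extends KeyedStateBootstrapFunction<String, Reading> {

        private transient ValueState<Double> lastValue;

        @Override
        public void open(Configuration parameters) {
            lastValue = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("lastValue", Types.DOUBLE));
        }

        @Override
        public void processElement(Reading reading, Context ctx) throws Exception {
            lastValue.update(reading.value);
        }
    }

    public static void main(String[] args) throws Exception {
        // The bootstrap input is a DataSet, even though the resulting
        // savepoint is meant to be restored by a DataStream job.
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<Reading> readings = env.fromElements(
                new Reading("sensor-a", 1.0),
                new Reading("sensor-b", 2.0));

        BootstrapTransformation<Reading> transformation = OperatorTransformation
                .bootstrapWith(readings)
                .keyBy(reading -> reading.sensorId)
                .transform(new ReadingBootstrapper());

        Savepoint
                .create(new MemoryStateBackend(), 128)            // placeholder max parallelism
                .withOperator("my-operator-uid", transformation)  // placeholder uid
                .write("s3://my-bucket/savepoints/bootstrap");    // placeholder path

        env.execute("bootstrap savepoint");
    }
}
```

A DataStream job restoring from such a savepoint would have to use the same
operator uid, state name, and key type in its KeyedProcessFunction.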


About the checkpoints: does
 CheckpointConfig.ExternalizedCheckpointCleanup = RETAIN_ON_CANCELLATION
offer the same behaviour when the job is "FINISHED" rather than "CANCELLED"?
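
For concreteness, the setting in question is typically enabled along these
lines. This is a minimal configuration sketch against the Flink 1.13 DataStream
API; the 60-second interval and the S3 path are placeholders, and the job
definition itself is omitted.

```java
import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RetainedCheckpointConfigSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder interval: take a checkpoint every 60 seconds.
        env.enableCheckpointing(60_000L);

        // Placeholder location for checkpoint data.
        env.getCheckpointConfig().setCheckpointStorage("s3://my-bucket/checkpoints");

        // Ask Flink to keep the externalized checkpoint when the job is cancelled.
        env.getCheckpointConfig().enableExternalizedCheckpoints(
                ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

        // Sources, operators, sinks, and env.execute(...) are omitted here.
    }
}
```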

What I'm looking for is a way to retain the state of a bounded job so that
the state is reloaded on the next job run (through the API).

On Wed, 7 Jul 2021 at 14:18, Arvid Heise <ar...@apache.org> wrote:

> Hi Rakshit,
>
> The example is valid. The State Processor API works much like a DataSet
> application, but the state it writes is meant to be read by a DataStream job. Please
> check out the SavepointWriterITCase [1] for a full example. There is no
> checkpoint/savepoint in DataSet applications.
>
> Checkpoints can be stored on different checkpoint storages, such as S3 or
> HDFS. If you use the RocksDB state backend, Flink pretty much just copies the SST
> files of RocksDB to S3. Checkpoints are usually bound to the life of an
> application. So they are created by the application and deleted on
> termination.
> However, you can resume an application from both savepoints and
> checkpoints. Checkpoints can be retained [2] to avoid them being deleted by
> the application during termination. But that's considered an advanced
> feature and you should first try it with savepoints.
>
> [1]
> https://github.com/apache/flink/blob/release-1.13.0/flink-libraries/flink-state-processing-api/src/test/java/org/apache/flink/state/api/SavepointWriterITCase.java#L141-L141
> [2]
> https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/ops/state/checkpoints/#retained-checkpoints
>
> On Mon, Jul 5, 2021 at 5:56 PM Rakshit Ramesh <
> rakshit.ram...@datakaveri.org> wrote:
>
>> I'm trying to bootstrap state into a KeyedProcessFunction equivalent that
>> takes in a DataStream, but I'm unable to find a reference for the same.
>> I found this gist
>> https://gist.github.com/alpinegizmo/ff3d2e748287853c88f21259830b29cf
>> But it seems to apply only to DataSet.
>> My use case is to manually trigger a savepoint to S3 for later reuse.
>> I'm also guessing that checkpoints can't be stored in rocksdb or s3 for
>> later reuse.
>>
>
