Re: Savepoints with bootstraping a datastream function

Arvid Heise Wed, 07 Jul 2021 01:48:31 -0700

Hi Rakshit,

The example is valid. A State Processor API job is written like a DataSet
application, but the savepoint it writes is meant to be read by a DataStream
application. Please check out the SavepointWriterITCase [1] for a full example.
DataSet applications themselves have no checkpoints or savepoints.
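
For a quick idea of the shape, here is a minimal sketch against the Flink 1.13
State Processor API. It is not taken from the ITCase. The class names, the
operator uid "counter-uid", the state name "count", the max parallelism of 128,
and the sample data are all made up for illustration. It assumes the
flink-state-processor-api, flink-java and flink-clients dependencies are on the
classpath.

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.state.hashmap.HashMapStateBackend;
import org.apache.flink.state.api.BootstrapTransformation;
import org.apache.flink.state.api.OperatorTransformation;
import org.apache.flink.state.api.Savepoint;
import org.apache.flink.state.api.functions.KeyedStateBootstrapFunction;

public class BootstrapSavepointSketch {

    /** Writes one ValueState<Long> per key; the keyed counterpart of a KeyedProcessFunction. */
    public static class CountBootstrapper
            extends KeyedStateBootstrapFunction<String, Tuple2<String, Long>> {

        private transient ValueState<Long> count;

        @Override
        public void open(Configuration parameters) {
            // The descriptor name and type must match the ones used later in the streaming job.
            count = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("count", Types.LONG));
        }

        @Override
        public void processElement(Tuple2<String, Long> value, Context ctx) throws Exception {
            count.update(value.f1);
        }
    }

    public static void writeSavepoint(String savepointPath) throws Exception {
        // The bootstrap job itself runs on the DataSet (batch) environment.
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical bootstrap data: (key, initial count).
        DataSet<Tuple2<String, Long>> initial =
                env.fromElements(Tuple2.of("sensor-a", 10L), Tuple2.of("sensor-b", 42L));

        BootstrapTransformation<Tuple2<String, Long>> transformation =
                OperatorTransformation.bootstrapWith(initial)
                        .keyBy(t -> t.f0, Types.STRING)
                        .transform(new CountBootstrapper());

        Savepoint.create(new HashMapStateBackend(), 128) // 128 = assumed max parallelism
                .withOperator("counter-uid", transformation) // uid of the streaming operator
                .write(savepointPath);

        // Nothing is written until the batch job is executed.
        env.execute("bootstrap savepoint");
    }

    public static void main(String[] args) throws Exception {
        writeSavepoint(args[0]);
    }
}
```

The DataStream job would then give its KeyedProcessFunction the same uid
("counter-uid" here), declare the same ValueStateDescriptor, and be started
from the written path with `flink run -s <savepointPath> ...`. Writing to an
s3:// path should work the same way, provided one of the Flink S3 filesystem
plugins is installed.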

Checkpoints can be stored on different checkpoint storages, such as S3 or
HDFS. If you use the RocksDB state backend, Flink pretty much just copies the
SST files of RocksDB to S3. Checkpoints are usually bound to the lifetime of an
application: they are created by the application and deleted on termination.
However, you can resume an application from both savepoints and checkpoints.
Checkpoints can be retained [2] so that they are not deleted when the
application terminates. But that's considered an advanced feature, and you
should first try it with savepoints.
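
In case it helps, a flink-conf.yaml fragment for RocksDB with checkpoints on S3
that are retained on cancellation could look roughly like this for Flink 1.13.
The bucket name and paths are placeholders, and the interval is an arbitrary
choice.

```yaml
# RocksDB state backend; checkpoints and savepoints go to S3 (placeholder bucket)
state.backend: rocksdb
state.checkpoints.dir: s3://my-bucket/flink/checkpoints
state.savepoints.dir: s3://my-bucket/flink/savepoints

# Checkpoint every 60 seconds (arbitrary) and keep checkpoints when the job is cancelled
execution.checkpointing.interval: 60s
execution.checkpointing.externalized-checkpoint-retention: RETAIN_ON_CANCELLATION
```

A savepoint is triggered manually with `bin/flink savepoint <jobId>
[targetDirectory]`, and a job is resumed from either a savepoint or a retained
checkpoint with `bin/flink run -s <path> ...`.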

[1]
https://github.com/apache/flink/blob/release-1.13.0/flink-libraries/flink-state-processing-api/src/test/java/org/apache/flink/state/api/SavepointWriterITCase.java#L141-L141
[2]
https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/ops/state/checkpoints/#retained-checkpoints

On Mon, Jul 5, 2021 at 5:56 PM Rakshit Ramesh <rakshit.ram...@datakaveri.org>
wrote:

> I'm trying to bootstrap state into a KeyedProcessFunction equivalent that
> takes in
> a DataStream but I'm unable to find a reference for the same.
> I found this gist
> https://gist.github.com/alpinegizmo/ff3d2e748287853c88f21259830b29cf
> But it seems to only apply for DataSet.
> My usecase is to manually trigger a Savepoint into s3 for later reuse.
> I'm also guessing that checkpoints can't be stored in rocksdb or s3 for
> later reuse.
>
