Hi Rakshit,

The example is valid. The State Processor API works much like a DataSet application: the job that bootstraps the state is a DataSet (batch) program, but the savepoint it writes is meant to be restored and read by a DataStream application. Please check out SavepointWriterITCase [1] for a full example. Note that DataSet applications themselves have no checkpoints or savepoints.
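To make the connection between the two halves concrete, here is a minimal sketch, assuming Flink 1.13 with flink-state-processing-api and flink-statebackend-rocksdb on the classpath. The class name, the operator uid "counter-operator", the state name "count", the max parallelism of 128 and the S3 path are hypothetical placeholders. What matters is that the DataSet bootstrapping job and the streaming KeyedProcessFunction use the same operator uid, the same state descriptor and the same key type.

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
import org.apache.flink.state.api.BootstrapTransformation;
import org.apache.flink.state.api.OperatorTransformation;
import org.apache.flink.state.api.Savepoint;
import org.apache.flink.state.api.functions.KeyedStateBootstrapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class BootstrapSketch {

    // Hypothetical names: any values work as long as both jobs share them.
    public static final String OPERATOR_UID = "counter-operator";
    public static final String STATE_NAME = "count";

    public static ValueStateDescriptor<Long> countDescriptor() {
        return new ValueStateDescriptor<>(STATE_NAME, Types.LONG);
    }

    /** Batch side: counts occurrences of each key into keyed ValueState. */
    public static class CountBootstrapper extends KeyedStateBootstrapFunction<String, String> {
        private transient ValueState<Long> count;

        @Override
        public void open(Configuration parameters) {
            count = getRuntimeContext().getState(countDescriptor());
        }

        @Override
        public void processElement(String key, Context ctx) throws Exception {
            Long current = count.value();
            count.update(current == null ? 1L : current + 1);
        }
    }

    /** Streaming side: sees the bootstrapped counts after a restore. */
    public static class CountingFunction extends KeyedProcessFunction<String, String, Long> {
        private transient ValueState<Long> count;

        @Override
        public void open(Configuration parameters) {
            count = getRuntimeContext().getState(countDescriptor());
        }

        @Override
        public void processElement(String key, Context ctx, Collector<Long> out) throws Exception {
            Long current = count.value();
            long next = (current == null ? 0L : current) + 1;
            count.update(next);
            out.collect(next);
        }
    }

    /** Wires CountingFunction into a DataStream job under the shared uid. */
    public static DataStream<Long> attach(DataStream<String> input) {
        return input.keyBy(k -> k).process(new CountingFunction()).uid(OPERATOR_UID);
    }

    /** Writes a new savepoint containing the bootstrapped keyed state. */
    public static void main(String[] args) throws Exception {
        String savepointPath = args.length > 0 ? args[0] : "s3://my-bucket/savepoints/bootstrap";
        ExecutionEnvironment bEnv = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<String> keys = bEnv.fromElements("a", "b", "a");

        BootstrapTransformation<String> transformation = OperatorTransformation
                .bootstrapWith(keys)
                .keyBy(k -> k)
                .transform(new CountBootstrapper());

        // 128 is assumed to match the streaming job's max parallelism.
        Savepoint.create(new EmbeddedRocksDBStateBackend(), 128)
                .withOperator(OPERATOR_UID, transformation)
                .write(savepointPath);

        bEnv.execute("bootstrap savepoint");
    }
}
```

The streaming job would then build its pipeline through attach() and be started from the written savepoint, for example with `flink run -s <savepointPath> ...`, so that CountingFunction starts with the counts written by CountBootstrapper.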
Checkpoints can be stored in different checkpoint storage systems, such as S3 or HDFS. If you use the RocksDB state backend, Flink essentially just copies RocksDB's SST files to S3. Checkpoints are usually bound to the lifetime of an application: they are created by the application and deleted when it terminates. However, you can resume an application from both savepoints and checkpoints. Checkpoints can be retained [2] so that the application does not delete them on termination, but that is considered an advanced feature, and you should first try it with savepoints.

[1] https://github.com/apache/flink/blob/release-1.13.0/flink-libraries/flink-state-processing-api/src/test/java/org/apache/flink/state/api/SavepointWriterITCase.java#L141-L141
[2] https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/ops/state/checkpoints/#retained-checkpoints

On Mon, Jul 5, 2021 at 5:56 PM Rakshit Ramesh <rakshit.ram...@datakaveri.org> wrote:
> I'm trying to bootstrap state into a KeyedProcessFunction equivalent that
> takes in
> a DataStream but I'm unable to find a reference for the same.
> I found this gist
> https://gist.github.com/alpinegizmo/ff3d2e748287853c88f21259830b29cf
> But it seems to only apply for DataSet.
> My usecase is to manually trigger a Savepoint into s3 for later reuse.
> I'm also guessing that checkpoints can't be stored in rocksdb or s3 for
> later reuse.