Hi all,
To minimize recovery time after a failure, we use incremental, retained checkpoints with `state.checkpoints.num-retained` set to 10 in our Flink apps. With this setting, Flink automatically creates new checkpoints at regular intervals and keeps only the latest 10. In addition, for app upgrades and better reliability, we have a cron job that creates savepoints at regular intervals.

We have two questions about checkpoint retention.

1. When our cron job creates a savepoint (call it SP), it seems that the checkpoints created before SP still cannot be deleted. We assumed that new checkpoints would be based on SP, so the checkpoints taken before SP would no longer be needed. However, the checkpoint mechanism does not seem to work that way. Is our understanding correct?

2. To save storage cost, we’d like to know which checkpoints can be deleted. Currently, each version of our app has 10 checkpoints. Can we delete the checkpoints generated by previous versions of our app?

Any comments are appreciated!

Best wishes,
Chen-Che

An example is below (a checkpoint is created every 30 minutes, and a savepoint is created every 2 hours):

1:00 Flink creates checkpoint
1:30 Flink creates checkpoint
2:00 Flink creates checkpoint
2:30 Flink creates checkpoint
3:00 Cron job creates savepoint (SP)
3:30 Flink creates checkpoint
4:00 Flink creates checkpoint
...
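
For reference, the setup described in this message corresponds roughly to the `flink-conf.yaml` fragment below. It is only a sketch. The retained-checkpoint count and the 30-minute interval come from the description above. The RocksDB backend, the retention mode, and the checkpoint directory are assumptions added for illustration. Key names also differ between Flink versions, so they should be checked against the documentation for the version in use.

```yaml
# Sketch only: values marked "assumed" are not stated in the message above.

# Assumed: a RocksDB state backend, commonly used with incremental checkpoints.
state.backend: rocksdb

# Incremental checkpoints, as described in the message.
state.backend.incremental: true

# Keep only the latest 10 completed checkpoints (from the message).
state.checkpoints.num-retained: 10

# Trigger a checkpoint every 30 minutes (from the example timeline).
execution.checkpointing.interval: 30min

# Assumed: keep checkpoints on job cancellation ("retained" checkpoints).
execution.checkpointing.externalized-checkpoint-retention: RETAIN_ON_CANCELLATION

# Hypothetical location; replace with the real checkpoint storage path.
state.checkpoints.dir: s3://example-bucket/flink/checkpoints
```

The savepoints created every 2 hours by the cron job are triggered outside this configuration, so they do not appear in the fragment.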