Hello,

At work we are currently standing up a cluster with the following configuration:
- Flink version: 1.4.2
- HA enabled with ZooKeeper
- State backend: RocksDB
- state.checkpoints.dir: hdfs://namenode:9000/flink/checkpoints
- state.backend.rocksdb.checkpointdir: hdfs://namenode:9000/flink/checkpoints
- high-availability.storageDir: hdfs://namenode:9000/flink/recovery

We also have a job running with checkpointing enabled but without externalized checkpoints. We run this job multiple times a day from our integration-test pipeline, and we have noticed that the number of completedCheckpoint files stored under high-availability.storageDir keeps growing, which makes us wonder whether there is any cleanup policy for the filesystem when HA is enabled.

Under what circumstances would there be an ever-increasing number of completedCheckpoint files in the HA storage dir when only a single job is run over and over again?

Here is a sample of the files we see accumulating over time; they have actually reached the maximum number of files allowed on the filesystem:

completedCheckpoint00d86c01d8b9
completedCheckpoint00d86e9030a9
completedCheckpoint00d877b74355
completedCheckpoint00d87b3dd9ad
completedCheckpoint00d8815d9afd
completedCheckpoint00d88973195c
completedCheckpoint00d88b4792f2
completedCheckpoint00d890d499dc
completedCheckpoint00d91b00ada2

Cheers,
Laura U.
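For what it's worth, this is roughly how we track the growth. It is only a sketch: it assumes the recovery directory is readable as a local path (e.g. through a fuse/NFS mount for testing); against the real cluster we run the equivalent `hdfs dfs -ls` listing instead. The directory path and hex-suffix pattern are our assumptions about the file naming, based on the listing below.

```python
import os
import re
import tempfile

def count_completed_checkpoints(dirpath):
    # Count files matching the completedCheckpoint<hex-suffix> naming we
    # observe in high-availability.storageDir; other files are ignored.
    pattern = re.compile(r"^completedCheckpoint[0-9a-f]+$")
    return sum(1 for name in os.listdir(dirpath) if pattern.match(name))

# Example: simulate a recovery dir containing a few leftover files.
with tempfile.TemporaryDirectory() as d:
    for suffix in ("00d86c01d8b9", "00d86e9030a9", "00d877b74355"):
        open(os.path.join(d, "completedCheckpoint" + suffix), "w").close()
    open(os.path.join(d, "unrelated_file"), "w").close()  # not counted
    print(count_completed_checkpoints(d))  # -> 3
```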