CompletedCheckpoints are getting Stale ( Flink 1.4.2 )

Laura Uzcátegui Thu, 30 Aug 2018 07:53:11 -0700

Hello,

At work, we are currently standing up a cluster with the following
configuration:



   - Flink version: 1.4.2
   - HA Enabled with Zookeeper
   - State backend : rocksDB
   - state.checkpoints.dir: hdfs://namenode:9000/flink/checkpoints
   - state.backend.rocksdb.checkpointdir: hdfs://namenode:9000/flink/checkpoints
   - *high-availability.storageDir*: hdfs://namenode:9000/flink/recovery

We also have a job running with checkpointing enabled and without
externalized checkpoints.

We run this job multiple times a day, since it is launched from our
integration-test pipeline, and we started noticing that the number of
completedCheckpoint files in the *high-availability.storageDir* folder
keeps growing. This makes us wonder whether there is any cleanup policy
for the filesystem when HA is enabled.

Under what circumstances would the number of completedCheckpoint files in
the HA storage dir keep increasing when only a single job is being run
over and over again?

Here is a list of the files we see accumulating over time, to the point of
actually reaching the maximum number of files allowed on the filesystem.

completedCheckpoint00d86c01d8b9
completedCheckpoint00d86e9030a9
completedCheckpoint00d877b74355
completedCheckpoint00d87b3dd9ad
completedCheckpoint00d8815d9afd
completedCheckpoint00d88973195c
completedCheckpoint00d88b4792f2
completedCheckpoint00d890d499dc
completedCheckpoint00d91b00ada2
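
To put a number on the growth, the file names from a directory listing of
the HA storage dir could be tallied with something like the sketch below.
It is only an illustration: the function name, the regex (inferred from the
names listed above, not from Flink's source), and the non-checkpoint entry
in the sample are all assumptions.

```python
import re

# Assumed naming pattern: "completedCheckpoint" followed by hex characters,
# inferred from the file names in the listing above (not from Flink itself).
COMPLETED_CHECKPOINT = re.compile(r"^completedCheckpoint[0-9a-f]+$")


def count_completed_checkpoints(names):
    """Return how many entries look like completedCheckpoint files."""
    return sum(1 for name in names if COMPLETED_CHECKPOINT.match(name))


if __name__ == "__main__":
    listing = [
        "completedCheckpoint00d86c01d8b9",
        "completedCheckpoint00d86e9030a9",
        "someOtherFile",  # hypothetical entry that is not a checkpoint file
    ]
    print(count_completed_checkpoints(listing))
```

Running it periodically against the listing would show whether the count
ever goes down between runs of the job.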


Cheers,


Laura U.
