Re: CompletedCheckpoints are getting Stale ( Flink 1.4.2 )

2018-09-03 Thread Stephan Ewen
One final thought: How do you stop the unbounded streaming application? If you just kill the Yarn/Mesos/K8s cluster, Flink will not know that this is a shutdown, and will interpret it as a failure. Because of that, checkpoints will remain (in DFS and in ZooKeeper). On Fri, Aug 31, 2018 at 2:18 PM, vin

Re: CompletedCheckpoints are getting Stale ( Flink 1.4.2 )

2018-08-31 Thread vino yang
Hi Laura: Perhaps this is possible because the path to a completed checkpoint on HDFS has no hierarchical structure that identifies which job it belongs to; the name is just a fixed prefix plus a randomly generated string. My personal advice: 1) Verify it with a clean cluster (clean up the
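
Since the file names described above carry no job identifier, one way to inspect leftovers is to group files by the shared prefix and order them by modification time. The following is only a minimal sketch under stated assumptions: the prefix `completedCheckpoint`, the helper name `find_stale`, and the idea that the directory is reachable as a local path (for example a copy or a mount of the HDFS directory) are all hypothetical and not taken from the thread.

```python
from pathlib import Path

# Assumption: metadata files share a fixed prefix followed by a random suffix.
# The exact prefix below is hypothetical; adjust it to what the directory holds.
PREFIX = "completedCheckpoint"


def find_stale(directory, retained=1, prefix=PREFIX):
    """Return prefix-matching files, oldest first, excluding the newest `retained`."""
    files = [
        p for p in Path(directory).iterdir()
        if p.is_file() and p.name.startswith(prefix)
    ]
    # Order by modification time; the name breaks ties deterministically.
    files.sort(key=lambda p: (p.stat().st_mtime, p.name))
    if retained <= 0:
        return files
    # Everything except the newest `retained` files is a cleanup candidate.
    return files[:-retained]
```

Calling `find_stale("/path/to/local/copy", retained=1)` only lists candidates; it deletes nothing, so the result can be reviewed before any manual cleanup.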

Re: CompletedCheckpoints are getting Stale ( Flink 1.4.2 )

2018-08-31 Thread Laura Uzcátegui
Hi Stephan and Vino, Thanks for the quick reply and hints. The configuration for the checkpoints that should remain is set to 1. Since this is an unbounded job run and I can't see it finishing, I suspect as we tear down the cluster every time we finish with the integration test being run, the com

Re: CompletedCheckpoints are getting Stale ( Flink 1.4.2 )

2018-08-31 Thread Stephan Ewen
Hi Laura! Vino had good pointers. There really should be no case in which this is not cleaned up. Is this a bounded job that ends? Is it always the last of the bounded job's checkpoints that remains? Best, Stephan On Fri, Aug 31, 2018 at 5:02 AM, vino yang wrote: > Hi Laura, > > First of all

Re: CompletedCheckpoints are getting Stale ( Flink 1.4.2 )

2018-08-30 Thread vino yang
Hi Laura, First of all, Flink only keeps one completed checkpoint by default[1]. You need to confirm whether your configuration is consistent with the number of files. If they are consistent, there may be other reasons: 1) The cleaning of the completed checkpoint is done by the JM. You need to confirm w
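
For reference, the retention count mentioned above is normally set in `flink-conf.yaml`. The fragment below is a sketch that assumes the setting in question is `state.checkpoints.num-retained`; the thread itself does not name the key.

```yaml
# flink-conf.yaml (fragment)
# Assumption: this is the key that controls how many completed checkpoints are kept.
# A value of 1 matches the default described in the message above.
state.checkpoints.num-retained: 1
```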

CompletedCheckpoints are getting Stale ( Flink 1.4.2 )

2018-08-30 Thread Laura Uzcátegui
Hello, At work, we are currently standing up a cluster with the following configuration: - Flink version: 1.4.2 - HA enabled with ZooKeeper - State backend: RocksDB - state.checkpoints.dir: hdfs://namenode:9000/flink/checkpoints - state.backend.rocksdb.checkpointdir: hdfs://n