One final thought: how do you stop the unbounded streaming application?
If you just kill the YARN/Mesos/K8s cluster, Flink will not know that this
is a shutdown and will interpret it as a failure. Because of that, checkpoints
will remain (in DFS and in ZooKeeper).
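To illustrate the difference, here is a minimal sketch (not from this thread; the
checkpoint interval and class name are just placeholders) of how the cleanup mode of
externalized checkpoints interacts with this on a Flink 1.4-style API. With
DELETE_ON_CANCELLATION, an orderly cancellation (e.g. via bin/flink cancel <jobID>)
removes the checkpoint, while a killed cluster looks like a failure and the checkpoint
is kept:

import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointCleanupSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Take a checkpoint every 60 seconds (interval chosen only for illustration).
        env.enableCheckpointing(60_000);

        // Externalized checkpoints survive job termination; the cleanup mode decides
        // what happens on an orderly cancellation:
        //  - DELETE_ON_CANCELLATION: removed when the job is cancelled cleanly, but
        //    kept after a failure (which is what a killed cluster looks like to Flink).
        //  - RETAIN_ON_CANCELLATION: kept in both cases.
        env.getCheckpointConfig()
           .enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.DELETE_ON_CANCELLATION);

        // ... build and execute the job ...
    }
}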
On Fri, Aug 31, 2018 at 2:18 PM, vino yang wrote:
Hi Laura:
Perhaps this happens because the path to a completed checkpoint on HDFS has no
hierarchical structure identifying which job it belongs to; it is just a fixed
prefix plus a randomly generated name.
My personal advice:
1) Verify it with a clean cluster (clean up the
Hi Stephan and Vino,
Thanks for the quick reply and hints.
The configuration for the number of checkpoints that should be retained is set to 1.
Since this is an unbounded job and I can't see it finishing, I suspect that, as we
tear down the cluster every time we finish running the integration test, the com
Hi Laura!
Vino had good pointers. There really should be no case in which this is not
cleaned up.
Is this a bounded job that ends? Is it always the last of the bounded job's
checkpoints that remains?
Best,
Stephan
On Fri, Aug 31, 2018 at 5:02 AM, vino yang wrote:
Hi Laura,
First of all, Flink only keeps one completed checkpoint by default [1]. You need to
confirm whether your configuration is consistent with the number of files you see.
If it is, the leftover checkpoints are there for other reasons:
1) The cleanup of completed checkpoints is done by the JM. You need to
confirm w
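To make the first point concrete, here is a minimal sketch (the key name
state.checkpoints.num-retained and its default of 1 are standard Flink settings; the
class name is just a placeholder) that prints the value the JobManager would actually
pick up from flink-conf.yaml:

import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.GlobalConfiguration;

public class RetainedCheckpointsCheck {
    public static void main(String[] args) {
        // Loads flink-conf.yaml from the directory pointed to by FLINK_CONF_DIR.
        Configuration conf = GlobalConfiguration.loadConfiguration();

        // state.checkpoints.num-retained controls how many completed checkpoints
        // the JobManager keeps around; the default is 1.
        int retained = conf.getInteger("state.checkpoints.num-retained", 1);
        System.out.println("Retained completed checkpoints: " + retained);
    }
}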
Hello,
At work, we are currently standing up a cluster with the following
configuration:
- Flink version: 1.4.2
- HA Enabled with Zookeeper
- State backend: RocksDB
- state.checkpoints.dir: hdfs://namenode:9000/flink/checkpoints
- state.backend.rocksdb.checkpointdir:
hdfs://n
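For reference, a minimal sketch (assuming a Flink 1.4-style DataStream API; the job
body and checkpoint interval are only placeholders) of how a job on this kind of
setup would wire the RocksDB backend to the HDFS checkpoint directory
programmatically, equivalent to the flink-conf.yaml entries above:

import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class HdfsCheckpointedJobSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // RocksDB keeps its working files locally and snapshots checkpoint data
        // to this HDFS URI (the same directory as state.checkpoints.dir above).
        env.setStateBackend(new RocksDBStateBackend("hdfs://namenode:9000/flink/checkpoints"));

        // Checkpoint every 30 seconds (interval chosen only for illustration).
        env.enableCheckpointing(30_000);

        // Placeholder pipeline so the sketch runs end to end.
        env.fromElements(1, 2, 3).print();
        env.execute("hdfs-checkpointed-job-sketch");
    }
}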