Difficult to debug reason for checkpoint decline

2019-10-07 Thread Daniel Harper
We recently had an issue where no checkpoints could complete, with the following message in the job manager logs: 2019-09-25 12:27:57,159 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Decline checkpoint 7041 by task 1f789ac3c5df655fe5482932b2255fd3 of job 214ccf9a

Checkpoint size growing over time

2019-09-05 Thread Daniel Harper
Hi there, we are running a streaming application on Flink 1.5.2 with Beam 2.7.0. We’ve noticed that the checkpoint size has been increasing slowly and gradually (see screenshot) over the course of many months, and we are not certain why. We take a checkpoint every

Job unable to stabilise after restart

2018-11-19 Thread Daniel Harper
Hi there, I’ve raised this issue: https://issues.apache.org/jira/browse/FLINK-10928 I recognise it’s a bit vague and there is a lot of information on that ticket, but we’re having a lot of trouble getting to the root cause in our setup. Can anyone help or point us in the right direction? :)

1.4.3 release/roadmap

2018-04-19 Thread Daniel Harper
Hi there, there are some bug fixes in the 1.4 branch that we would like to use. Is there a roadmap for when the next stable 1.4.x release will be cut? Are there any blockers?