Re: Recovery problem 2 of 2 in Flink 1.6.3

2019-01-15 Thread Till Rohrmann
Hi John, this does indeed look strange. How many concurrent operators do you have that write state to S3? After the cancellation, the JobManager should keep the slots for some time until they are freed. This is the normal behaviour and can be controlled with `slot.idle.timeout`. Could you maybe shar

Recovery problem 2 of 2 in Flink 1.6.3

2019-01-10 Thread John Stone
This is the second of two recovery problems I'm seeing while running Flink in Kubernetes. I'm posting them in separate messages for brevity and because the second is not directly related to the first. Any advice is appreciated. First problem: https://lists.apache.org/thread.html/a663a8ce2f697e6d20