Hi John,
this indeed looks strange. How many concurrent operators do you have that
write state to S3?
After the cancellation, the JobManager should keep the slots for some time
until they are freed. This is the normal behaviour and can be controlled
with `slot.idle.timeout`. Could you maybe share ...
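For reference, a minimal flink-conf.yaml sketch of that setting (the 60000 ms
value here is only an illustrative example, not a recommendation):

    # flink-conf.yaml
    # How long the JobManager keeps an idle slot before releasing it
    # back to the ResourceManager, in milliseconds (example value).
    slot.idle.timeout: 60000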
This is the second of two recovery problems I'm seeing when running Flink on
Kubernetes. I'm posting them as separate messages for brevity and because the
second is not directly related to the first. Any advice is appreciated. The
first problem is here:
https://lists.apache.org/thread.html/a663a8ce2f697e6d20