Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/5748

I was unsure initially, because we tried hard to avoid the checkpoint lock during shutdown/cancellation before (some timer / user thread might hold the lock)...

The changes to the network stack (use fewer other locks) seem to make this one here strictly necessary though. I could not yet think of another way to do this.

We also have better cancellation safety nets in place now, which should help...
---