Github user sihuazhou commented on the issue: https://github.com/apache/flink/pull/5881 @GJL I also noticed that this PR solves only part of the problem. It makes sure that the `TM` registers with the `ResourceManager` properly, but it can't make sure that the `TM` connects to the `JobManager` properly. Is it possible that the problem you hit is that the `TM` was killed before it connected to the `JM` successfully? In that case the `ResourceManager` can't be notified to trigger a new container request, and the `JM` can't be notified either. What do you think?
---