tillrohrmann commented on pull request #15105: URL: https://github.com/apache/flink/pull/15105#issuecomment-795587910
Thanks for the review @zentol. I am not entirely sure I fully understand your comment. The classes `RetryingRegistration` and `RegisteredRpcConnection` abstract the connection logic for Flink processes. What this PR adds is a callback for reacting to rejected connections. I think this is necessary because different components should react differently to a rejected connection (JM <-> TM => the TM should release the job's resources; RM <-> TM => the TM should stop itself because it cannot connect to the RM).

Concerning the point that a TM should not be able to connect to a JM that is not responsible for it, based on the `JobMasterId`: the `JobMasterId` is a concept of the `RpcSystem`, whereas the `JobID` is a concept of the "application" layer. Moreover, the `JobMasterId` only works in HA setups, since in non-HA setups we have no way of exchanging the leader ids. That is why this PR adds an "application" layer check for valid connections. In the case of JM <-> TM connections, this check would be based on the `JobID`. How else would you propose to solve the problem?
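
To make the idea concrete, here is a minimal, self-contained sketch of a rejection callback with component-specific reactions. It is not the code from this PR and not Flink's actual API: `RegistrationListener`, `FakeJobMaster`, `JobMasterListener`, `ResourceManagerListener`, and the string job ids are hypothetical names invented for illustration. The stand-in JobMaster performs the "application" layer check by comparing job ids, and each listener decides what a rejection means for its component.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration only: every type below is hypothetical, not Flink's actual API. */
public class RejectionCallbackSketch {

    /** Callback through which the shared connection logic reports the outcome. */
    interface RegistrationListener {
        void onSuccess(String target);

        void onRejection(String target, String reason);
    }

    /** Stand-in for a JobMaster that does the application-layer check on the job id. */
    static final class FakeJobMaster {
        private final String jobId;

        FakeJobMaster(String jobId) {
            this.jobId = jobId;
        }

        void register(String requestedJobId, RegistrationListener listener) {
            if (jobId.equals(requestedJobId)) {
                listener.onSuccess("JobMaster");
            } else {
                // Explicit rejection instead of a failure that would be retried forever.
                listener.onRejection("JobMaster", "not responsible for job " + requestedJobId);
            }
        }
    }

    /** TM-side reaction for JM <-> TM connections: release the job's resources. */
    static final class JobMasterListener implements RegistrationListener {
        private final List<String> log;

        JobMasterListener(List<String> log) {
            this.log = log;
        }

        @Override
        public void onSuccess(String target) {
            log.add("connected to " + target);
        }

        @Override
        public void onRejection(String target, String reason) {
            log.add("release job resources (" + target + ": " + reason + ")");
        }
    }

    /** TM-side reaction for RM <-> TM connections: stop the TaskManager. */
    static final class ResourceManagerListener implements RegistrationListener {
        private final List<String> log;

        ResourceManagerListener(List<String> log) {
            this.log = log;
        }

        @Override
        public void onSuccess(String target) {
            log.add("connected to " + target);
        }

        @Override
        public void onRejection(String target, String reason) {
            log.add("stop TaskManager (" + target + ": " + reason + ")");
        }
    }

    /** Runs one accepted and one rejected JM registration, then one RM rejection. */
    static List<String> run() {
        List<String> log = new ArrayList<>();
        FakeJobMaster jobMaster = new FakeJobMaster("job-A");
        RegistrationListener jmListener = new JobMasterListener(log);

        jobMaster.register("job-A", jmListener); // matching job id: accepted
        jobMaster.register("job-B", jmListener); // wrong job id: rejected

        // The RM side is simulated by invoking the callback directly.
        new ResourceManagerListener(log).onRejection("ResourceManager", "unknown TaskManager");
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

The point of the sketch is only the shape: the connection logic stays generic and reports a rejection, while the reaction lives in the component that owns the connection.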