tillrohrmann commented on pull request #15105:
URL: https://github.com/apache/flink/pull/15105#issuecomment-795587910


   Thanks for the review @zentol. I am not sure I fully understand your 
comment. The classes `RetryingRegistration` and `RegisteredRpcConnection` 
are meant to abstract the connection logic for Flink processes. What this 
PR adds is a callback for reacting to rejected connections. I think this is 
necessary because different components should react differently to a 
rejected connection (JM <-> TM: the TM should release the job's resources; 
RM <-> TM: the TM should stop itself because it cannot connect to the RM).
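
   To illustrate the idea, here is a minimal sketch of such a rejection 
callback. It is not the code from this PR: `RegistrationListener`, 
`Connection` and the addresses are hypothetical stand-ins, and the two 
listeners only record the reaction each component would take.

```java
import java.util.ArrayList;
import java.util.List;

public class RejectionCallbackSketch {

    /** Hypothetical listener; not the actual Flink interface. */
    interface RegistrationListener {
        void onSuccess(String targetAddress);

        void onRejection(String targetAddress, String reason);
    }

    /** Hypothetical stand-in for the shared connection logic. */
    static final class Connection {
        private final String targetAddress;
        private final RegistrationListener listener;

        Connection(String targetAddress, RegistrationListener listener) {
            this.targetAddress = targetAddress;
            this.listener = listener;
        }

        /** Forwards the outcome of a registration attempt to the listener. */
        void complete(boolean accepted, String reason) {
            if (accepted) {
                listener.onSuccess(targetAddress);
            } else {
                listener.onRejection(targetAddress, reason);
            }
        }
    }

    /** Builds a listener that records the given reaction when rejected. */
    static RegistrationListener recording(List<String> actions, String reaction) {
        return new RegistrationListener() {
            @Override
            public void onSuccess(String targetAddress) {
                actions.add("connected to " + targetAddress);
            }

            @Override
            public void onRejection(String targetAddress, String reason) {
                actions.add(reaction);
            }
        };
    }

    /** Simulates a TM whose connections to a JM and to the RM are both rejected. */
    static List<String> simulate() {
        List<String> actions = new ArrayList<>();
        Connection toJobMaster =
                new Connection("jm-address", recording(actions, "release job resources"));
        Connection toResourceManager =
                new Connection("rm-address", recording(actions, "stop task executor"));

        toJobMaster.complete(false, "unknown job");
        toResourceManager.complete(false, "unknown task executor");
        return actions;
    }

    public static void main(String[] args) {
        simulate().forEach(System.out::println);
    }
}
```

   The connection logic stays shared, while each component decides in its 
own listener what a rejection means for it.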
   
   Regarding the point that a TM should not be able to connect to a JM 
that is not responsible for it based on the `JobMasterId`: the `JobMasterId` 
is a concept of the `RpcSystem`, whereas the `JobID` is a concept of the 
"application" layer. Moreover, the `JobMasterId` only works in HA setups, 
since in non-HA setups we have no way of exchanging the leader ids. That is 
why this PR adds an "application" layer check for valid connections. For 
JM <-> TM connections, this check is based on the `JobID`.
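
   As a rough sketch of what I mean by an "application" layer check (the 
names `JobMasterSketch`, `Response` and `registerTaskManager` are made up 
for this example, and plain strings stand in for the `JobID`), the JM side 
would compare the job the TM asks for with the job it is responsible for:

```java
public class JobIdCheckSketch {

    /** Hypothetical registration outcome. */
    enum Response {
        SUCCESS,
        REJECTION
    }

    /** Hypothetical JM stand-in that is responsible for exactly one job. */
    static final class JobMasterSketch {
        private final String jobId;

        JobMasterSketch(String jobId) {
            this.jobId = jobId;
        }

        /** Accepts the TM only if it asks for the job this JM is running. */
        Response registerTaskManager(String requestedJobId) {
            return jobId.equals(requestedJobId) ? Response.SUCCESS : Response.REJECTION;
        }
    }

    public static void main(String[] args) {
        JobMasterSketch jobMaster = new JobMasterSketch("job-a");
        System.out.println(jobMaster.registerTaskManager("job-a"));
        System.out.println(jobMaster.registerTaskManager("job-b"));
    }
}
```

   This check does not depend on leader ids, so it would behave the same in 
HA and non-HA setups.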
   
   How else would you propose to solve the problem?



