[ https://issues.apache.org/jira/browse/FLINK-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15430870#comment-15430870 ]
ASF GitHub Bot commented on FLINK-4348:
---------------------------------------

Github user tillrohrmann commented on the issue:

https://github.com/apache/flink/pull/2389

Thanks for your contribution @beyond1920 :-)

I've reviewed the PR and I think it would be good if we split it up into several parts. The first part could be the heartbeat logic, a.k.a. `HeartbeatManager`. Here we could try to implement a generic sending and receiving end. I think the implementation can be almost independent of the JM, RM and TE implementations (similar to `RetryingRegistration`). This will allow us to test this component easily. The next step would be the integration of this component into the RM, JM and TE.

Concerning the slot request logic, I think we should wait a little for the `SlotManager` implementation. It could be that the `SlotManager`, and not the RM, will make the RPCs to the `TaskExecutor`. But for the moment the interface is, afaik, not specified well enough to program against.

The failure notification should also be treated in a separate PR imo. The notification can have multiple origins (e.g. the `HeartbeatManager` or the resource management framework) and should be designed accordingly.

In general, I think the components should be tested more thoroughly with more fine-grained unit tests. Furthermore, I think it would be good if we could revise the code documentation a little.

> Implement communication from ResourceManager to TaskManager
> -----------------------------------------------------------
>
> Key: FLINK-4348
> URL: https://issues.apache.org/jira/browse/FLINK-4348
> Project: Flink
> Issue Type: Sub-task
> Components: Cluster Management
> Reporter: Kurt Young
> Assignee: zhangjing
>
> There are mainly three interactions initiated by the RM towards the TM:
> * Heartbeat: the RM uses heartbeats to sync with the TM's slot status.
> * SlotRequest: when the RM decides to assign a slot to a JM, it should first send a slot request to the TM. The TM can either accept or reject this request.
> * FailureNotify: in some corner cases, the TM is marked as invalid by the cluster manager master (e.g. the YARN master), but the TM itself does not realize this. The RM should then send a failure notification to the TM so that the TM can terminate itself.
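The review suggests building the heartbeat logic first as a generic component with a sending and a receiving end that does not depend on the RM, JM or TE, so it can be unit tested on its own. Below is a minimal Java sketch of that idea. It is not Flink's API: `HeartbeatSender`, `HeartbeatListener` and `SimpleHeartbeatMonitor` are hypothetical names chosen for illustration. The transport (for instance an RPC to the `TaskExecutor`), the clock and the reaction to a timeout are all injected. A timeout is reported through a listener, which would be one possible origin of the failure notification mentioned above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical transport hook: how a heartbeat request reaches a target (e.g. an RPC to a TaskExecutor). */
@FunctionalInterface
interface HeartbeatSender<ID> {
    void sendHeartbeat(ID targetId);
}

/** Hypothetical callback invoked when a monitored target has not answered within the timeout. */
@FunctionalInterface
interface HeartbeatListener<ID> {
    void notifyHeartbeatTimeout(ID targetId);
}

/**
 * Hypothetical generic heartbeat monitor. It knows nothing about the RM, JM or TE:
 * the transport, the clock and the timeout reaction are all injected.
 */
final class SimpleHeartbeatMonitor<ID> {
    private final long timeoutMillis;
    private final LongSupplier clock;
    private final HeartbeatSender<ID> sender;
    private final HeartbeatListener<ID> listener;
    // Last time (according to the injected clock) a heartbeat was seen from each monitored target.
    private final Map<ID, Long> lastHeartbeat = new ConcurrentHashMap<>();

    SimpleHeartbeatMonitor(long timeoutMillis, LongSupplier clock,
                           HeartbeatSender<ID> sender, HeartbeatListener<ID> listener) {
        this.timeoutMillis = timeoutMillis;
        this.clock = clock;
        this.sender = sender;
        this.listener = listener;
    }

    /** Starts monitoring a target; registration counts as its first heartbeat. */
    void monitorTarget(ID targetId) {
        lastHeartbeat.put(targetId, clock.getAsLong());
    }

    void unmonitorTarget(ID targetId) {
        lastHeartbeat.remove(targetId);
    }

    boolean isMonitored(ID targetId) {
        return lastHeartbeat.containsKey(targetId);
    }

    /** Sending end: asks every monitored target for a heartbeat. */
    void sendHeartbeats() {
        for (ID targetId : lastHeartbeat.keySet()) {
            sender.sendHeartbeat(targetId);
        }
    }

    /** Receiving end: records a heartbeat; heartbeats from unmonitored targets are ignored. */
    void receiveHeartbeat(ID targetId) {
        lastHeartbeat.computeIfPresent(targetId, (id, last) -> clock.getAsLong());
    }

    /** Drops every target whose last heartbeat is older than the timeout and notifies the listener. */
    void checkTimeouts() {
        long now = clock.getAsLong();
        List<ID> expired = new ArrayList<>();
        for (Map.Entry<ID, Long> entry : lastHeartbeat.entrySet()) {
            if (now - entry.getValue() > timeoutMillis) {
                expired.add(entry.getKey());
            }
        }
        for (ID targetId : expired) {
            lastHeartbeat.remove(targetId);
            listener.notifyHeartbeatTimeout(targetId);
        }
    }
}
```

Because the monitor owns no threads, the embedding component decides when to call `sendHeartbeats()` and `checkTimeouts()` (for example from a `ScheduledExecutorService`), while a unit test can simply advance the injected clock and assert on the recorded sends and timeouts.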