[jira] [Commented] (FLINK-4478) Implement heartbeat logic

ASF GitHub Bot (JIRA) Fri, 30 Sep 2016 03:20:53 -0700

    [ 
https://issues.apache.org/jira/browse/FLINK-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15535623#comment-15535623
 ]


ASF GitHub Bot commented on FLINK-4478:
---------------------------------------

Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2435#discussion_r81309944
  
    --- Diff: 
flink-runtime/src/test/java/org/apache/flink/runtime/resourcemanager/slotmanager/SlotProtocolTest.java
 ---
    @@ -43,12 +42,9 @@
     import org.junit.BeforeClass;
     import org.junit.Test;
     import org.mockito.Mockito;
    -import org.mockito.invocation.InvocationOnMock;
    -import org.mockito.stubbing.Answer;
     
     import java.util.Collections;
     import java.util.UUID;
    -import java.util.concurrent.CompletableFuture;
    --- End diff --
    
    These changes are unrelated. I'm cleaning up the imports in #2571. 


> Implement heartbeat logic
> -------------------------
>
>                 Key: FLINK-4478
>                 URL: https://issues.apache.org/jira/browse/FLINK-4478
>             Project: Flink
>          Issue Type: Improvement
>          Components: Distributed Coordination
>    Affects Versions: 1.1.0
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>             Fix For: 1.2.0
>
>
> With the Flip-6 refactoring, we'll have the need for a dedicated heartbeat 
> component. The heartbeat component is used to check the liveliness of the 
> distributed components among each other. Furthermore, heartbeats are used to 
> regularly transmit status updates to another component. For example, the 
> TaskManager informs the ResourceManager with each heartbeat about the current 
> slot allocation.
> The heartbeat is initiated from one component. This component sends a 
> heartbeat request to another component which answers with an heartbeat 
> response. Thus, one can differentiate between a sending and a receiving side. 
> Apart from the triggering of the heartbeat request, the logic of treating 
> heartbeats, marking components dead and payload delivery are the same and 
> should be reusable by different distributed components (JM, TM, RM).
> Different models for the heartbeat reporting are conceivable. First of all, 
> the heartbeat request could be sent as an ask operation where the heartbeat 
> response is returned as a future on the sending side. Alternatively, the 
> sending side could request a heartbeat response by sending a tell message. 
> The heartbeat response is then delivered by an RPC back to the heartbeat 
> sender. The latter model has the advantage that a heartbeat response is not 
> tightly coupled to a heartbeat request. Such a tight coupling could cause 
> that heartbeat response are ignored after the future has timed out even 
> though they might still contain valuable information (receiver is still 
> alive).
> Furthermore, different strategies for the heartbeat triggering and marking 
> heartbeat targets as dead are conceivable. For example, we could periodically 
> (with a fixed period) trigger a heartbeat request and mark all targets as 
> dead if we didn't receive a heartbeat response in a given time period. 
> Furthermore, we could adapt the heartbeat interval and heartbeat timeouts 
> with respect to the latency of previous heartbeat responses. This would 
> reflect the current load and network conditions better.
> For the first version, I would propose to use a fixed period heartbeat with a 
> maximum heartbeat timeout before a target is marked dead. Furthermore, I 
> would propose to use tell messages (fire and forget) to request and report 
> heartbeats because they are the more flexible model imho.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (FLINK-4478) Implement heartbeat logic

Reply via email to