[jira] [Commented] (FLINK-4657) Implement HighAvailabilityServices based on zookeeper

ASF GitHub Bot (JIRA) Tue, 27 Sep 2016 20:23:46 -0700

    [ 
https://issues.apache.org/jira/browse/FLINK-4657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15528245#comment-15528245
 ]


ASF GitHub Bot commented on FLINK-4657:
---------------------------------------

Github user KurtYoung commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2550#discussion_r80837572
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobMaster.java ---
    @@ -467,6 +487,128 @@ public void registerAtResourceManager(final String address) {
                //TODO:: register at the RM
        }
     
    +   @RpcMethod
    +   public NextInputSplit requestNextInputSplit(final JobVertexID vertexID, final ExecutionAttemptID executionAttempt) {
    --- End diff --
    
    I'm not sure that throwing an exception from an RpcMethod is a good way to
    do error handling. On the caller's side, when the rpc method returns the
    Future from the gateway, the caller can now handle errors with handleAsync
    or exceptionallyAsync. But exceptions from user logic will get mixed up
    with the exceptions from the rpc framework, like RpcTimeout or other
    exceptions that tell you the rpc system may not be working well. So
    typically you need to figure out what went wrong by distinguishing the
    exception type, which is not very elegant, I think.
    One thing we can do is never throw exceptions in an RpcMethod, but handle
    errors in the "ErrorCode" way by returning the error explicitly in the
    return value. Then every exception thrown during an rpc call should be due
    to the rpc framework.
    In this case, returning null indicates that something went wrong with the
    request. (If we need more detail about the error, we can enrich it by
    returning a message; currently null will do the work.)
    And in the normal case, like no further split, we still return a
    NextInputSplit with empty content in it.
    What do you think about all this? Let me know.
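
    A minimal sketch of the "ErrorCode" idea, using only java.util.concurrent.
    Everything named here (ErrorCodeRpcSketch, Status, SplitReply, the
    remainingSplits parameter) is hypothetical and is not part of Flink or of
    this pull request. It uses the "return a message" variant rather than
    null, and it only illustrates that an exceptionally completed future can
    then be read as a framework problem.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeoutException;

class ErrorCodeRpcSketch {

    // Hypothetical status codes, not part of Flink.
    enum Status { OK, NO_MORE_SPLITS, REQUEST_FAILED }

    // Hypothetical reply type standing in for NextInputSplit.
    static final class SplitReply {
        final Status status;
        final byte[] splitData; // empty unless status is OK
        final String message;   // detail for REQUEST_FAILED, otherwise ""

        SplitReply(Status status, byte[] splitData, String message) {
            this.status = status;
            this.splitData = splitData;
            this.message = message;
        }
    }

    // Stand-in for the RpcMethod: it never throws; errors travel in the reply.
    static CompletableFuture<SplitReply> requestNextInputSplit(String vertexId, int remainingSplits) {
        if (vertexId == null || vertexId.isEmpty()) {
            return CompletableFuture.completedFuture(
                    new SplitReply(Status.REQUEST_FAILED, new byte[0], "unknown vertex"));
        }
        if (remainingSplits <= 0) {
            // Normal case: no further split, reply with empty content.
            return CompletableFuture.completedFuture(
                    new SplitReply(Status.NO_MORE_SPLITS, new byte[0], ""));
        }
        return CompletableFuture.completedFuture(
                new SplitReply(Status.OK, new byte[] {42}, ""));
    }

    // Caller side: an exceptional future can only mean a framework problem.
    static String describe(CompletableFuture<SplitReply> future) {
        return future.handle((reply, error) -> {
            if (error != null) {
                return "rpc-failure";
            }
            return reply.status.name();
        }).join();
    }

    public static void main(String[] args) {
        System.out.println(describe(requestNextInputSplit("vertex-1", 3))); // OK
        System.out.println(describe(requestNextInputSplit("vertex-1", 0))); // NO_MORE_SPLITS
        System.out.println(describe(requestNextInputSplit("", 3)));         // REQUEST_FAILED

        // Simulated framework failure, e.g. a timeout raised by the rpc layer.
        CompletableFuture<SplitReply> timedOut = new CompletableFuture<>();
        timedOut.completeExceptionally(new TimeoutException("rpc timeout"));
        System.out.println(describe(timedOut));                             // rpc-failure
    }
}
```

    With this shape the caller branches on the status for application-level
    outcomes and treats any exception as an rpc framework problem, without
    inspecting exception types.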


> Implement HighAvailabilityServices based on zookeeper
> -----------------------------------------------------
>
>                 Key: FLINK-4657
>                 URL: https://issues.apache.org/jira/browse/FLINK-4657
>             Project: Flink
>          Issue Type: New Feature
>          Components: Cluster Management
>            Reporter: Kurt Young
>            Assignee: Kurt Young
>
> For flip-6, we will have the ResourceManager and every JobManager as potential 
> leader contenders and retrievers. We should separate them by using different 
> zookeeper paths. 
> For example, the path could be /leader/resource-manager for RM. And for each 
> JM, the path could be /leader/job-managers/JobID



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
