[ https://issues.apache.org/jira/browse/FLINK-4657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15528245#comment-15528245 ]
ASF GitHub Bot commented on FLINK-4657:
---------------------------------------

Github user KurtYoung commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2550#discussion_r80837572

--- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobMaster.java ---
@@ -467,6 +487,128 @@ public void registerAtResourceManager(final String address) {
 		//TODO:: register at the RM
 	}
 
+	@RpcMethod
+	public NextInputSplit requestNextInputSplit(final JobVertexID vertexID, final ExecutionAttemptID executionAttempt) {
--- End diff --

I'm not sure that throwing an exception from an RpcMethod is a good way to do error handling. On the caller's side, once the gateway returns the Future for the RPC call, the caller can now handle errors with handleAsync or exceptionallyAsync. But exceptions from the user logic will get mixed up with all the exceptions from the RPC framework, such as RpcTimeout or other exceptions telling you that the RPC system itself may not be working well. So you typically have to figure out what went wrong by distinguishing the exception types, which I don't think is very elegant.

One thing we could do is never throw exceptions in an RpcMethod and instead handle errors the "error code" way, by returning the error explicitly in the return value. Then every exception thrown during an RPC call would be due to the RPC framework. In this scheme, returning null indicates that something went wrong with the request. (If we need more detail about the error, we can enrich the return value with a message; for now, null does the job.) In the normal case, such as when there are no further splits, we still return a NextInputSplit, just with empty content.

What do you think about all this? Let me know.
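To make the "error code" proposal concrete, here is a minimal sketch under simplifying assumptions. It is not the code from the pull request: SplitServer is a hypothetical stand-in for the JobMaster, vertex ids and splits are plain Strings, NextInputSplit is a simplified look-alike, and java.util.concurrent.CompletableFuture stands in for Flink's own Future type. The server never throws; null signals a bad request and an empty reply signals that no splits remain.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch of the "error code" convention; none of these types are Flink's API.
public class ErrorCodeSketch {

    // Simplified stand-in for NextInputSplit: a null payload means "no further split".
    static final class NextInputSplit {
        private final String split;

        NextInputSplit(String split) {
            this.split = split;
        }

        static NextInputSplit empty() {
            return new NextInputSplit(null);
        }

        boolean hasSplit() {
            return split != null;
        }

        String getSplit() {
            return split;
        }
    }

    // Stand-in for the JobMaster side: remaining splits per (string) vertex id.
    static final class SplitServer {
        private final Map<String, Deque<String>> splitsPerVertex = new HashMap<>();

        void addVertex(String vertexId, String... splits) {
            splitsPerVertex.put(vertexId, new ArrayDeque<>(Arrays.asList(splits)));
        }

        // Never throws: null = bad request, empty reply = no more splits.
        NextInputSplit requestNextInputSplit(String vertexId) {
            Deque<String> remaining = splitsPerVertex.get(vertexId);
            if (remaining == null) {
                return null; // unknown vertex, reported through the return value
            }
            String next = remaining.poll();
            return next == null ? NextInputSplit.empty() : new NextInputSplit(next);
        }
    }

    // Caller side: since the server never throws, a non-null failure comes from the (simulated) RPC layer.
    static String describe(CompletableFuture<NextInputSplit> reply) {
        return reply.handle((result, failure) -> {
            if (failure != null) {
                return "rpc failure: " + failure.getClass().getSimpleName();
            } else if (result == null) {
                return "request error";
            } else if (!result.hasSplit()) {
                return "no more splits";
            } else {
                return "split " + result.getSplit();
            }
        }).join();
    }

    public static void main(String[] args) {
        SplitServer server = new SplitServer();
        server.addVertex("source", "split-0");

        // Successful RPC round trips, simulated with already-completed futures.
        System.out.println(describe(CompletableFuture.completedFuture(server.requestNextInputSplit("source"))));
        System.out.println(describe(CompletableFuture.completedFuture(server.requestNextInputSplit("source"))));
        System.out.println(describe(CompletableFuture.completedFuture(server.requestNextInputSplit("unknown"))));

        // A transport-level failure, simulated by completing the future exceptionally.
        CompletableFuture<NextInputSplit> timedOut = new CompletableFuture<>();
        timedOut.completeExceptionally(new TimeoutException("rpc timed out"));
        System.out.println(describe(timedOut));
    }
}
```

With this convention the caller branches on three application-level outcomes (split, empty, null) in the success path, and the exceptional path of the future is reserved for framework problems such as timeouts, so no exception-type inspection is needed to tell the two apart.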
> Implement HighAvailabilityServices based on zookeeper
> -----------------------------------------------------
>
>                 Key: FLINK-4657
>                 URL: https://issues.apache.org/jira/browse/FLINK-4657
>             Project: Flink
>          Issue Type: New Feature
>          Components: Cluster Management
>            Reporter: Kurt Young
>            Assignee: Kurt Young
>
> For FLIP-6, we will have the ResourceManager and every JobManager as potential
> leader contenders and retrievers. We should separate them by using different
> ZooKeeper paths.
> For example, the path could be /leader/resource-manager for the RM, and for each
> JM, the path could be /leader/job-managers/JobID
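The proposed path layout can be sketched as a small helper. This is an illustrative assumption, not Flink's HighAvailabilityServices: the class and method names are hypothetical, the job id is a plain String standing in for Flink's JobID, and the node names mirror the example paths in the description. The point is that the ResourceManager contenders share one leader path while each job gets its own child node, so leader elections for different jobs stay independent.

```java
// Hypothetical helper sketching the ZooKeeper leader-path layout described in FLINK-4657.
public final class LeaderPathLayout {
    private static final String LEADER_ROOT = "/leader";
    private static final String RESOURCE_MANAGER = "resource-manager";
    private static final String JOB_MANAGERS = "job-managers";

    private LeaderPathLayout() {}

    // One path shared by all ResourceManager leader contenders and retrievers.
    public static String resourceManagerLeaderPath() {
        return LEADER_ROOT + "/" + RESOURCE_MANAGER;
    }

    // One path per job, so each JobManager's election does not interfere with other jobs.
    public static String jobManagerLeaderPath(String jobId) {
        if (jobId == null || jobId.isEmpty() || jobId.contains("/")) {
            throw new IllegalArgumentException("invalid job id: " + jobId);
        }
        return LEADER_ROOT + "/" + JOB_MANAGERS + "/" + jobId;
    }

    public static void main(String[] args) {
        System.out.println(resourceManagerLeaderPath());
        System.out.println(jobManagerLeaderPath("a1b2c3"));
    }
}
```

Rejecting ids containing "/" keeps every job's leader node a direct child of /leader/job-managers, which is what lets the layout separate the jobs cleanly.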