[ https://issues.apache.org/jira/browse/FLINK-2804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14944969#comment-14944969 ]
ASF GitHub Bot commented on FLINK-2804:
---------------------------------------

Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/1230#issuecomment-145846664

For the sake of completeness, I am reposting what I just commented on the last commit:

I think it would be better to move the `JobManager` retrieval logic into the `JobClientActor`. The `JobClientActor` would then be responsible for finding the current leader and resubmitting the `JobGraph`. The `Client` would simply wait for the `JobExecutionResult` or an `Exception`. That way, the retry logic would live in one place, which would reduce complexity somewhat.

Since @uce is quite overloaded right now, I can take a shot at it. Does the PR depend on anything from the PRs #1227 and #1153?

> Support blocking job submission with Job Manager recovery
> ---------------------------------------------------------
>
>                 Key: FLINK-2804
>                 URL: https://issues.apache.org/jira/browse/FLINK-2804
>             Project: Flink
>          Issue Type: Improvement
>    Affects Versions: 1.0
>            Reporter: Ufuk Celebi
>            Assignee: Ufuk Celebi
>            Priority: Minor
>
> Submitting a job in a blocking fashion with JobManager recovery and a failing
> JobManager fails on the client side (the one submitting the job). The job
> still continues to be recovered.
> I propose to add simple support to re-retrieve the leading job manager and
> update the client actor with it and then wait for the result as before.
> As of the current standing in PR #1153
> (https://github.com/apache/flink/pull/1153) the job manager assumes that the
> same actor is running and just keeps on sending execution state updates etc.
> (if the listening behaviour is not detached).
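
To illustrate the division of responsibilities proposed in the comment, here is a minimal sketch in plain Java. It is not Flink code and does not use Akka: `ResubmitSketch`, `Leader`, `submit`, and the `maxAttempts` parameter are hypothetical names invented for this example, and a `String` stands in for both the `JobGraph` and the `JobExecutionResult`. The point it shows is only that leader retrieval and resubmission sit in one place, while the caller waits for a result or an exception.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

public class ResubmitSketch {

    /** Hypothetical stand-in for a JobManager; not Flink's API. */
    interface Leader {
        String runJob(String jobGraph) throws Exception;
    }

    /**
     * Plays the role suggested for the JobClientActor: look up the current
     * leader, submit the job, and on failure look up the leader again and
     * resubmit. All retry logic is confined to this method.
     */
    static CompletableFuture<String> submit(
            Supplier<Leader> leaderRetrieval, String jobGraph, int maxAttempts) {
        CompletableFuture<String> result = new CompletableFuture<>();
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                // Re-retrieve the leader on every attempt; it may have changed.
                Leader leader = leaderRetrieval.get();
                result.complete(leader.runJob(jobGraph));
                return result;
            } catch (Exception e) {
                last = e; // remember the failure and try the next leader
            }
        }
        result.completeExceptionally(
                last != null ? last : new IllegalArgumentException("maxAttempts must be positive"));
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Assumed scenario: the first leader fails, the recovered one succeeds.
        Deque<Leader> leaders = new ArrayDeque<>();
        leaders.add(job -> { throw new IllegalStateException("JobManager lost"); });
        leaders.add(job -> "result of " + job);

        // Plays the role of the Client: it only waits for the outcome.
        CompletableFuture<String> future = submit(leaders::poll, "wordcount", 3);
        System.out.println(future.get());
    }
}
```

In this sketch the caller never sees the intermediate failure. It observes an exception only if every attempt fails, which mirrors the idea of the `Client` waiting for either the `JobExecutionResult` or an `Exception`.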