[jira] [Commented] (FLINK-2804) Support blocking job submission with Job Manager recovery

ASF GitHub Bot (JIRA) Tue, 06 Oct 2015 05:49:40 -0700

    [ https://issues.apache.org/jira/browse/FLINK-2804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14944969#comment-14944969 ]


ASF GitHub Bot commented on FLINK-2804:
---------------------------------------

Github user tillrohrmann commented on the pull request:

    https://github.com/apache/flink/pull/1230#issuecomment-145846664
  
    For the sake of completeness, I'm reposting what I just commented on the last commit:
    
    I think it would be better to move the `JobManager` retrieval logic into the `JobClientActor`. The `JobClientActor` would then be responsible for finding the current leader and resubmitting the `JobGraph`, and the `Client` would simply wait for either the `JobExecutionResult` or an `Exception`.
    
    That way, the retry logic would live in one place, which would reduce complexity somewhat.
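    A minimal Java sketch of the structure proposed above, assuming a synchronous API for brevity: one component looks up the current leader on every attempt and resubmits the job graph when that leader is lost, so the caller only ever sees a result or an exception. All names here (`RetryingJobClient`, `JobManagerGateway`, `LeaderLostException`, `submitBlocking`) are hypothetical and are not Flink's API; the real `JobClientActor` is an actor and would react to leader-change messages rather than run a blocking loop.

```java
import java.util.function.Supplier;

/**
 * Hypothetical sketch, not Flink's API: leader lookup and resubmission are
 * concentrated in one place, so callers only see a result or an exception.
 */
public class RetryingJobClient {

    /** Hypothetical handle to a JobManager; stands in for an actor reference. */
    public interface JobManagerGateway {
        /** Submits the job and blocks until it finishes; throws if this leader is lost. */
        String submitAndWait(String jobGraph) throws LeaderLostException;
    }

    /** Hypothetical signal that the JobManager we talked to is no longer the leader. */
    public static class LeaderLostException extends Exception {
        public LeaderLostException(String message) {
            super(message);
        }
    }

    private final Supplier<JobManagerGateway> leaderRetriever;
    private final int maxAttempts;

    public RetryingJobClient(Supplier<JobManagerGateway> leaderRetriever, int maxAttempts) {
        this.leaderRetriever = leaderRetriever;
        this.maxAttempts = maxAttempts;
    }

    /** Blocks until the job result is available or all attempts are used up. */
    public String submitBlocking(String jobGraph) throws Exception {
        LeaderLostException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // Look up the *current* leader on every attempt instead of caching it.
            JobManagerGateway leader = leaderRetriever.get();
            try {
                return leader.submitAndWait(jobGraph);
            } catch (LeaderLostException e) {
                // Leader failed: remember why, then retrieve the new leader and resubmit.
                last = e;
            }
        }
        throw new Exception("Job did not complete after " + maxAttempts + " attempts", last);
    }
}
```

    Because the leader is re-retrieved inside the same loop that resubmits, the caller (here standing in for the `Client`) needs no retry logic of its own. Whether resubmitting or merely reattaching to a recovered job is correct depends on the JobManager's recovery semantics, which this sketch does not model.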
    
    Since @uce is quite overloaded right now, I can take a shot at it. Does the PR depend on anything from PRs #1227 and #1153?


> Support blocking job submission with Job Manager recovery
> ---------------------------------------------------------
>
>                 Key: FLINK-2804
>                 URL: https://issues.apache.org/jira/browse/FLINK-2804
>             Project: Flink
>          Issue Type: Improvement
>    Affects Versions: 1.0
>            Reporter: Ufuk Celebi
>            Assignee: Ufuk Celebi
>            Priority: Minor
>
> Submitting a job in blocking mode with JobManager recovery enabled fails on the client side (the one submitting the job) when the JobManager fails. The job itself still continues to be recovered.
> I propose adding simple support for re-retrieving the leading JobManager, updating the client actor with it, and then waiting for the result as before.
> As of the current state of PR #1153 (https://github.com/apache/flink/pull/1153), the JobManager assumes that the same actor is still running and simply keeps sending execution state updates etc. to it (if the listening behaviour is not detached).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
