[ https://issues.apache.org/jira/browse/FLINK-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15683082#comment-15683082 ]

Maximilian Michels edited comment on FLINK-5081 at 11/21/16 10:36 AM:
----------------------------------------------------------------------

I've had a second look. -The issue is not that the configuration is not loaded. 
Moreover, your finding reveals at least two other issues with our per-job YARN 
implementation:-

-1. When executing in non-detached job submission mode, the "Client Shutdown 
Hook" shuts down the Yarn application in case of job failures (e.g. TaskManager 
dies). We should remove the shutdown hook. It should only be active during 
deployment.-

-2. The per-job Yarn application is supposed to automatically shut down the 
cluster after job completion. In case of failures (e.g. TaskManager dies) the 
shutdown apparently is performed as well although it shouldn't.-

edit: 

1) is not an issue, since the hook only shuts down the Yarn application once a terminal state has been reached.
2) is an issue, but it is unrelated to this one.
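To illustrate point 1), here is a minimal Java sketch of a shutdown decision that is gated on terminal states. The `JobState` enum, its members and `shouldShutDownCluster` are hypothetical names chosen for illustration, not Flink's actual client API, and the real set of job states may differ.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch: the state names loosely resemble job states but are
// NOT Flink's actual API.
public class TerminalStateShutdown {

    enum JobState { CREATED, RUNNING, RESTARTING, FAILED, CANCELED, FINISHED }

    // Assumed terminal states: the job makes no further progress on its own.
    static final Set<JobState> TERMINAL =
            EnumSet.of(JobState.FAILED, JobState.CANCELED, JobState.FINISHED);

    // Tear down the Yarn application only once a terminal state is reached;
    // transient states such as RESTARTING (e.g. after a TaskManager died)
    // leave the cluster running.
    static boolean shouldShutDownCluster(JobState state) {
        return TERMINAL.contains(state);
    }

    public static void main(String[] args) {
        for (JobState s : JobState.values()) {
            System.out.println(s + " -> shut down: " + shouldShutDownCluster(s));
        }
    }
}
```

Note that FAILED still counts as terminal in this sketch, which is exactly why a job failure alone is enough to bring the cluster down.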

The actual issue here is that the JobManager informs the client about the failed 
job, and the client then shuts down the cluster. We should differentiate between 
fatal and non-fatal failures in the client.
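A minimal Java sketch of what differentiating fatal and non-fatal failures in the client could look like. `FailurePolicy`, `FailureKind` and `onFailure` are hypothetical names, not Flink API. The sketch assumes that a lost TaskManager is non-fatal until a failed-container budget (in the spirit of `yarn.maximum-failed-containers`) is used up, and treats every other failure kind as fatal.

```java
// Hypothetical sketch; class, enum and method names are illustrative
// assumptions, not Flink API.
public class FailurePolicy {

    enum FailureKind {
        TASK_MANAGER_LOST,   // assumed recoverable: a replacement container can be requested
        USER_CODE_EXCEPTION, // assumed fatal in this sketch
        JOB_MANAGER_LOST     // assumed fatal in this sketch
    }

    private final int maxFailedContainers;
    private int failedContainers = 0;

    FailurePolicy(int maxFailedContainers) {
        this.maxFailedContainers = maxFailedContainers;
    }

    /** Returns true if the client should shut down the per-job cluster. */
    boolean onFailure(FailureKind kind) {
        switch (kind) {
            case TASK_MANAGER_LOST:
                failedContainers++;
                // Non-fatal until more containers have failed than the budget allows.
                return failedContainers > maxFailedContainers;
            default:
                return true;
        }
    }

    public static void main(String[] args) {
        FailurePolicy policy = new FailurePolicy(2);
        System.out.println(policy.onFailure(FailureKind.TASK_MANAGER_LOST)); // false
        System.out.println(policy.onFailure(FailureKind.TASK_MANAGER_LOST)); // false
        System.out.println(policy.onFailure(FailureKind.TASK_MANAGER_LOST)); // true
    }
}
```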


was (Author: mxm):
I've had a second look. The issue is not that the configuration is not loaded. 
Moreover, your finding reveals at least two other issues with our per-job YARN 
implementation:

1. When executing in non-detached job submission mode, the "Client Shutdown 
Hook" shuts down the Yarn application in case of job failures (e.g. TaskManager 
dies). We should remove the shutdown hook. It should only be active during 
deployment.

2. The per-job Yarn application is supposed to automatically shut down the 
cluster after job completion. In case of failures (e.g. TaskManager dies) the 
shutdown apparently is performed as well although it shouldn't.

> unable to set yarn.maximum-failed-containers with flink one-time YARN setup
> ---------------------------------------------------------------------------
>
>                 Key: FLINK-5081
>                 URL: https://issues.apache.org/jira/browse/FLINK-5081
>             Project: Flink
>          Issue Type: Bug
>          Components: Startup Shell Scripts
>    Affects Versions: 1.2.0, 1.1.4
>            Reporter: Nico Kruber
>            Assignee: Maximilian Michels
>             Fix For: 1.2.0, 1.1.4
>
>
> When Flink sets up YARN for a one-time job, it apparently does not pass the 
> {{yarn.maximum-failed-containers}} parameter to YARN as the 
> {{yarn-session.sh}} script does. Adding it to conf/flink-conf.yaml, as 
> https://ci.apache.org/projects/flink/flink-docs-master/setup/yarn_setup.html#recovery-behavior-of-flink-on-yarn
> suggests, also does not work.
> Example:
> {code:none}
> flink run -m yarn-cluster -yn 3 -yjm 1024 -ytm 4096 <job>.jar --parallelism 3 -Dyarn.maximum-failed-containers=100
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
