There’s a bit of a misconception here: in Flink there is no “driver” as there 
is in Spark. The entry point of your program (“main()”) is not executed on 
the cluster but in the “client”. The main method is only responsible for 
constructing a program graph; this graph is then shipped to the cluster, and 
the client (that is, the “main()” method) can shut down at that point. In your 
concrete case, this means that the main() method is not executed in the YARN 
context, i.e. it does not have the files that you specified with the 
“--yarnship” option.
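
To make the client/cluster split concrete, here is a minimal, self-contained sketch in plain Java (no Flink dependency). It assumes one possible workaround that this reply does not itself spell out: read the file in main(), where it exists locally, and hand its contents to a serializable function. TagWithConfig, ClientSideConfigSketch and the temporary file are hypothetical names for illustration, and shipToWorker only imitates shipping by round-tripping through Java serialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical stand-in for a user function that is serialized and sent to the cluster. */
final class TagWithConfig implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String configText; // captured on the client, travels with the function

    TagWithConfig(String configText) {
        this.configText = configText;
    }

    String apply(String record) {
        return record + " [" + configText.trim() + "]";
    }
}

final class ClientSideConfigSketch {
    /** Runs where main() runs (the client), so the local file is readable here. */
    static TagWithConfig buildOnClient(Path localConfig) throws IOException {
        String text = new String(Files.readAllBytes(localConfig), StandardCharsets.UTF_8);
        return new TagWithConfig(text);
    }

    /** Imitates shipping: serialize on the "client", deserialize on the "worker". */
    static TagWithConfig shipToWorker(TagWithConfig fn) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(fn);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (TagWithConfig) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Path config = Files.createTempFile("job-config", ".yaml");
        Files.write(config, "mode: test\n".getBytes(StandardCharsets.UTF_8));

        TagWithConfig onWorker = shipToWorker(buildOnClient(config));
        Files.delete(config); // the "worker" side never needs the file itself

        System.out.println(onWorker.apply("event-1"));
    }
}
```

The point of the sketch is only that whatever main() reads locally has to travel inside the serialized program; the file on the client's disk is not visible to code running on the cluster.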

Regarding “--yarnship” in general, I have descended into the depths of the Flink 
YARN support and this is how it works:
FlinkYarnSessionCli is the piece of code that acts as entry point when 
specifying “-m yarn-cluster” at the command line. This is the place where the 
options are defined: 
https://github.com/apache/flink/blob/f839018131024860a1b25b13cea7e1313add28d5/flink-yarn/src/main/java/org/apache/flink/yarn/cli/FlinkYarnSessionCli.java#L138-L138.
The options are not hardcoded but have a dynamic prefix; normally the short 
prefix is “y” and the long prefix is “yarn”. In there you see

shipPath = new Option(shortPrefix + "t", longPrefix + "ship", true,
    "Ship files in the specified directory (t for transfer)");

This translates to the -yt and --yarnship parameters.
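
As a small illustration of how the prefixes combine into the final flag names, here is a self-contained sketch. PrefixedOptionSketch is a hypothetical helper written for this example; it does not use the Option class that the Flink code relies on and only mirrors the string concatenation.

```java
/** Hypothetical helper: mirrors how a prefix plus a suffix yields the final flag names. */
final class PrefixedOptionSketch {
    private final String shortName;
    private final String longName;

    PrefixedOptionSketch(String shortPrefix, String longPrefix,
                         String shortSuffix, String longSuffix) {
        this.shortName = shortPrefix + shortSuffix; // e.g. "y" + "t"
        this.longName = longPrefix + longSuffix;    // e.g. "yarn" + "ship"
    }

    String shortFlag() {
        return "-" + shortName;
    }

    String longFlag() {
        return "--" + longName;
    }

    public static void main(String[] args) {
        PrefixedOptionSketch ship = new PrefixedOptionSketch("y", "yarn", "t", "ship");
        System.out.println(ship.shortFlag() + " " + ship.longFlag());
    }
}
```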

As to how FlinkYarnSessionCli is used when specifying “-m yarn-cluster”, this 
happens here: 
https://github.com/apache/flink/blob/4aa2ffcef8edae574ec270631841ef4a0c793dec/flink-clients/src/main/java/org/apache/flink/client/CliFrontend.java#L136-L136
 Essentially, a “CustomCommandLine” subclass is responsible for handling the 
user invocation and the subclasses can announce that they would like to handle 
the user command line based on certain settings. For example, 
FlinkYarnSessionCli will announce that it can handle a command line when the 
“-m yarn-cluster” option is present: 
https://github.com/apache/flink/blob/f839018131024860a1b25b13cea7e1313add28d5/flink-yarn/src/main/java/org/apache/flink/yarn/cli/FlinkYarnSessionCli.java#L493-L493
The CliFrontend will loop through the list of registered CustomCommandLine 
instances and pick the first one that announces that it would like to handle a 
given invocation: 
https://github.com/apache/flink/blob/4aa2ffcef8edae574ec270631841ef4a0c793dec/flink-clients/src/main/java/org/apache/flink/client/CliFrontend.java#L1174-L1174
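
The selection logic described above can be sketched roughly as follows. This is a simplified model and not Flink's actual code: CommandLineHandler, YarnHandler, DefaultHandler and DispatchSketch are hypothetical names, the real CustomCommandLine interface has different method signatures, and the catch-all fallback handler is an assumption made so the example always finds a match.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified stand-in for CustomCommandLine; not the real interface. */
interface CommandLineHandler {
    boolean isActive(List<String> args);

    String id();
}

/** Announces itself when "-m yarn-cluster" is present on the command line. */
final class YarnHandler implements CommandLineHandler {
    public boolean isActive(List<String> args) {
        int i = args.indexOf("-m");
        return i >= 0 && i + 1 < args.size() && "yarn-cluster".equals(args.get(i + 1));
    }

    public String id() {
        return "yarn";
    }
}

/** Assumed fallback that accepts every command line. */
final class DefaultHandler implements CommandLineHandler {
    public boolean isActive(List<String> args) {
        return true;
    }

    public String id() {
        return "default";
    }
}

final class DispatchSketch {
    // Registration order matters: the first handler that is active wins.
    static final List<CommandLineHandler> HANDLERS =
        Arrays.<CommandLineHandler>asList(new YarnHandler(), new DefaultHandler());

    static CommandLineHandler pick(List<String> args) {
        for (CommandLineHandler handler : HANDLERS) {
            if (handler.isActive(args)) {
                return handler;
            }
        }
        throw new IllegalStateException("no handler accepts this command line");
    }

    public static void main(String[] args) {
        System.out.println(pick(Arrays.asList("run", "-m", "yarn-cluster", "-yt", "conf/")).id());
        System.out.println(pick(Arrays.asList("run", "job.jar")).id());
    }
}
```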

This is very convoluted, and I hope my explanations help somewhat.

Best,
Aljoscha

> On 13. Jul 2017, at 18:02, Ted Yu <yuzhih...@gmail.com> wrote:
> 
> I went back to commit 6e38eb8:
> [FLINK-1436] [docs] update command line documentation
> 
> A search in the repo for "yarnship" ended up with no hit in the code (same 
> with commit bf6b9aaab89e2e04678784525a42a19f099aa7f5 which is at top of git 
> repo).
> 
> Wondering whether it is supported.
> 
> On Thu, Jul 13, 2017 at 8:10 AM, Guy Harmach <g...@amdocs.com> wrote:
> Hi,
> 
>  
> 
> I’m running a flink job on YARN. I’d like to pass yaml configuration files to 
> the job.
> 
> I tried to use the flink cli --yarnship flag to point to a directory 
> containing the file, but wasn’t able to get it in the job.
> 
> Can someone give an example of how to send local files and how to read them 
> in the job?
> 
>  
> 
> Thanks, Guy