Hi Xintong Song,

> - Does this error happen for every of your dataset jobs? For a problematic 
> job, does it happen for every container?
> - What is the `jobs.jar`? Is it under `lib/`, `opt` of your client side 
> filesystem, or specified as `yarn.ship-files`, `yarn.ship-archives` or 
> `yarn.provided.lib.dirs`? This helps us to locate the code path that this 
> file went through.

I finally found the cause of the problem: I set both yarn.flink-dist-jar and 
pipeline.jars to the same archive (I submit jobs programmatically and repackage 
the Flink distribution because flink-dist.jar is not available in Maven Central).
If I copy the file and refer to the job and distribution jars under different 
names, the problem disappears.
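
For anyone hitting the same thing, here is a rough sketch of that workaround. 
It is not Flink API: the class and method names are made up for illustration, 
and only the two config keys come from my setup. It assumes the values can be 
given as file URIs, which I have not verified for both keys. If the two paths 
share a file name, it copies the job jar to a sibling file with a "job-" prefix 
before building the settings.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Flink: keeps the job jar and the
// distribution jar under different file names before submission.
final class DistinctJarNames {

    // Config keys as used in this thread.
    static final String DIST_JAR_KEY = "yarn.flink-dist-jar";
    static final String PIPELINE_JARS_KEY = "pipeline.jars";

    /** Returns true when both paths end in the same file name. */
    static boolean sameFileName(Path a, Path b) {
        return a.getFileName().equals(b.getFileName());
    }

    /**
     * Returns jobJar unchanged if its file name differs from distJar's;
     * otherwise copies it to a sibling named "job-<name>" and returns the copy.
     */
    static Path ensureDistinctName(Path distJar, Path jobJar) throws IOException {
        if (!sameFileName(distJar, jobJar)) {
            return jobJar;
        }
        Path copy = jobJar.resolveSibling("job-" + jobJar.getFileName());
        Files.copy(jobJar, copy, StandardCopyOption.REPLACE_EXISTING);
        return copy;
    }

    /** Builds the two settings as plain strings (file URIs are an assumption). */
    static Map<String, String> jarSettings(Path distJar, Path jobJar) throws IOException {
        Path distinctJobJar = ensureDistinctName(distJar, jobJar);
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put(DIST_JAR_KEY, distJar.toUri().toString());
        settings.put(PIPELINE_JARS_KEY, distinctJobJar.toUri().toString());
        return settings;
    }

    public static void main(String[] args) throws IOException {
        // Simulate the problematic setup: one archive used for both settings.
        Path dir = Files.createTempDirectory("flink-submit");
        Path archive = Files.write(dir.resolve("bundle.jar"), new byte[] {1, 2, 3});
        jarSettings(archive, archive)
                .forEach((key, value) -> System.out.println(key + ": " + value));
    }
}
```

The resulting map would still have to be copied into whatever configuration 
object the submission code uses; that part is left out here on purpose.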

My guess is that YARN (YarnApplicationFileUploader?) copies both files and, if 
the filenames are the same, the first file is overwritten by the second one, 
hence the timestamp difference.

I guess a lot has changed in the YARN deployment area since 1.8. Too bad there 
are no clear instructions on how to submit a job programmatically; every time, 
I have to reverse engineer CliFrontend.

Sorry for the confusion and thanks!

Mark
