Hi Xintong Song,

> - Does this error happen for every of your dataset jobs? For a problematic
> job, does it happen for every container?
> - What is the `jobs.jar`? Is it under `lib/`, `opt` of your client side
> filesystem, or specified as `yarn.ship-files`, `yarn.ship-archives` or
> `yarn.provided.lib.dirs`? This helps us to locate the code path that this
> file went through.

I finally found the cause of the problem: I set both yarn.flink-dist-jar and pipeline.jars to the same archive. (I submit jobs programmatically and repackage the Flink distribution because flink-dist.jar is not available on Maven Central.) If I copy the file and refer to the job and distribution jars under different names, the problem disappears.

My guess is that YARN (YarnApplicationFileUploader?) copies both files, and if the file names are the same, the first file is overwritten by the second one, so there is a timestamp difference.

I guess a lot has changed in the YARN deployment area since 1.8. Too bad there are no clear instructions on how to submit a job programmatically; every time, I have to reverse engineer CliFrontend.

Sorry for the confusion, and thanks!

Mark
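
P.S. In case someone else hits the same thing: below is a minimal sketch of a check that could run on the client before submission, to catch the case where the distribution jar and a job jar end up with the same file name. It uses plain Java and no Flink classes. The class name, method names and paths are hypothetical, and it rests on my unconfirmed guess above that files with the same name overwrite each other during upload.

```java
import java.util.Arrays;
import java.util.List;

public class JarNameCheck {

    // Returns the last path segment, e.g. "/opt/app/jobs.jar" -> "jobs.jar".
    public static String fileName(String pathOrUri) {
        int slash = Math.max(pathOrUri.lastIndexOf('/'), pathOrUri.lastIndexOf('\\'));
        return pathOrUri.substring(slash + 1);
    }

    // True if the distribution jar shares its file name with any of the job jars.
    // distJar stands for the value given to yarn.flink-dist-jar,
    // jobJars for the entries given to pipeline.jars.
    public static boolean clashesWithDistJar(String distJar, List<String> jobJars) {
        String distName = fileName(distJar);
        for (String jar : jobJars) {
            if (fileName(jar).equals(distName)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical paths: the same archive used for both settings.
        List<String> sameArchive = Arrays.asList("/opt/app/flink-bundle.jar");
        System.out.println(clashesWithDistJar("/opt/app/flink-bundle.jar", sameArchive)); // true

        // Hypothetical paths: the archive copied under two different names.
        List<String> separateCopy = Arrays.asList("/opt/app/jobs.jar");
        System.out.println(clashesWithDistJar("/opt/app/flink-dist.jar", separateCopy)); // false
    }
}
```

If the check returns true, copying the archive under a second name before submitting is the workaround that made the problem disappear for me.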