Hello everybody, in the past few days me and my colleagues ran some tests with Flink on YARN and detected a possible inconsistent behavior in the way the contents of the flink/lib directory is shipped to the cluster when run on YARN, depending on the fact that the jobs are deployed individually or onto a long-running session.
After some discussion on the user mailing list <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-and-YARN-ship-folder-td5458.html> we were under the impression that the contents of that folder are always supposed to be copied so that all the nodes have access to them. Furthermore, we've found a comment in the code <https://github.com/apache/flink/blob/master/flink-yarn/src/main/java/org/apache/flink/yarn/FlinkYarnClientBase.java#L254-L263> that states: // remove uberjar from ship list (by default everything in the lib/ folder is added to // the list of files to ship, but we handle the uberjar separately. However, after having a look at some portions of the code, I'm not really sure if this is actually the case or not. The Flink long-running YARN session actually ships the contents because it's specified in the yarn-session.sh script <https://github.com/apache/flink/blob/master/flink-dist/src/main/flink-bin/yarn-bin/yarn-session.sh#L55>, however running a single job on YARN does not automatically ship the contents of the lib folder. The behavior is not documented an I'd like to write some lines in the docs to make clear of what is shipped in which case. Also, if there is an agreement on the behavior that the single jobs on YARN should have, I can also provide a fix for it. My feeling is that running a job on YARN should end up in having more or less the same effect, regardless of the way the job is run. Let me know what you think, thank you for your attention. -- BR, Stefano Baghino Software Engineer @ Radicalbit