Hello everybody,

in the past few days me and my colleagues ran some tests with Flink on YARN
and detected a possible inconsistent behavior in the way the contents of
the flink/lib directory is shipped to the cluster when run on YARN,
depending on the fact that the jobs are deployed individually or onto a
long-running session.

After some discussion on the user mailing list
<http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-and-YARN-ship-folder-td5458.html>
we were under the impression that the contents of that folder are always
supposed to be copied so that all the nodes have access to them.
Furthermore, we've found a comment in the code
<https://github.com/apache/flink/blob/master/flink-yarn/src/main/java/org/apache/flink/yarn/FlinkYarnClientBase.java#L254-L263>
that states:

// remove uberjar from ship list (by default everything in the lib/ folder
is added to
// the list of files to ship, but we handle the uberjar separately.

However, after having a look at some portions of the code, I'm not really
sure if this is actually the case or not. The Flink long-running YARN
session actually ships the contents because it's specified in the
yarn-session.sh script
<https://github.com/apache/flink/blob/master/flink-dist/src/main/flink-bin/yarn-bin/yarn-session.sh#L55>,
however running a single job on YARN does not automatically ship the
contents of the lib folder.

The behavior is not documented an I'd like to write some lines in the docs
to make clear of what is shipped in which case. Also, if there is an
agreement on the behavior that the single jobs on YARN should have, I can
also provide a fix for it. My feeling is that running a job on YARN should
end up in having more or less the same effect, regardless of the way the
job is run.

Let me know what you think, thank you for your attention.

-- 
BR,
Stefano Baghino

Software Engineer @ Radicalbit

Reply via email to