Hi,
I just pushed my first version of Flink supporting YARN environments
with security/Kerberos enabled [1]. While working with the current Flink
version, I was really impressed by how easy it is to deploy the software
on a YARN cluster. However, there are a few things I stumbled upon, and
I would be interested in your opinion:
1. Separation between YARN session and Flink job
Currently, we separate the Flink YARN session from the Flink jobs, i.e.
a user first has to bring up the Flink cluster on YARN through a
separate command and can then submit an arbitrary number of jobs to this
cluster. This separation makes it possible to submit individual jobs
with very low latency, but it introduces two major problems:
First, it is currently impossible to programmatically launch a Flink
YARN cluster, submit a job, wait for its completion and then tear the
cluster down again (correct me if I’m wrong here) although this is
actually a very important use case. Second, with security enabled, all
jobs are executed with the security credentials of the user who launched
the Flink cluster, which causes massive authorization problems.
Therefore, I would propose to move to a model where we launch one Flink
cluster per job (or at least to make this a very prominent option).
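For illustration, the current two-step workflow looks roughly like the
following; the flag values are just examples, and the single per-job
command at the end is purely hypothetical, it does not exist today:

```shell
# Step 1: bring up a long-running Flink session on YARN
# (everything submitted later runs with *this* user's credentials)
./bin/yarn-session.sh -n 4 -tm 1024

# Step 2: submit an arbitrary number of jobs to that session
./bin/flink run ./examples/WordCount.jar hdfs:///input hdfs:///output

# Hypothetical one-cluster-per-job alternative: launch the cluster,
# run the job, and tear the cluster down again in a single step,
# under the credentials of the user submitting the job
./bin/flink run -yarn -n 4 ./examples/WordCount.jar hdfs:///input hdfs:///output
```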
2. Loading Hadoop configuration settings for Flink
In the current release, we use custom code to identify and load the
relevant Hadoop XML configuration files (e.g. core-site.xml,
yarn-site.xml) for the Flink YARN client. I found this mechanism to be
quite fragile, as it depends on certain environment variables being set
and assumes that certain configuration keys are specified in certain
files.
For example, with Hadoop security enabled, the Flink YARN client needs
to know which authentication mechanism HDFS expects for data transfers.
This setting is usually specified in hdfs-site.xml. In the
current Flink version, the YARN client ignores this file and hence
cannot talk to HDFS when security is enabled.
As an alternative, I propose to launch the Flink cluster on YARN through
the “yarn jar” command. With this command, you get the entire
configuration setup for free and no longer have to worry about names of
configuration files, configuration paths and environment variables.
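Concretely, the launch could look something like this (the jar and class
names below are placeholders, not the actual Flink artifacts):

```shell
# Launch the Flink YARN client through Hadoop's own launcher.
# 'yarn jar' runs the given main class with the node's Hadoop
# configuration directory and Hadoop libraries already on the
# classpath, so core-site.xml, hdfs-site.xml and yarn-site.xml
# are all picked up without any custom file-lookup code.
yarn jar flink-yarn-client.jar org.apache.flink.yarn.Client -n 4
```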
3. The uberjar deployment model
In my opinion, the current Flink deployment model for YARN, with the one
fat uberjar, is unnecessarily bulky. With the last release the Flink
uberjar has grown to over 100 MB in size, amounting to almost 400 MB of
class files when uncompressed. Many of the included libraries are not
even necessary. For example, when using the “yarn jar” hook to deploy
Flink,
all relevant Hadoop libraries are added to the classpath anyway, so
there is no need to include them in the uberjar (unless you assume the
client does not have a Hadoop environment installed). Personally, I
would favor a finer-grained deployment model. In particular, once we
move to a one-job-per-session model, I think we should allow Flink to be
preinstalled on the cluster nodes instead of always redistributing the
100 MB uberjar to each and every node.
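Under the assumption that a Hadoop environment is available on the
nodes, one way to slim down the uberjar would be to mark the Hadoop
dependencies as “provided” in the Maven build, so the shade plugin
leaves them out of the fat jar and they are resolved from the node’s
Hadoop installation at runtime (the version property here is
illustrative):

```xml
<!-- Supplied by the node's Hadoop installation at runtime, -->
<!-- so the shade plugin excludes it from the uberjar. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>${hadoop.version}</version>
  <scope>provided</scope>
</dependency>
```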
Any thoughts on that?
Best regards,
Daniel
[1] https://github.com/warneke/flink/tree/security