Hi Daniel,

I think at least regarding 3 there is a quick fix in the pom.xml - we need
to exclude the hadoop-* artifacts from the shade plugin. I think Robert can
confirm whether this is the case.
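Something along these lines in the relevant pom.xml should do it (untested
sketch off the top of my head; the module and the exact exclusion pattern
may need tweaking):

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <artifactSet>
      <excludes>
        <!-- keep the Hadoop/YARN dependencies out of the uberjar;
             they are on the cluster classpath anyway -->
        <exclude>org.apache.hadoop:*</exclude>
      </excludes>
    </artifactSet>
  </configuration>
</plugin>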
Regards,
Alexander

2015-01-18 18:28 GMT+01:00 Daniel Warneke <warn...@apache.org>:

> Hi,
>
> I just pushed my first version of Flink supporting YARN environments with
> security/Kerberos enabled [1]. While working with the current Flink
> version, I was really impressed by how easy it is to deploy the software
> on a YARN cluster. However, there are a few things I stumbled upon and I
> would be interested in your opinion:
>
> 1. Separation between YARN session and Flink job
> Currently, we separate the Flink YARN session from the Flink jobs, i.e. a
> user first has to bring up the Flink cluster on YARN through a separate
> command and can then submit an arbitrary number of jobs to this cluster.
> This separation makes it possible to submit individual jobs with very low
> latency, but it introduces two major problems: First, it is currently
> impossible to programmatically launch a Flink YARN cluster, submit a job,
> wait for its completion and then tear the cluster down again (correct me
> if I’m wrong here), although this is actually a very important use case.
> Second, with security enabled, all jobs are executed with the security
> credentials of the user who launched the Flink cluster. This causes
> massive authorization problems. Therefore, I would propose to move to a
> model where we launch one Flink cluster per job (or at least to make this
> a very prominent option).
>
> 2. Loading Hadoop configuration settings for Flink
> In the current release, we use custom code to identify and load the
> relevant Hadoop XML configuration files (e.g. core-site.xml,
> yarn-site.xml) for the Flink YARN client. I found this mechanism to be
> quite fragile, as it depends on certain environment variables being set
> and assumes certain configuration keys to be specified in certain files.
> For example, with Hadoop security enabled, the Flink YARN client needs to
> know what kind of authentication mechanism HDFS expects for the data
> transfer. This setting is usually specified in hdfs-site.xml. In the
> current Flink version, the YARN client ignores this file and hence cannot
> talk to HDFS when security is enabled.
> As an alternative, I propose to launch the Flink cluster on YARN through
> the “yarn jar” command. With this command, you get the entire
> configuration setup for free and no longer have to worry about names of
> configuration files, configuration paths and environment variables.
>
> 3. The uberjar deployment model
> In my opinion, the current Flink deployment model for YARN, with the one
> fat uberjar, is unnecessarily bulky. With the last release, the Flink
> uberjar has grown to over 100 MB in size, amounting to almost 400 MB of
> class files when uncompressed. Many of the includes are not even
> necessary. For example, when using the “yarn jar” hook to deploy Flink,
> all relevant Hadoop libraries are added to the classpath anyway, so there
> is no need to include them in the uberjar (unless you assume the client
> does not have a Hadoop environment installed). Personally, I would favor
> a more fine-granular deployment model. Especially when we move to a
> one-job-per-session model, I think we should allow having Flink
> preinstalled on the cluster nodes and not always require redistributing
> the 100 MB uberjar to each and every node.
>
> Any thoughts on that?
>
> Best regards,
>
> Daniel
>
> [1] https://github.com/warneke/flink/tree/security
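For context on point 2: the “yarn jar” hook Daniel refers to is Hadoop's
generic application launcher, which picks up the full client-side Hadoop
configuration (core-site.xml, hdfs-site.xml, yarn-site.xml, ...)
automatically. Its general form is

  yarn jar <app.jar> [mainClass] [args...]

so a Flink launch through it might look roughly like the following, where
the jar name, class name and argument are purely illustrative, not Flink's
actual ones:

  yarn jar flink-yarn-client.jar org.apache.flink.yarn.Client myJob.jar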