Re: Future directions for Flink’s YARN support?

Stephan Ewen Sun, 18 Jan 2015 09:50:11 -0800

Hi Daniel!

Thank you for your thoughts, those are good comments! I am sure Robert can
elaborate more, but here are some answers on how I understand things:


Concerning (1): Support for that (per-job YARN sessions and programmatic
setup/teardown) is in the making, and a first version is in a pull request
(https://github.com/apache/flink/pull/292).
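
To make the per-job lifecycle from point (1) concrete, here is a minimal Python sketch. A session is started for exactly one job, the job is submitted and awaited, and the session is torn down in a `finally` block even if the job fails. `FakeYarnCluster` and `per_job_session` are hypothetical stand-ins, not the API from the pull request above or from Flink itself. The `user` field only models the idea that each per-job session runs with the credentials of the user who submitted the job.

```python
from contextlib import contextmanager


class FakeYarnCluster:
    """Hypothetical stand-in for a Flink-on-YARN session (not a real Flink API)."""

    def __init__(self, user):
        self.user = user  # credentials this session would run with
        self.running = False
        self.finished_jobs = []

    def start(self):
        self.running = True

    def submit_and_wait(self, job_name):
        # A real client would submit the job and block until it completes.
        if not self.running:
            raise RuntimeError("session is not running")
        self.finished_jobs.append(job_name)
        return "FINISHED"

    def shutdown(self):
        self.running = False


@contextmanager
def per_job_session(user):
    """Bring up a session for exactly one job and always tear it down."""
    cluster = FakeYarnCluster(user)
    cluster.start()
    try:
        yield cluster
    finally:
        cluster.shutdown()  # runs even if the job raises


with per_job_session("alice") as session:
    status = session.submit_and_wait("wordcount")

print(status, session.running, session.user)  # FINISHED False alice
```

Tying teardown to a context manager is one way to make the "launch, run, tear down" sequence hard to get wrong; a real client would replace the stub methods with YARN application submission and status polling.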

Concerning (3): It would make sense to define the Hadoop dependencies as
"provided", which means the build pulls them in for compilation, but they
are not packaged, as they are assumed to be present at the target site.
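
Here is a simplified Python model of what the "provided" suggestion and Alexander's shade-plugin exclusion (quoted below) each do to the uberjar. It assumes only four scopes and ignores transitive-dependency rules. By default, the maven-shade-plugin bundles compile- and runtime-scoped artifacts, so a "provided" dependency is still on the compile classpath but stays out of the fat jar. The `excludes` patterns are glob-style, in the spirit of the shade plugin's `artifactSet` excludes, and the dependency list is illustrative, not Flink's actual pom.xml.

```python
from fnmatch import fnmatchcase

# Simplified Maven semantics for four common scopes (transitive rules ignored).
ON_COMPILE_CLASSPATH = {"compile", "provided"}  # visible when compiling main code
BUNDLED_BY_SHADE = {"compile", "runtime"}       # copied into the shaded uberjar


def compile_classpath(deps):
    """Coordinates available when compiling the main sources."""
    return [coord for coord, scope in deps if scope in ON_COMPILE_CLASSPATH]


def uberjar_contents(deps, excludes=()):
    """Coordinates bundled into the fat jar, minus any glob-style excludes."""
    return [
        coord
        for coord, scope in deps
        if scope in BUNDLED_BY_SHADE
        and not any(fnmatchcase(coord, pattern) for pattern in excludes)
    ]


# Illustrative "groupId:artifactId" coordinates, not Flink's real dependency list.
deps = [
    ("org.apache.flink:flink-runtime", "compile"),
    ("org.apache.hadoop:hadoop-common", "compile"),
    ("org.apache.hadoop:hadoop-yarn-client", "provided"),
    ("junit:junit", "test"),
]

# ['org.apache.flink:flink-runtime', 'org.apache.hadoop:hadoop-common']
print(uberjar_contents(deps))
# ['org.apache.flink:flink-runtime']
print(uberjar_contents(deps, excludes=["*:hadoop-*"]))
```

In this model, marking a Hadoop artifact "provided" and excluding it by pattern both keep it out of the uberjar; the difference is that the scope change also documents that the artifact is expected on the target classpath.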

Greetings,
Stephan


On Sun, Jan 18, 2015 at 6:41 PM, Alexander Alexandrov wrote:

> Hi Daniel,
>
> I think at least regarding 3 there is a quick fix in the pom.xml - we need
> to exclude the hadoop-* artifacts from the shade plugin. I think Robert
> can confirm whether this is the case.
>
> Regards,
>
> Alexander
>
> 2015-01-18 18:28 GMT+01:00 Daniel Warneke:
>
> > Hi,
> >
> > I just pushed my first version of Flink supporting YARN environments with
> > security/Kerberos enabled [1]. While working with the current Flink
> > version, I was really impressed by how easy it is to deploy the software
> > on a YARN cluster. However, there are a few things I stumbled upon, and I
> > would be interested in your opinion:
> >
> > 1. Separation between YARN session and Flink job
> > Currently, we separate the Flink YARN session from the Flink jobs, i.e. a
> > user first has to bring up the Flink cluster on YARN through a separate
> > command and can then submit an arbitrary number of jobs to this cluster.
> > Through this separation it is possible to submit individual jobs with a
> > really low latency, but it introduces two major problems: First, it is
> > currently impossible to programmatically launch a Flink YARN cluster,
> > submit a job, wait for its completion and then tear the cluster down again
> > (correct me if I’m wrong here), although this is actually a very important
> > use case. Second, with security enabled, all jobs are executed with the
> > security credentials of the user who launched the Flink cluster. This
> > causes massive authorization problems. Therefore, I would propose to move
> > to a model where we launch one Flink cluster per job (or at least to make
> > this a very prominent option).
> >
> > 2. Loading Hadoop configuration settings for Flink
> > In the current release, we use custom code to identify and load the
> > relevant Hadoop XML configuration files (e.g. core-site.xml, yarn-site.xml)
> > for the Flink YARN client. I found this mechanism to be quite fragile, as
> > it depends on certain environment variables being set and assumes that
> > certain configuration keys are specified in certain files. For example,
> > with Hadoop security enabled, the Flink YARN client needs to know what kind
> > of authentication mechanism HDFS expects for data transfer. This setting
> > is usually specified in hdfs-site.xml. In the current Flink version, the
> > YARN client ignores this file and hence cannot talk to HDFS when security
> > is enabled.
> > As an alternative, I propose to launch the Flink cluster on YARN through
> > the “yarn jar” command. With this command, you get the entire
> > configuration setup for free and no longer have to worry about names of
> > configuration files, configuration paths and environment variables.
> >
> > 3. The uberjar deployment model
> > In my opinion, the current Flink deployment model for YARN, with the one
> > fat uberjar, is unnecessarily bulky. With the last release, the Flink
> > uberjar has grown to over 100 MB in size, amounting to almost 400 MB of
> > class files when uncompressed. Many of the included dependencies are not
> > even necessary. For example, when using the “yarn jar” hook to deploy
> > Flink, all relevant Hadoop libraries are added to the classpath anyway, so
> > there is no need to include them in the uberjar (unless you assume the
> > client does not have a Hadoop environment installed). Personally, I would
> > favor a more fine-grained deployment model. Especially when we move to a
> > one-job-per-session model, I think we should allow having Flink
> > preinstalled on the cluster nodes instead of always requiring the 100 MB
> > uberjar to be redistributed to each and every node.
> >
> > Any thoughts on that?
> >
> > Best regards,
> >
> >     Daniel
> >
> > [1] https://github.com/warneke/flink/tree/security
> >
>
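
Point (2) in the quoted message argues that a client which loads a hard-coded subset of Hadoop's *-site.xml files misses settings such as the HDFS data-transfer protection. The Python sketch below illustrates that failure mode with hypothetical helpers (`find_conf_dir`, `load_site_config`, `write_site`), not Flink's code. It uses `dfs.data.transfer.protection` as an example key that lives in hdfs-site.xml, assuming a Hadoop version that supports it. Daniel's actual proposal, launching via “yarn jar”, sidesteps this by letting Hadoop's own configuration classes do the loading.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical loader, not Flink's actual code. Later files override earlier
# ones; Hadoop's own loader also handles <final> and ${var} expansion, which
# this sketch ignores.
SITE_FILES = ("core-site.xml", "hdfs-site.xml", "yarn-site.xml")


def find_conf_dir(env=None):
    """Return the first existing directory named by HADOOP_CONF_DIR or YARN_CONF_DIR."""
    env = os.environ if env is None else env
    for var in ("HADOOP_CONF_DIR", "YARN_CONF_DIR"):
        path = env.get(var)
        if path and os.path.isdir(path):
            return path
    return None


def load_site_config(conf_dir, files=SITE_FILES):
    """Merge <property> name/value pairs from the given *-site.xml files."""
    props = {}
    for name in files:
        path = os.path.join(conf_dir, name)
        if not os.path.isfile(path):
            continue  # skip files that are absent
        for prop in ET.parse(path).getroot().findall("property"):
            key = prop.findtext("name")
            if key:
                props[key.strip()] = (prop.findtext("value") or "").strip()
    return props


def write_site(conf_dir, filename, key, value):
    """Write a one-property Hadoop-style XML file (demo helper)."""
    xml = (f"<configuration><property><name>{key}</name>"
           f"<value>{value}</value></property></configuration>")
    with open(os.path.join(conf_dir, filename), "w") as f:
        f.write(xml)


with tempfile.TemporaryDirectory() as conf_dir:
    write_site(conf_dir, "core-site.xml", "hadoop.security.authentication", "kerberos")
    write_site(conf_dir, "hdfs-site.xml", "dfs.data.transfer.protection", "authentication")
    # A loader that only knows about core-site.xml and yarn-site.xml ...
    partial = load_site_config(conf_dir, files=("core-site.xml", "yarn-site.xml"))
    # ... versus one that reads every site file.
    full = load_site_config(conf_dir)

print("dfs.data.transfer.protection" in partial)  # False
print(full["dfs.data.transfer.protection"])       # authentication
```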
