The code from Daniel has been written for the old YARN client. I think the most important change is this one: https://github.com/warneke/flink/commit/9843a14637594fb7ee265f5326af9007f2a3191c and it can be backported easily to the new YARN client.
On Tue, Jan 27, 2015 at 7:00 AM, Stephan Ewen <se...@apache.org> wrote:

> Hi Ankit!
>
> Kerberos support is not yet in the system, but one of the Flink committers
> (Daniel Warneke) has made a prototype here:
> https://github.com/warneke/flink/tree/security
>
> @Daniel Can you give us an update on the status? What do you think is
> missing before a first version is ready to be merged into the master?
>
> Greetings,
> Stephan
>
>
> On Sun, Jan 18, 2015 at 10:00 AM, Robert Metzger <rmetz...@apache.org>
> wrote:
>
> > Hi Daniel,
> >
> > let me answer your questions:
> >
> > 1. Basically all the features you are requesting are implemented in this
> > pull request: https://github.com/apache/flink/pull/292 (per-job YARN
> > cluster & programmatic control of the cluster). Feel free to review the
> > pull request. It has been pending for more than a week now and hasn't
> > gotten much feedback. I would also recommend basing the work on security
> > support on that branch.
> >
> > 2. I agree that the whole configuration loading process is not nicely
> > implemented. When I was working on this, I didn't understand all the
> > features offered by Hadoop's Configuration object. I implemented it the
> > way I did to make it as easy as possible for users to run Flink on YARN.
> > As you can see in the code, it tries several commonly used environment
> > variables to detect the location of the configuration files. These
> > config files are then used and respected by the YARN client (for
> > example, the default file system name).
> > I'll have a look at the "yarn jar" command. One concern I have with this
> > is that it adds a new requirement: we expect the user to have the "yarn"
> > binary in the PATH. I know quite a few environments (for example, some
> > users of the Hortonworks Sandbox) which don't have "hadoop" and "yarn"
> > in the PATH.
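[Editor's note: the environment-variable detection Robert describes above could look roughly like the following minimal sketch. This is not the actual Flink code; the class name and the exact variables checked (HADOOP_CONF_DIR, YARN_CONF_DIR, HADOOP_HOME) and their lookup order are assumptions for illustration.]

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

// Sketch of environment-variable-based discovery of the Hadoop
// configuration directory, similar in spirit to what the Flink YARN
// client does. Variable names and order are assumptions.
public class ConfDirLocator {

    // Returns the first existing directory named by the candidate
    // environment variables, or null if none is found. Taking the
    // environment as a Map makes the lookup testable.
    public static Path findHadoopConfDir(Map<String, String> env) {
        // HADOOP_CONF_DIR / YARN_CONF_DIR point directly at the config dir.
        for (String var : new String[] {"HADOOP_CONF_DIR", "YARN_CONF_DIR"}) {
            String value = env.get(var);
            if (value != null && Files.isDirectory(Paths.get(value))) {
                return Paths.get(value);
            }
        }
        // HADOOP_HOME conventionally holds the config under etc/hadoop.
        String home = env.get("HADOOP_HOME");
        if (home != null) {
            Path conf = Paths.get(home, "etc", "hadoop");
            if (Files.isDirectory(conf)) {
                return conf;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Path dir = findHadoopConfDir(System.getenv());
        System.out.println(dir != null
                ? "Found Hadoop config at " + dir
                : "No Hadoop config directory found");
    }
}
```

The fragility Daniel points out later in the thread is visible here: if none of the variables is set, or if a needed file such as hdfs-site.xml lives outside the discovered directory, the client silently works with incomplete configuration.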
> > The "yarn jar" command also accesses the environment variables required
> > to locate the Hadoop configuration. But I will carefully check whether
> > using the "yarn jar" command brings us an advantage.
> >
> > 3. I'm also not completely convinced that this is the right approach.
> > When I was implementing the first version of Flink on YARN, I thought
> > that deploying many small files to HDFS would cause some load on the
> > NameNode and take some time. Right now, we have 146 jars in the lib/
> > directory. I haven't done a performance comparison, but I guess it's
> > slower to upload 146 files to HDFS instead of 1. (It is not only
> > uploading the files to HDFS; YARN also needs to download and "localize"
> > them prior to allocating new containers.)
> > Also, when deploying Flink on YARN in the Google Compute cloud, the
> > Google Compute storage is configured by default ... and it's quite slow.
> > So this would probably lead to a bad user experience.
> > I completely agree that we need an option for users to use a
> > pre-installed Flink sitting on HDFS or somewhere else in the cluster.
> > There is another issue in this area in our project: I don't like that
> > the "hadoop2" build of Flink produces two binary directories with almost
> > the same content and layout. We could actually merge the whole YARN
> > stuff into the regular hadoop2 build. Therefore, I would suggest putting
> > one Flink fat jar into the lib/ directory. This would also make shading
> > of our dependencies much easier. I will start a separate discussion on
> > that when I have more time again. Right now, I have more pressing issues
> > to solve.
> >
> > Regarding your changes in the "security" branch: I'm super happy that
> > others are starting to work on the YARN client as well. The whole
> > codebase has grown over time, and it's certainly good to have more eyes
> > looking at it.
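[Editor's note: the many-small-files trade-off Robert weighs above is easy to quantify locally. The sketch below counts the jars under a Flink lib/ directory and totals their size; each file would become a separate HDFS upload and YARN local resource, while a single fat jar collapses them into one. The default "lib" path is an assumption, not a fixed Flink location.]

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Counts jar files in a directory and totals their size, to illustrate
// the "146 uploads vs. 1 upload" comparison from the thread.
public class LibJarStats {

    // Returns {jarCount, totalBytes} for the given directory.
    public static long[] jarStats(Path libDir) throws IOException {
        long count = 0;
        long bytes = 0;
        try (Stream<Path> files = Files.list(libDir)) {
            List<Path> jars = files
                    .filter(p -> p.toString().endsWith(".jar"))
                    .collect(Collectors.toList());
            for (Path p : jars) {
                count++;
                bytes += Files.size(p);
            }
        }
        return new long[] {count, bytes};
    }

    public static void main(String[] args) throws IOException {
        Path libDir = Paths.get(args.length > 0 ? args[0] : "lib");
        long[] stats = jarStats(libDir);
        System.out.printf("%d jars, %.1f MB total%n",
                stats[0], stats[1] / (1024.0 * 1024.0));
    }
}
```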
> > The security features of YARN and Hadoop in general are something that
> > I've avoided in the past, because they are so difficult to test
> > properly. But it's something we certainly need to address.
> >
> > Best,
> > Robert
> >
> >
> > On Sun, Jan 18, 2015 at 6:28 PM, Daniel Warneke <warn...@apache.org>
> > wrote:
> >
> > > Hi,
> > >
> > > I just pushed my first version of Flink supporting YARN environments
> > > with security/Kerberos enabled [1]. While working with the current
> > > Flink version, I was really impressed by how easy it is to deploy the
> > > software on a YARN cluster. However, there are a few things I stumbled
> > > upon, and I would be interested in your opinion:
> > >
> > > 1. Separation between YARN session and Flink job
> > > Currently, we separate the Flink YARN session from the Flink jobs,
> > > i.e. a user first has to bring up the Flink cluster on YARN through a
> > > separate command and can then submit an arbitrary number of jobs to
> > > this cluster. Through this separation it is possible to submit
> > > individual jobs with really low latency, but it introduces two major
> > > problems: First, it is currently impossible to programmatically launch
> > > a Flink YARN cluster, submit a job, wait for its completion, and then
> > > tear the cluster down again (correct me if I'm wrong here), although
> > > this is actually a very important use case. Second, with security
> > > enabled, all jobs are executed with the security credentials of the
> > > user who launched the Flink cluster. This causes massive authorization
> > > problems. Therefore, I would propose moving to a model where we launch
> > > one Flink cluster per job (or at least making this a very prominent
> > > option).
> > >
> > > 2. Loading Hadoop configuration settings for Flink
> > > In the current release, we use custom code to identify and load the
> > > relevant Hadoop XML configuration files (e.g.
> > > core-site.xml, yarn-site.xml) for the Flink YARN client. I found this
> > > mechanism to be quite fragile, as it depends on certain environment
> > > variables being set and assumes certain configuration keys to be
> > > specified in certain files. For example, with Hadoop security enabled,
> > > the Flink YARN client needs to know what kind of authentication
> > > mechanism HDFS expects for the data transfer. This setting is usually
> > > specified in hdfs-site.xml. In the current Flink version, the YARN
> > > client ignores this file and hence cannot talk to HDFS when security
> > > is enabled.
> > > As an alternative, I propose to launch the Flink cluster on YARN
> > > through the "yarn jar" command. With this command, you get the entire
> > > configuration setup for free and no longer have to worry about the
> > > names of configuration files, configuration paths, and environment
> > > variables.
> > >
> > > 3. The uberjar deployment model
> > > In my opinion, the current Flink deployment model for YARN, with the
> > > one fat uberjar, is unnecessarily bulky. With the last release, the
> > > Flink uberjar has grown to over 100 MB in size, amounting to almost
> > > 400 MB of class files when uncompressed. Many of the includes are not
> > > even necessary. For example, when using the "yarn jar" hook to deploy
> > > Flink, all relevant Hadoop libraries are added to the classpath
> > > anyway, so there is no need to include them in the uberjar (unless you
> > > assume the client does not have a Hadoop environment installed).
> > > Personally, I would favor a more fine-granular deployment model.
> > > Especially when we move to a one-job-per-session model, I think we
> > > should allow having Flink preinstalled on the cluster nodes and not
> > > always require redistributing the 100 MB uberjar to each and every
> > > node.
> > >
> > > Any thoughts on that?
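[Editor's note: the programmatic lifecycle Daniel asks for in point 1 (launch a per-job cluster, submit, wait, tear down) can be sketched as an interface. This is a hypothetical API: none of these names exist in Flink; it only illustrates the desired shape of per-job control and why it helps with per-user credentials.]

```java
import java.nio.file.Path;

// Hypothetical sketch of programmatic per-job cluster control: launch a
// Flink cluster on YARN for one job, submit it, wait for completion, and
// tear the cluster down again. Illustrative only; not an existing Flink API.
public interface PerJobYarnCluster extends AutoCloseable {

    enum JobOutcome { SUCCEEDED, FAILED, CANCELED }

    // Allocates the ApplicationMaster and task containers on YARN.
    void start() throws Exception;

    // Submits the job jar and blocks until the job finishes. Because the
    // cluster is started per job, it can run under the submitting user's
    // Kerberos credentials, avoiding the shared-credential problem of a
    // long-running session.
    JobOutcome submitJobAndWait(Path jobJar, String... jobArgs) throws Exception;

    // Tears the YARN application down; try-with-resources friendly.
    @Override
    void close() throws Exception;
}
```

A caller could then drive the whole lifecycle in one try-with-resources block: start the cluster, submit, and have the cluster torn down automatically when the block exits.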
> > >
> > > Best regards,
> > >
> > > Daniel
> > >
> > > [1] https://github.com/warneke/flink/tree/security
> > >
> >