Hi Robert,

I tried adding Daniel's changes to the 0.9 version of Flink. So far I haven't been able to get it working; I'm still getting the same errors.

Best,
Ankit
On Tuesday, January 27, 2015 2:57 AM, Robert Metzger <rmetz...@apache.org> wrote:

The code from Daniel has been written for the old YARN client. I think the most important change is this one: https://github.com/warneke/flink/commit/9843a14637594fb7ee265f5326af9007f2a3191c and it can be backported easily to the new YARN client.

On Tue, Jan 27, 2015 at 7:00 AM, Stephan Ewen <se...@apache.org> wrote:

Hi Ankit!

Kerberos support is not yet in the system, but one of the Flink committers (Daniel Warneke) has made a prototype here: https://github.com/warneke/flink/tree/security

@Daniel Can you give us an update on the status? What do you think is missing before a first version is ready to be merged into the master?

Greetings,
Stephan

On Sun, Jan 18, 2015 at 10:00 AM, Robert Metzger <rmetz...@apache.org> wrote:

> Hi Daniel,
>
> let me answer your questions:
>
> 1. Basically all features you are requesting are implemented in this pull request: https://github.com/apache/flink/pull/292 (per-job YARN cluster & programmatic control of the cluster). Feel free to review the pull request. It has been pending for more than a week now and hasn't gotten much feedback. Also, I would recommend basing the work on security support on that branch.
>
> 2. I agree that the whole configuration loading process is not nicely implemented. When I was working on this, I didn't understand all the features offered by Hadoop's Configuration object. I implemented it in that complicated way to make it as easy as possible for users to use Flink on YARN. As you can see in the code, it tries several commonly used environment variables to detect the location of the configuration files. These config files are then used and respected by the YARN client (for example the default file system name).
> I'll have a look at the "yarn jar" command. One concern I have is that this adds a requirement: we expect the user to have the "yarn" binary in the PATH. I know quite a few environments (for example some users in the Hortonworks Sandbox) which don't have "hadoop" and "yarn" in the PATH. The "yarn jar" command itself also accesses the environment variables required to locate the Hadoop configuration. But I will carefully check whether using the "yarn jar" command brings us an advantage.
>
> 3. I'm also not completely convinced that this is the right approach. When I was implementing the first version of Flink on YARN, I thought that deploying many small files to HDFS would put some load on the NameNode and take some time. Right now, we have 146 jars in the lib/ directory. I haven't done a performance comparison, but I guess it's slower to upload 146 files to HDFS than 1. (It is not only uploading the files to HDFS; YARN also needs to download and "localize" them prior to allocating new containers.)
> Also, when deploying Flink on YARN in the Google Compute cloud, the Google storage is configured by default ... and it's quite slow. So this would probably lead to a bad user experience.
> I completely agree that we need an option for users to use a pre-installed Flink sitting on HDFS or somewhere else in the cluster.
> There is another issue in this area in our project: I don't like that the "hadoop2" build of Flink produces two binary directories with almost the same content and layout. We could actually merge the whole YARN stuff into the regular hadoop2 build. Therefore, I would suggest putting one Flink fat jar into the lib/ directory. This would also make shading of our dependencies much easier. I will start a separate discussion on that when I have more time again. Right now, I have more pressing issues to solve.
>
> Regarding your changes in the "security" branch: I'm super happy that others are starting to work on the YARN client as well. The whole codebase has grown over time and it's certainly good to have more eyes looking at it. The security features of YARN and Hadoop in general are something that I've avoided in the past, because it's so difficult to properly test. But it's something we certainly need to address.
>
> Best,
> Robert
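To make the upload-and-localization cost from Robert's point 3 concrete, here is a minimal, illustrative sketch (not Flink's actual staging code; the class name, paths and staging layout are made up) of what shipping a single jar to a YARN application involves with the plain Hadoop/YARN APIs: copy the file to HDFS, then register a LocalResource entry that YARN downloads ("localizes") on every container host before launch. Doing this once per file is why 146 jars cost noticeably more than one fat jar.

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.yarn.api.records.LocalResource;
    import org.apache.hadoop.yarn.api.records.LocalResourceType;
    import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
    import org.apache.hadoop.yarn.util.ConverterUtils;

    // Hypothetical helper, for illustration only.
    public class JarStagingSketch {

        // Copies one local jar into an HDFS staging directory and returns the
        // LocalResource entry that would later go into the ContainerLaunchContext.
        static Map<String, LocalResource> stageJar(Configuration conf, Path localJar, Path stagingDir)
                throws IOException {
            FileSystem fs = FileSystem.get(conf);

            // One upload (and one NameNode entry) per file that is shipped.
            Path remoteJar = new Path(stagingDir, localJar.getName());
            fs.copyFromLocalFile(false, true, localJar, remoteJar);

            // YARN needs size and modification time to verify the resource during localization.
            FileStatus status = fs.getFileStatus(remoteJar);
            LocalResource resource = LocalResource.newInstance(
                    ConverterUtils.getYarnUrlFromPath(remoteJar),
                    LocalResourceType.FILE,
                    LocalResourceVisibility.APPLICATION,
                    status.getLen(),
                    status.getModificationTime());

            Map<String, LocalResource> resources = new HashMap<String, LocalResource>();
            resources.put(localJar.getName(), resource);
            return resources;
        }
    }

A pre-installed Flink, as both mails suggest, would presumably skip the copy step and only register resources that already sit in HDFS, so most of the per-file cost disappears.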
> On Sun, Jan 18, 2015 at 6:28 PM, Daniel Warneke <warn...@apache.org> wrote:
>
> > Hi,
> >
> > I just pushed my first version of Flink supporting YARN environments with security/Kerberos enabled [1]. While working with the current Flink version, I was really impressed by how easy it is to deploy the software on a YARN cluster. However, there are a few things I stumbled upon and I would be interested in your opinion:
> >
> > 1. Separation between YARN session and Flink job
> > Currently, we separate the Flink YARN session from the Flink jobs, i.e. a user first has to bring up the Flink cluster on YARN through a separate command and can then submit an arbitrary number of jobs to this cluster. Through this separation it is possible to submit individual jobs with a really low latency, but it introduces two major problems: First, it is currently impossible to programmatically launch a Flink YARN cluster, submit a job, wait for its completion and then tear the cluster down again (correct me if I’m wrong here), although this is actually a very important use case. Second, with security enabled, all jobs are executed with the security credentials of the user who launched the Flink cluster. This causes massive authorization problems. Therefore, I would propose to move to a model where we launch one Flink cluster per job (or at least make this a very prominent option).
> >
> > 2. Loading Hadoop configuration settings for Flink
> > In the current release, we use custom code to identify and load the relevant Hadoop XML configuration files (e.g. core-site.xml, yarn-site.xml) for the Flink YARN client. I found this mechanism to be quite fragile, as it depends on certain environment variables being set and assumes certain configuration keys to be specified in certain files. For example, with Hadoop security enabled, the Flink YARN client needs to know what kind of authentication mechanism HDFS expects for the data transfer. This setting is usually specified in hdfs-site.xml. In the current Flink version, the YARN client ignores this file and hence cannot talk to HDFS when security is enabled.
> > As an alternative, I propose to launch the Flink cluster on YARN through the “yarn jar” command. With this command, you get the entire configuration setup for free and no longer have to worry about the names of configuration files, configuration paths and environment variables.
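As an illustration of the fragility Daniel describes in point 2, here is a minimal sketch (not code from either branch; the fallback order and class name are assumptions): a hand-rolled client has to locate the Hadoop configuration directory itself and add each *-site.xml file explicitly, and forgetting hdfs-site.xml is exactly what hides the security-related HDFS settings from the client. A process started via "yarn jar", by contrast, gets the configuration directory on its classpath, so Hadoop's Configuration machinery finds these files without any environment-variable guessing.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;

    // Hypothetical example, for illustration only.
    public class ConfigLoadingSketch {

        public static Configuration loadHadoopConfig() {
            Configuration conf = new Configuration();

            // A standalone client has to guess where the configuration lives ...
            String confDir = System.getenv("HADOOP_CONF_DIR");
            if (confDir == null) {
                confDir = System.getenv("YARN_CONF_DIR");
            }
            if (confDir != null) {
                conf.addResource(new Path(confDir, "core-site.xml"));
                conf.addResource(new Path(confDir, "yarn-site.xml"));
                // Easy to forget, but this is where secure HDFS data transfer settings live.
                conf.addResource(new Path(confDir, "hdfs-site.xml"));
            }

            // ... in order to pick up values such as these:
            System.out.println("fs.defaultFS = " + conf.get("fs.defaultFS"));
            System.out.println("hadoop.security.authentication = "
                    + conf.get("hadoop.security.authentication", "simple"));
            return conf;
        }

        public static void main(String[] args) {
            loadHadoopConfig();
        }
    }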
> > 3. The uberjar deployment model
> > In my opinion, the current Flink deployment model for YARN, with the one fat uberjar, is unnecessarily bulky. With the last release, the Flink uberjar has grown to over 100 MB in size, amounting to almost 400 MB of class files when uncompressed. Many of the includes are not even necessary. For example, when using the “yarn jar” hook to deploy Flink, all relevant Hadoop libraries are added to the classpath anyway, so there is no need to include them in the uberjar (unless you assume the client does not have a Hadoop environment installed). Personally, I would favor a more fine-granular deployment model. Especially when we move to a one-job-per-session model, I think we should allow having Flink preinstalled on the cluster nodes and not always require redistributing the 100 MB uberjar to each and every node.
> >
> > Any thoughts on that?
> >
> > Best regards,
> >
> > Daniel
> >
> > [1] https://github.com/warneke/flink/tree/security
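To make the credential problem from point 1 concrete, here is a hedged sketch of the standard Hadoop UserGroupInformation pattern a per-job model could use so that each job acts as its submitting user rather than as whoever started a shared session. This is not code from the security branch; the principal, keytab path and class name are placeholders.

    import java.security.PrivilegedExceptionAction;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.security.UserGroupInformation;

    // Hypothetical example, for illustration only.
    public class PerUserSubmissionSketch {

        public static void main(String[] args) throws Exception {
            final Configuration conf = new Configuration();
            // On a secure cluster, hadoop.security.authentication is "kerberos";
            // UGI must see the configuration before any login call.
            UserGroupInformation.setConfiguration(conf);

            // Log in as the submitting user (placeholder principal and keytab).
            UserGroupInformation ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
                    "alice@EXAMPLE.COM", "/path/to/alice.keytab");

            // Everything executed inside doAs() carries Alice's credentials, so HDFS
            // and YARN authorize her rather than the user who launched a shared cluster.
            ugi.doAs(new PrivilegedExceptionAction<Void>() {
                @Override
                public Void run() throws Exception {
                    FileSystem fs = FileSystem.get(conf);
                    System.out.println("HDFS home directory: " + fs.getHomeDirectory());
                    return null;
                }
            });
        }
    }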