Hi,
I just pushed my first version of Flink supporting YARN environments
with security/Kerberos enabled [1]. While working with the current Flink
version, I was really impressed by how easy it is to deploy the software
on a YARN cluster. However, there are a few things I stumbled upon, and
I would be interested in your opinion:
1. Separation between YARN session and Flink job
Currently, we separate the Flink YARN session from the Flink jobs, i.e.
a user first has to bring up the Flink cluster on YARN through a
separate command and can then submit an arbitrary number of jobs to this
cluster. This separation makes it possible to submit individual jobs
with very low latency, but it introduces two major problems:
First, it is currently impossible to programmatically launch a Flink
YARN cluster, submit a job, wait for its completion and then tear the
cluster down again (correct me if I’m wrong here) although this is
actually a very important use case. Second, with security enabled, all
jobs are executed with the security credentials of the user who launched
the Flink cluster, which causes massive authorization problems.
Therefore, I would propose to move to a model where we launch one Flink
cluster per job (or at least to make this a very prominent option).
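For illustration, the current two-step workflow looks roughly like the
following; the flag values are just examples, and the single per-job
command at the end is purely hypothetical, it does not exist today:

```shell
# Step 1: bring up a long-running Flink session on YARN
# (everything submitted later runs with *this* user's credentials)
./bin/yarn-session.sh -n 4 -tm 1024

# Step 2: submit an arbitrary number of jobs to that session
./bin/flink run ./examples/WordCount.jar hdfs:///input hdfs:///output

# Hypothetical one-cluster-per-job alternative: launch the cluster,
# run the job, and tear the cluster down again in a single step,
# under the credentials of the user submitting the job
./bin/flink run -yarn -n 4 ./examples/WordCount.jar hdfs:///input hdfs:///output
```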
2. Loading Hadoop configuration settings for Flink
In the current release, we use custom code to identify and load the
relevant Hadoop XML configuration files (e.g. core-site.xml,
yarn-site.xml) for the Flink YARN client. I found this mechanism to be
quite fragile, as it depends on certain environment variables being set
and assumes that certain configuration keys are specified in certain
files.
For example, with Hadoop security enabled, the Flink YARN client needs
to know which authentication mechanism HDFS expects for data transfers.
This setting is usually specified in hdfs-site.xml. In the
current Flink version, the YARN client ignores this file and hence
cannot talk to HDFS when security is enabled.
As an alternative, I propose to launch the Flink cluster on YARN through
the “yarn jar” command. With this command, you get the entire
configuration setup for free and no longer have to worry about names of
configuration files, configuration paths and environment variables.
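Concretely, the launch could look something like this (the jar and class
names below are placeholders, not the actual Flink artifacts):

```shell
# Launch the Flink YARN client through Hadoop's own launcher.
# 'yarn jar' runs the given main class with the node's Hadoop
# configuration directory and Hadoop libraries already on the
# classpath, so core-site.xml, hdfs-site.xml and yarn-site.xml
# are all picked up without any custom file-lookup code.
yarn jar flink-yarn-client.jar org.apache.flink.yarn.Client -n 4
```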
3. The uberjar deployment model
In my opinion, the current Flink deployment model for YARN, with the one
fat uberjar, is unnecessarily bulky. With the last release the Flink
uberjar has grown to over 100 MB in size, amounting to almost 400 MB of
class files when uncompressed. Many of the included libraries are not
even necessary. For example, when using the “yarn jar” hook to deploy
Flink,
all relevant Hadoop libraries are added to the classpath anyway, so
there is no need to include them in the uberjar (unless you assume the
client does not have a Hadoop environment installed). Personally, I
would favor a finer-grained deployment model. In particular, once we
move to a one-job-per-session model, I think we should allow Flink to be
preinstalled on the cluster nodes instead of always redistributing the
100 MB uberjar to each and every node.
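Under the assumption that a Hadoop environment is available on the
nodes, one way to slim down the uberjar would be to mark the Hadoop
dependencies as “provided” in the Maven build, so the shade plugin
leaves them out of the fat jar and they are resolved from the node’s
Hadoop installation at runtime (the version property here is
illustrative):

```xml
<!-- Supplied by the node's Hadoop installation at runtime, -->
<!-- so the shade plugin excludes it from the uberjar. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>${hadoop.version}</version>
  <scope>provided</scope>
</dependency>
```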
Any thoughts on that?
Best regards,
Daniel
[1] https://github.com/warneke/flink/tree/security