Hi Robert,

I tried adding Daniel's changes to the 0.9 version of Flink. So far I haven't been able to get it working; I'm still getting the same errors.

Best,
Ankit
On Tuesday, January 27, 2015 2:57 AM, Robert Metzger <rmetz...@apache.org> wrote:

The code from Daniel has been written for the old YARN client. I think the most important change is this one: https://github.com/warneke/flink/commit/9843a14637594fb7ee265f5326af9007f2a3191c and it can be backported easily to the new YARN client.

On Tue, Jan 27, 2015 at 7:00 AM, Stephan Ewen <se...@apache.org> wrote:

Hi Ankit!

Kerberos support is not yet in the system, but one of the Flink committers (Daniel Warneke) has made a prototype here: https://github.com/warneke/flink/tree/security

@Daniel Can you give us an update on the status? What do you think is missing before a first version is ready to be merged into the master?

Greetings,
Stephan

On Sun, Jan 18, 2015 at 10:00 AM, Robert Metzger <rmetz...@apache.org> wrote:

> Hi Daniel,
>
> let me answer your questions:
>
> 1. Basically all features you are requesting are implemented in this pull request: https://github.com/apache/flink/pull/292 (per-job YARN cluster & programmatic control of the cluster). Feel free to review the pull request. It has been pending for more than a week now and hasn't gotten much feedback. Also, I would recommend basing the work on security support on that branch.
>
> 2. I agree that the whole configuration loading process is not nicely implemented. When I was working on this, I didn't understand all the features offered by Hadoop's Configuration object. I implemented it in that complicated way to make it as easy as possible for users to use Flink on YARN. As you can see in the code, it tries several commonly used environment variables to detect the location of the configuration files. These config files are then used and respected by the YARN client (for example the default file system name).
> I'll have a look at the "yarn jar" command. One concern I have is that this adds a requirement: we expect the user to have the "yarn" binary in the PATH. I know quite a few environments (for example some users in the Hortonworks Sandbox) which don't have "hadoop" and "yarn" in the PATH. The "yarn jar" command itself also accesses the environment variables required to locate the Hadoop configuration. But I will carefully check whether using the "yarn jar" command brings us an advantage.
>
> 3. I'm also not completely convinced that this is the right approach. When I was implementing the first version of Flink on YARN, I thought that deploying many small files to HDFS would put some load on the NameNode and take some time. Right now, we have 146 jars in the lib/ directory. I haven't done a performance comparison, but I guess it's slower to upload 146 files to HDFS than 1. (It is not only uploading the files to HDFS; YARN also needs to download and "localize" them prior to allocating new containers.)
> Also, when deploying Flink on YARN in the Google Compute cloud, the Google storage is configured by default ... and it's quite slow. So this would probably lead to a bad user experience.
> I completely agree that we need an option for users to use a pre-installed Flink sitting on HDFS or somewhere else in the cluster.
> There is another issue in this area in our project: I don't like that the "hadoop2" build of Flink produces two binary directories with almost the same content and layout. We could actually merge the whole YARN stuff into the regular hadoop2 build. Therefore, I would suggest putting one Flink fat jar into the lib/ directory. This would also make shading of our dependencies much easier. I will start a separate discussion on that when I have more time again. Right now, I have more pressing issues to solve.
>
> Regarding your changes in the "security" branch: I'm super happy that others are starting to work on the YARN client as well. The whole codebase has grown over time and it's certainly good to have more eyes looking at it. The security features of YARN and Hadoop in general are something that I've avoided in the past, because it's so difficult to properly test. But it's something we certainly need to address.
>
> Best,
> Robert
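To make the upload-and-localization cost from Robert's point 3 concrete, here is a minimal, illustrative sketch (not Flink's actual staging code; the class name, paths and staging layout are made up) of what shipping a single jar to a YARN application involves with the plain Hadoop/YARN APIs: copy the file to HDFS, then register a LocalResource entry that YARN downloads ("localizes") on every container host before launch. Doing this once per file is why 146 jars cost noticeably more than one fat jar.

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.yarn.api.records.LocalResource;
    import org.apache.hadoop.yarn.api.records.LocalResourceType;
    import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
    import org.apache.hadoop.yarn.util.ConverterUtils;

    // Hypothetical helper, for illustration only.
    public class JarStagingSketch {

        // Copies one local jar into an HDFS staging directory and returns the
        // LocalResource entry that would later go into the ContainerLaunchContext.
        static Map<String, LocalResource> stageJar(Configuration conf, Path localJar, Path stagingDir)
                throws IOException {
            FileSystem fs = FileSystem.get(conf);

            // One upload (and one NameNode entry) per file that is shipped.
            Path remoteJar = new Path(stagingDir, localJar.getName());
            fs.copyFromLocalFile(false, true, localJar, remoteJar);

            // YARN needs size and modification time to verify the resource during localization.
            FileStatus status = fs.getFileStatus(remoteJar);
            LocalResource resource = LocalResource.newInstance(
                    ConverterUtils.getYarnUrlFromPath(remoteJar),
                    LocalResourceType.FILE,
                    LocalResourceVisibility.APPLICATION,
                    status.getLen(),
                    status.getModificationTime());

            Map<String, LocalResource> resources = new HashMap<String, LocalResource>();
            resources.put(localJar.getName(), resource);
            return resources;
        }
    }

A pre-installed Flink, as both mails suggest, would presumably skip the copy step and only register resources that already sit in HDFS, so most of the per-file cost disappears.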
> On Sun, Jan 18, 2015 at 6:28 PM, Daniel Warneke <warn...@apache.org> wrote:
>
> > Hi,
> >
> > I just pushed my first version of Flink supporting YARN environments with security/Kerberos enabled [1]. While working with the current Flink version, I was really impressed by how easy it is to deploy the software on a YARN cluster. However, there are a few things I stumbled upon and I would be interested in your opinion:
> >
> > 1. Separation between YARN session and Flink job
> > Currently, we separate the Flink YARN session from the Flink jobs, i.e. a user first has to bring up the Flink cluster on YARN through a separate command and can then submit an arbitrary number of jobs to this cluster. Through this separation it is possible to submit individual jobs with a really low latency, but it introduces two major problems: First, it is currently impossible to programmatically launch a Flink YARN cluster, submit a job, wait for its completion and then tear the cluster down again (correct me if I’m wrong here), although this is actually a very important use case. Second, with security enabled, all jobs are executed with the security credentials of the user who launched the Flink cluster. This causes massive authorization problems. Therefore, I would propose to move to a model where we launch one Flink cluster per job (or at least make this a very prominent option).
> >
> > 2. Loading Hadoop configuration settings for Flink
> > In the current release, we use custom code to identify and load the relevant Hadoop XML configuration files (e.g. core-site.xml, yarn-site.xml) for the Flink YARN client. I found this mechanism to be quite fragile, as it depends on certain environment variables being set and assumes certain configuration keys to be specified in certain files. For example, with Hadoop security enabled, the Flink YARN client needs to know what kind of authentication mechanism HDFS expects for the data transfer. This setting is usually specified in hdfs-site.xml. In the current Flink version, the YARN client ignores this file and hence cannot talk to HDFS when security is enabled.
> > As an alternative, I propose to launch the Flink cluster on YARN through the “yarn jar” command. With this command, you get the entire configuration setup for free and no longer have to worry about the names of configuration files, configuration paths and environment variables.
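As an illustration of the fragility Daniel describes in point 2, here is a minimal sketch (not code from either branch; the fallback order and class name are assumptions): a hand-rolled client has to locate the Hadoop configuration directory itself and add each *-site.xml file explicitly, and forgetting hdfs-site.xml is exactly what hides the security-related HDFS settings from the client. A process started via "yarn jar", by contrast, gets the configuration directory on its classpath, so Hadoop's Configuration machinery finds these files without any environment-variable guessing.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;

    // Hypothetical example, for illustration only.
    public class ConfigLoadingSketch {

        public static Configuration loadHadoopConfig() {
            Configuration conf = new Configuration();

            // A standalone client has to guess where the configuration lives ...
            String confDir = System.getenv("HADOOP_CONF_DIR");
            if (confDir == null) {
                confDir = System.getenv("YARN_CONF_DIR");
            }
            if (confDir != null) {
                conf.addResource(new Path(confDir, "core-site.xml"));
                conf.addResource(new Path(confDir, "yarn-site.xml"));
                // Easy to forget, but this is where secure HDFS data transfer settings live.
                conf.addResource(new Path(confDir, "hdfs-site.xml"));
            }

            // ... in order to pick up values such as these:
            System.out.println("fs.defaultFS = " + conf.get("fs.defaultFS"));
            System.out.println("hadoop.security.authentication = "
                    + conf.get("hadoop.security.authentication", "simple"));
            return conf;
        }

        public static void main(String[] args) {
            loadHadoopConfig();
        }
    }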
> > 3. The uberjar deployment model
> > In my opinion, the current Flink deployment model for YARN, with the one fat uberjar, is unnecessarily bulky. With the last release, the Flink uberjar has grown to over 100 MB in size, amounting to almost 400 MB of class files when uncompressed. Many of the includes are not even necessary. For example, when using the “yarn jar” hook to deploy Flink, all relevant Hadoop libraries are added to the classpath anyway, so there is no need to include them in the uberjar (unless you assume the client does not have a Hadoop environment installed). Personally, I would favor a more fine-granular deployment model. Especially when we move to a one-job-per-session model, I think we should allow having Flink preinstalled on the cluster nodes and not always require redistributing the 100 MB uberjar to each and every node.
> >
> > Any thoughts on that?
> >
> > Best regards,
> >
> > Daniel
> >
> > [1] https://github.com/warneke/flink/tree/security
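To make the credential problem from point 1 concrete, here is a hedged sketch of the standard Hadoop UserGroupInformation pattern a per-job model could use so that each job acts as its submitting user rather than as whoever started a shared session. This is not code from the security branch; the principal, keytab path and class name are placeholders.

    import java.security.PrivilegedExceptionAction;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.security.UserGroupInformation;

    // Hypothetical example, for illustration only.
    public class PerUserSubmissionSketch {

        public static void main(String[] args) throws Exception {
            final Configuration conf = new Configuration();
            // On a secure cluster, hadoop.security.authentication is "kerberos";
            // UGI must see the configuration before any login call.
            UserGroupInformation.setConfiguration(conf);

            // Log in as the submitting user (placeholder principal and keytab).
            UserGroupInformation ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
                    "alice@EXAMPLE.COM", "/path/to/alice.keytab");

            // Everything executed inside doAs() carries Alice's credentials, so HDFS
            // and YARN authorize her rather than the user who launched a shared cluster.
            ugi.doAs(new PrivilegedExceptionAction<Void>() {
                @Override
                public Void run() throws Exception {
                    FileSystem fs = FileSystem.get(conf);
                    System.out.println("HDFS home directory: " + fs.getHomeDirectory());
                    return null;
                }
            });
        }
    }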