[ https://issues.apache.org/jira/browse/FLINK-592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14304042#comment-14304042 ]
ASF GitHub Bot commented on FLINK-592:
--------------------------------------

Github user warneke commented on the pull request:

https://github.com/apache/flink/pull/358#issuecomment-72739181

Hi,

I tried the code and found the following three problems:

__Flink launch script (bin/flink) points to the wrong log4j configuration file__

    log4j:ERROR Could not read configuration file from URL [file:/home/warneke/workspace/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-yarn-0.9-SNAPSHOT/bin/../conf/log4j-cli.properties].
    java.io.FileNotFoundException: /home/warneke/workspace/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-yarn-0.9-SNAPSHOT/bin/../conf/log4j-cli.properties (No such file or directory)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:146)
        at java.io.FileInputStream.<init>(FileInputStream.java:101)
        at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:90)
        at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:188)
        at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:557)
        at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526)
        at org.apache.log4j.LogManager.<clinit>(LogManager.java:127)
        at org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:66)
        at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:277)
        at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:288)
        at org.apache.flink.client.FlinkYarnSessionCli.<clinit>(FlinkYarnSessionCli.java:53)
        at org.apache.flink.client.CliFrontend.<clinit>(CliFrontend.java:81)

__Flink YARN client hangs indefinitely when user has no Kerberos ticket__

When the user launches Flink without a Kerberos ticket, the client loops indefinitely in the following function call instead of throwing an exception:

    "main" prio=10 tid=0x00007febe800a000 nid=0x1770 waiting on condition [0x00007febedf82000]
       java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:151)
        at com.sun.proxy.$Proxy12.getNewApplication(Unknown Source)
        at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getNewApplication(YarnClientImpl.java:191)
        at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.createApplication(YarnClientImpl.java:199)
        at org.apache.flink.yarn.FlinkYarnClient.deployInternal(FlinkYarnClient.java:303)
        at org.apache.flink.yarn.FlinkYarnClient$1.run(FlinkYarnClient.java:283)
        at org.apache.flink.yarn.FlinkYarnClient$1.run(FlinkYarnClient.java:280)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
        at org.apache.flink.yarn.FlinkYarnClient.deploy(FlinkYarnClient.java:280)
        at org.apache.flink.client.CliFrontend.getClient(CliFrontend.java:921)
        at org.apache.flink.client.CliFrontend.run(CliFrontend.java:333)
        at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1067)
        at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1091)

Interestingly, the code passes the ugi.doAs call even without a valid ticket.
In my environment (CDH5.2.0), UserGroupInformation.getCurrentUser() produces the following output inside the doAs run function:

With valid ticket: warneke@WARNEKE.LOCAL (auth:KERBEROS)
Without valid ticket: warneke (auth:KERBEROS)

__Problem with hard-coded default queue name__

Even with a valid Kerberos ticket, the YARN deployment fails with the following error message on CDH5.2.0:

    java.lang.RuntimeException: Error deploying the YARN cluster
        at org.apache.flink.client.CliFrontend.getClient(CliFrontend.java:923)
        at org.apache.flink.client.CliFrontend.run(CliFrontend.java:333)
        at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1066)
        at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1090)
    Caused by: org.apache.flink.yarn.FlinkYarnClient$YarnDeploymentException: The specified queue 'default' does not exist. Available queues: root.default,
        at org.apache.flink.yarn.FlinkYarnClient.deployInternal(FlinkYarnClient.java:325)
        at org.apache.flink.yarn.FlinkYarnClient$1.run(FlinkYarnClient.java:286)
        at org.apache.flink.yarn.FlinkYarnClient$1.run(FlinkYarnClient.java:280)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
        at org.apache.flink.yarn.FlinkYarnClient.deploy(FlinkYarnClient.java:280)
        at org.apache.flink.client.CliFrontend.getClient(CliFrontend.java:921)
        ... 3 more

This was never an issue with the previous YARN deployment mechanism. Can't we simply leave the YARN queue unspecified unless the user explicitly specifies it?
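
To make the queue suggestion concrete, here is a minimal, self-contained sketch of the behaviour I have in mind. It is not the actual FlinkYarnClient code: the class `QueueSelection` and the method `chooseQueue` are hypothetical names, and the list of available queues is assumed to have been fetched from YARN beforehand. The idea is that the queue is only validated (and later set) when the user passed one explicitly; otherwise it is left unset so YARN applies its own default.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, for illustration only; not part of Flink or Hadoop.
class QueueSelection {

    /**
     * Returns the queue to request from YARN, or an empty Optional if the
     * user did not specify one (the caller then leaves the queue unset).
     *
     * @param requested queue name given by the user, may be null or blank
     * @param available queue names reported by the cluster (assumed input)
     */
    public static Optional<String> chooseQueue(String requested, List<String> available) {
        // No explicit queue: do not fall back to a hard-coded "default".
        if (requested == null || requested.trim().isEmpty()) {
            return Optional.empty();
        }
        String name = requested.trim();
        // Only an explicitly requested queue is checked against the cluster.
        if (!available.contains(name)) {
            throw new IllegalArgumentException(
                "The specified queue '" + name + "' does not exist. Available queues: "
                    + String.join(", ", available));
        }
        return Optional.of(name);
    }

    public static void main(String[] args) {
        // Queue list as reported in the error message above.
        List<String> queues = Arrays.asList("root.default");
        System.out.println(chooseQueue(null, queues).isPresent());
        System.out.println(chooseQueue("root.default", queues).get());
    }
}
```

With something like this, a user who passes no queue option would never hit the "queue 'default' does not exist" check, while a mistyped explicit queue would still fail early with the list of available queues.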

> Add support for secure YARN clusters with Kerberos Auth
> -------------------------------------------------------
>
>                 Key: FLINK-592
>                 URL: https://issues.apache.org/jira/browse/FLINK-592
>             Project: Flink
>          Issue Type: Improvement
>          Components: YARN Client
>            Reporter: GitHub Import
>            Assignee: Daniel Warneke
>            Priority: Minor
>              Labels: github-import
>             Fix For: pre-apache
>
>
> The current YARN client will throw an exception (as of
> https://github.com/stratosphere/stratosphere/pull/591) if it detects a secure
> environment.
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/stratosphere/stratosphere/issues/592
> Created by: [rmetzger|https://github.com/rmetzger]
> Labels: enhancement, YARN,
> Created at: Sun Mar 16 11:05:07 CET 2014
> State: open

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)