Github user warneke commented on the pull request:

    https://github.com/apache/flink/pull/358#issuecomment-72739181

Hi,

I tried the code and found the following three problems:

__Flink launch script (bin/flink) points to the wrong log4j configuration file__

log4j:ERROR Could not read configuration file from URL [file:/home/warneke/workspace/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-yarn-0.9-SNAPSHOT/bin/../conf/log4j-cli.properties].
java.io.FileNotFoundException: /home/warneke/workspace/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-yarn-0.9-SNAPSHOT/bin/../conf/log4j-cli.properties (No such file or directory)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:146)
    at java.io.FileInputStream.<init>(FileInputStream.java:101)
    at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:90)
    at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:188)
    at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:557)
    at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526)
    at org.apache.log4j.LogManager.<clinit>(LogManager.java:127)
    at org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:66)
    at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:277)
    at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:288)
    at org.apache.flink.client.FlinkYarnSessionCli.<clinit>(FlinkYarnSessionCli.java:53)
    at org.apache.flink.client.CliFrontend.<clinit>(CliFrontend.java:81)

__Flink YARN client hangs indefinitely when user has no Kerberos ticket__

When the user launches Flink without a Kerberos ticket, the client loops indefinitely in the following function call instead of throwing an exception:

"main" prio=10 tid=0x00007febe800a000 nid=0x1770 waiting on condition [0x00007febedf82000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:151)
    at com.sun.proxy.$Proxy12.getNewApplication(Unknown Source)
    at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getNewApplication(YarnClientImpl.java:191)
    at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.createApplication(YarnClientImpl.java:199)
    at org.apache.flink.yarn.FlinkYarnClient.deployInternal(FlinkYarnClient.java:303)
    at org.apache.flink.yarn.FlinkYarnClient$1.run(FlinkYarnClient.java:283)
    at org.apache.flink.yarn.FlinkYarnClient$1.run(FlinkYarnClient.java:280)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.flink.yarn.FlinkYarnClient.deploy(FlinkYarnClient.java:280)
    at org.apache.flink.client.CliFrontend.getClient(CliFrontend.java:921)
    at org.apache.flink.client.CliFrontend.run(CliFrontend.java:333)
    at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1067)
    at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1091)

Interestingly, the code passes the ugi.doAs call even without a valid ticket. In my environment (CDH5.2.0), UserGroupInformation.getCurrentUser() produces the following output inside the doAs run function:

With valid ticket: warneke@WARNEKE.LOCAL (auth:KERBEROS)
Without valid ticket: warneke (auth:KERBEROS)

__Problem with hard-coded default queue name__

Even with a valid Kerberos ticket, the YARN deployment fails with the following error message on CDH5.2.0:

java.lang.RuntimeException: Error deploying the YARN cluster
    at org.apache.flink.client.CliFrontend.getClient(CliFrontend.java:923)
    at org.apache.flink.client.CliFrontend.run(CliFrontend.java:333)
    at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1066)
    at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1090)
Caused by: org.apache.flink.yarn.FlinkYarnClient$YarnDeploymentException: The specified queue 'default' does not exist. Available queues: root.default,
    at org.apache.flink.yarn.FlinkYarnClient.deployInternal(FlinkYarnClient.java:325)
    at org.apache.flink.yarn.FlinkYarnClient$1.run(FlinkYarnClient.java:286)
    at org.apache.flink.yarn.FlinkYarnClient$1.run(FlinkYarnClient.java:280)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.flink.yarn.FlinkYarnClient.deploy(FlinkYarnClient.java:280)
    at org.apache.flink.client.CliFrontend.getClient(CliFrontend.java:921)
    ... 3 more

This was never an issue with the previous YARN deployment mechanism. Can't we simply leave the YARN queue unspecified unless the user explicitly specifies it?
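
To make the queue suggestion concrete, here is a minimal, self-contained sketch of the idea. It is not the actual Flink code: `QueueSelection` and `resolveQueue` are hypothetical names used only for illustration, and the sketch assumes the queue option reaches the client as a possibly null or blank string. The point is only that no queue name is passed on to YARN unless the user supplied one.

```java
import java.util.Optional;

// Hypothetical illustration, not part of Flink: models "only set a queue
// when the user explicitly asked for one" without any Hadoop dependency.
public class QueueSelection {

    // Returns the user-supplied queue name, or empty if none was given.
    // A null or blank value is treated as "not specified".
    static Optional<String> resolveQueue(String userQueue) {
        if (userQueue == null || userQueue.trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(userQueue.trim());
    }

    public static void main(String[] args) {
        // Assumption: the first argument stands in for the queue option.
        String requested = args.length > 0 ? args[0] : null;
        Optional<String> queue = resolveQueue(requested);

        if (queue.isPresent()) {
            // In the real client, this is where the queue would be set
            // on the submission context.
            System.out.println("Submitting to queue: " + queue.get());
        } else {
            // Nothing is set, so the cluster's own default applies.
            System.out.println("No queue specified; leaving the choice to YARN");
        }
    }
}
```

With this shape, a cluster whose only queue is `root.default` would not be handed a hard-coded `default` name it does not know.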