Github user warneke commented on the pull request:

    https://github.com/apache/flink/pull/358#issuecomment-72739181
  
    Hi,
    
    I tried the code and found the following three problems:
    
    __Flink launch script (bin/flink) points to the wrong log4j configuration file__
    
    log4j:ERROR Could not read configuration file from URL [file:/home/warneke/workspace/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-yarn-0.9-SNAPSHOT/bin/../conf/log4j-cli.properties].
    java.io.FileNotFoundException: /home/warneke/workspace/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-yarn-0.9-SNAPSHOT/bin/../conf/log4j-cli.properties (No such file or directory)
            at java.io.FileInputStream.open(Native Method)
            at java.io.FileInputStream.<init>(FileInputStream.java:146)
            at java.io.FileInputStream.<init>(FileInputStream.java:101)
            at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:90)
            at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:188)
            at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:557)
            at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526)
            at org.apache.log4j.LogManager.<clinit>(LogManager.java:127)
            at org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:66)
            at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:277)
            at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:288)
            at org.apache.flink.client.FlinkYarnSessionCli.<clinit>(FlinkYarnSessionCli.java:53)
            at org.apache.flink.client.CliFrontend.<clinit>(CliFrontend.java:81)
    
    __Flink YARN client hangs indefinitely when user has no Kerberos ticket__
    
    When the user launches Flink without a Kerberos ticket, the client loops indefinitely in the following function call instead of throwing an exception:
    
    "main" prio=10 tid=0x00007febe800a000 nid=0x1770 waiting on condition [0x00007febedf82000]
       java.lang.Thread.State: TIMED_WAITING (sleeping)
            at java.lang.Thread.sleep(Native Method)
            at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:151)
            at com.sun.proxy.$Proxy12.getNewApplication(Unknown Source)
            at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getNewApplication(YarnClientImpl.java:191)
            at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.createApplication(YarnClientImpl.java:199)
            at org.apache.flink.yarn.FlinkYarnClient.deployInternal(FlinkYarnClient.java:303)
            at org.apache.flink.yarn.FlinkYarnClient$1.run(FlinkYarnClient.java:283)
            at org.apache.flink.yarn.FlinkYarnClient$1.run(FlinkYarnClient.java:280)
            at java.security.AccessController.doPrivileged(Native Method)
            at javax.security.auth.Subject.doAs(Subject.java:415)
            at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
            at org.apache.flink.yarn.FlinkYarnClient.deploy(FlinkYarnClient.java:280)
            at org.apache.flink.client.CliFrontend.getClient(CliFrontend.java:921)
            at org.apache.flink.client.CliFrontend.run(CliFrontend.java:333)
            at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1067)
            at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1091)
    
    Interestingly, the code gets past the ugi.doAs call even without a valid ticket. In my environment (CDH5.2.0), UserGroupInformation.getCurrentUser() produces the following output inside the doAs run function:
    
    With valid ticket: warneke@WARNEKE.LOCAL (auth:KERBEROS)
    Without valid ticket: warneke (auth:KERBEROS)
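
    One way to turn the hang into an error is to bound the blocking call with a timeout. The following is only a minimal sketch in plain Java, not the Flink or Hadoop implementation: the class BoundedDeploy and the method callWithTimeout are hypothetical names, and the Callable stands in for the deployment step that blocks in getNewApplication. Whether the Kerberos security context set up by doAs carries over to the worker thread is not addressed here and would need to be verified.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of Flink or Hadoop.
public class BoundedDeploy {

    /** Runs a blocking call and throws if it does not finish within the timeout. */
    static <T> T callWithTimeout(Callable<T> call, long timeout, TimeUnit unit)
            throws Exception {
        // Daemon thread so a stuck call cannot keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-deploy");
            t.setDaemon(true);
            return t;
        });
        Future<T> future = executor.submit(call);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            // Interrupt the worker (the retry loop in the dump sleeps, so it is interruptible).
            future.cancel(true);
            throw new IllegalStateException(
                "Call did not complete within " + timeout + " " + unit
                    + "; check that a valid Kerberos ticket is available", e);
        } catch (ExecutionException e) {
            // Surface the original failure instead of the wrapper where possible.
            Throwable cause = e.getCause();
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

    With something like this, the client would report a failure after the chosen timeout rather than sleeping in the retry handler forever. Checking the credentials up front would be cleaner, but the output above suggests that getCurrentUser() alone does not reveal the missing ticket.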
    
    __Problem with hard-coded default queue name__
    
    Even with a valid Kerberos ticket, the YARN deployment fails with the following error message on CDH5.2.0:
    
    java.lang.RuntimeException: Error deploying the YARN cluster
            at org.apache.flink.client.CliFrontend.getClient(CliFrontend.java:923)
            at org.apache.flink.client.CliFrontend.run(CliFrontend.java:333)
            at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1066)
            at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1090)
    Caused by: org.apache.flink.yarn.FlinkYarnClient$YarnDeploymentException: The specified queue 'default' does not exist. Available queues: root.default, 
            at org.apache.flink.yarn.FlinkYarnClient.deployInternal(FlinkYarnClient.java:325)
            at org.apache.flink.yarn.FlinkYarnClient$1.run(FlinkYarnClient.java:286)
            at org.apache.flink.yarn.FlinkYarnClient$1.run(FlinkYarnClient.java:280)
            at java.security.AccessController.doPrivileged(Native Method)
            at javax.security.auth.Subject.doAs(Subject.java:415)
            at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
            at org.apache.flink.yarn.FlinkYarnClient.deploy(FlinkYarnClient.java:280)
            at org.apache.flink.client.CliFrontend.getClient(CliFrontend.java:921)
            ... 3 more
    
    This was never an issue with the previous YARN deployment mechanism. Can't we simply leave the YARN queue unspecified unless the user explicitly specifies it?
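
    To illustrate what I mean, here is a rough sketch in plain Java. It is not the actual FlinkYarnClient code: QueueSelection, SubmissionContext, and buildContext are hypothetical names, with SubmissionContext standing in for YARN's ApplicationSubmissionContext. The queue is set only when the user supplied one, so no hard-coded 'default' is ever checked against the list of available queues.

```java
import java.util.Optional;

// Hypothetical illustration, not the actual FlinkYarnClient code.
public class QueueSelection {

    /** Stand-in for the YARN submission context; only the queue matters here. */
    static final class SubmissionContext {
        private String queue; // null means the queue was left unspecified

        void setQueue(String queue) {
            this.queue = queue;
        }

        String getQueue() {
            return queue;
        }
    }

    /** Sets the queue only if the user explicitly passed a non-blank name. */
    static SubmissionContext buildContext(Optional<String> userQueue) {
        SubmissionContext context = new SubmissionContext();
        userQueue
            .map(String::trim)
            .filter(name -> !name.isEmpty())
            .ifPresent(context::setQueue);
        return context;
    }
}
```

    Calling buildContext(Optional.empty()) leaves the queue unset, while buildContext(Optional.of("root.default")) passes the user's choice through unchanged. Any validation against the available queues would then apply only to names the user actually typed.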
    


