Robert Metzger created FLINK-2990:
-------------------------------------

             Summary: Scala 2.11 build fails to start on YARN
                 Key: FLINK-2990
                 URL: https://issues.apache.org/jira/browse/FLINK-2990
             Project: Flink
          Issue Type: Bug
          Components: Build System, YARN Client
    Affects Versions: 0.10, 1.0
            Reporter: Robert Metzger
            Assignee: Robert Metzger
Deploying the Scala 2.11 build of Flink on YARN seems to fail:

{code}
robert@hn0-apache:~/flink010-hd22-scala211/flink-0.10.0$ ./bin/yarn-session.sh -n 2
16:36:32,484 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16:36:32,748 INFO org.apache.flink.yarn.FlinkYarnClient - Using values:
16:36:32,750 INFO org.apache.flink.yarn.FlinkYarnClient - TaskManager count = 2
16:36:32,750 INFO org.apache.flink.yarn.FlinkYarnClient - JobManager memory = 1024
16:36:32,750 INFO org.apache.flink.yarn.FlinkYarnClient - TaskManager memory = 1024
16:36:32,874 INFO org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider - Failing over to rm2
16:36:32,930 WARN org.apache.flink.yarn.FlinkYarnClient - The JobManager or TaskManager memory is below the smallest possible YARN Container size. The value of 'yarn.scheduler.minimum-allocation-mb' is '1536'. Please increase the memory size.YARN will allocate the smaller containers but the scheduler will account for the minimum-allocation-mb, maybe not all instances you requested will start.
16:36:33,448 WARN org.apache.hadoop.hdfs.BlockReaderLocal - The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
16:36:33,489 INFO org.apache.flink.yarn.Utils - Copying from file:/home/robert/flink010-hd22-scala211/flink-0.10.0/lib/flink-distabc.jar to hdfs://hn1-apache.vbkocrowebre3dyigxo55soqnb.ax.internal.cloudapp.net:8020/user/robert/.flink/application_1447063737177_0017/flink-distabc.jar
16:36:35,367 INFO org.apache.flink.yarn.Utils - Copying from /home/robert/flink010-hd22-scala211/flink-0.10.0/conf/flink-conf.yaml to hdfs://hn1-apache.vbkocrowebre3dyigxo55soqnb.ax.internal.cloudapp.net:8020/user/robert/.flink/application_1447063737177_0017/flink-conf.yaml
16:36:35,695 INFO org.apache.flink.yarn.Utils - Copying from file:/home/robert/flink010-hd22-scala211/flink-0.10.0/lib/flink-python_2.11-0.10.0.jar to hdfs://hn1-apache.vbkocrowebre3dyigxo55soqnb.ax.internal.cloudapp.net:8020/user/robert/.flink/application_1447063737177_0017/flink-python_2.11-0.10.0.jar
16:36:35,882 INFO org.apache.flink.yarn.Utils - Copying from file:/home/robert/flink010-hd22-scala211/flink-0.10.0/lib/flink-distabc.jar to hdfs://hn1-apache.vbkocrowebre3dyigxo55soqnb.ax.internal.cloudapp.net:8020/user/robert/.flink/application_1447063737177_0017/flink-distabc.jar
16:36:37,522 INFO org.apache.flink.yarn.Utils - Copying from file:/home/robert/flink010-hd22-scala211/flink-0.10.0/lib/slf4j-log4j12-1.7.7.jar to hdfs://hn1-apache.vbkocrowebre3dyigxo55soqnb.ax.internal.cloudapp.net:8020/user/robert/.flink/application_1447063737177_0017/slf4j-log4j12-1.7.7.jar
16:36:37,740 INFO org.apache.flink.yarn.Utils - Copying from file:/home/robert/flink010-hd22-scala211/flink-0.10.0/lib/log4j-1.2.17.jar to hdfs://hn1-apache.vbkocrowebre3dyigxo55soqnb.ax.internal.cloudapp.net:8020/user/robert/.flink/application_1447063737177_0017/log4j-1.2.17.jar
16:36:37,960 INFO org.apache.flink.yarn.Utils - Copying from file:/home/robert/flink010-hd22-scala211/flink-0.10.0/conf/logback.xml to hdfs://hn1-apache.vbkocrowebre3dyigxo55soqnb.ax.internal.cloudapp.net:8020/user/robert/.flink/application_1447063737177_0017/logback.xml
16:36:38,397 INFO org.apache.flink.yarn.Utils - Copying from file:/home/robert/flink010-hd22-scala211/flink-0.10.0/conf/log4j.properties to hdfs://hn1-apache.vbkocrowebre3dyigxo55soqnb.ax.internal.cloudapp.net:8020/user/robert/.flink/application_1447063737177_0017/log4j.properties
16:36:38,840 INFO org.apache.flink.yarn.FlinkYarnClient - Submitting application master application_1447063737177_0017
16:36:39,081 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1447063737177_0017
16:36:39,081 INFO org.apache.flink.yarn.FlinkYarnClient - Waiting for the cluster to be allocated
16:36:39,084 INFO org.apache.flink.yarn.FlinkYarnClient - Deploying cluster, current state ACCEPTED
16:36:40,086 INFO org.apache.flink.yarn.FlinkYarnClient - Deploying cluster, current state ACCEPTED
Error while deploying YARN cluster: The YARN application unexpectedly switched to state FAILED during deployment.
Diagnostics from YARN: Application application_1447063737177_0017 failed 1 times due to AM Container for appattempt_1447063737177_0017_000001 exited with exitCode: -1000
For more detailed output, check application tracking page:http://hn1-apache.vbkocrowebre3dyigxo55soqnb.ax.internal.cloudapp.net:8088/proxy/application_1447063737177_0017/Then, click on links to logs of each attempt.
Diagnostics: Resource hdfs://hn1-apache.vbkocrowebre3dyigxo55soqnb.ax.internal.cloudapp.net:8020/user/robert/.flink/application_1447063737177_0017/flink-distabc.jar changed on src filesystem (expected 1447086995336, was 1447086997508
java.io.IOException: Resource hdfs://hn1-apache.vbkocrowebre3dyigxo55soqnb.ax.internal.cloudapp.net:8020/user/robert/.flink/application_1447063737177_0017/flink-distabc.jar changed on src filesystem (expected 1447086995336, was 1447086997508
    at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
    at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:61)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:357)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:356)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Failing this attempt. Failing the application.
If log aggregation is enabled on your cluster, use this command to further investigate the issue:
yarn logs -applicationId application_1447063737177_0017
org.apache.flink.yarn.FlinkYarnClientBase$YarnDeploymentException: The YARN application unexpectedly switched to state FAILED during deployment.
Diagnostics from YARN: Application application_1447063737177_0017 failed 1 times due to AM Container for appattempt_1447063737177_0017_000001 exited with exitCode: -1000
For more detailed output, check application tracking page:http://hn1-apache.vbkocrowebre3dyigxo55soqnb.ax.internal.cloudapp.net:8088/proxy/application_1447063737177_0017/Then, click on links to logs of each attempt.
Diagnostics: Resource hdfs://hn1-apache.vbkocrowebre3dyigxo55soqnb.ax.internal.cloudapp.net:8020/user/robert/.flink/application_1447063737177_0017/flink-distabc.jar changed on src filesystem (expected 1447086995336, was 1447086997508
java.io.IOException: Resource hdfs://hn1-apache.vbkocrowebre3dyigxo55soqnb.ax.internal.cloudapp.net:8020/user/robert/.flink/application_1447063737177_0017/flink-distabc.jar changed on src filesystem (expected 1447086995336, was 1447086997508
    at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
    at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:61)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:357)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:356)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Failing this attempt. Failing the application.
If log aggregation is enabled on your cluster, use this command to further investigate the issue:
yarn logs -applicationId application_1447063737177_0017
    at org.apache.flink.yarn.FlinkYarnClientBase.deployInternal(FlinkYarnClientBase.java:646)
    at org.apache.flink.yarn.FlinkYarnClientBase.deploy(FlinkYarnClientBase.java:338)
    at org.apache.flink.client.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:409)
    at org.apache.flink.client.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:351)
{code}

The problem is that flink-dist.jar is uploaded to HDFS twice (see the two "Copying from ... flink-distabc.jar" log lines above), so the second upload overwrites the file and changes its modification timestamp. When YARN allocates the containers and localizes the resources, the timestamp recorded by the client no longer matches the file on HDFS, and YARN rejects the JAR.
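For reference, the NodeManager checks the size and modification timestamp that the client recorded for each LocalResource before localizing it, which is why re-uploading the same JAR makes FSDownload fail with "changed on src filesystem". The sketch below is not the actual fix in FlinkYarnClientBase; the StagedResourceUploader class and its register method are made-up names, shown only to illustrate the idea of uploading each file to the staging directory at most once and registering it with the size and timestamp observed right after the upload.

{code:java}
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.api.records.LocalResource;
import org.apache.hadoop.yarn.api.records.LocalResourceType;
import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
import org.apache.hadoop.yarn.util.ConverterUtils;

/**
 * Illustrative sketch only (not the Flink client code): uploads each local file
 * to the application's staging directory at most once and registers it as a
 * YARN LocalResource with the size and modification time observed right after
 * the upload. Copying the same file a second time would change its modification
 * time on HDFS and make the NodeManager's FSDownload reject it with
 * "Resource ... changed on src filesystem".
 */
public class StagedResourceUploader {

    private final FileSystem fs;
    private final Path stagingDir;
    // Remembers files that were already uploaded, keyed by their target name.
    private final Map<String, LocalResource> uploaded = new HashMap<>();

    public StagedResourceUploader(Configuration conf, Path stagingDir) throws IOException {
        this.fs = FileSystem.get(conf);
        this.stagingDir = stagingDir;
    }

    /** Uploads the local file once and returns the registered LocalResource. */
    public LocalResource register(Path localFile) throws IOException {
        String name = localFile.getName();
        LocalResource existing = uploaded.get(name);
        if (existing != null) {
            // Do not copy again: a second copy would bump the modification
            // timestamp and invalidate the resource already recorded in the
            // container launch context.
            return existing;
        }

        Path dst = new Path(stagingDir, name);
        fs.copyFromLocalFile(false, true, localFile, dst);

        // Record size and timestamp *after* the upload; YARN verifies both
        // when it localizes the resource on the NodeManager.
        FileStatus status = fs.getFileStatus(dst);
        LocalResource resource = LocalResource.newInstance(
                ConverterUtils.getYarnUrlFromPath(dst),
                LocalResourceType.FILE,
                LocalResourceVisibility.APPLICATION,
                status.getLen(),
                status.getModificationTime());

        uploaded.put(name, resource);
        return resource;
    }
}
{code}

With a guard like this, a second registration of flink-dist.jar would simply reuse the first upload and its recorded timestamp, so the NodeManager's timestamp check would pass.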