Hi,

The main problem was that whatever was going wrong was not apparent in the Flink application master runner; it was only shown in the YarnClient debug log.

If you run with the default INFO log level, all you see is that the YARN client tries to fail over again and again, as if something were wrong with the resource manager. Setting the log level to DEBUG actually shows the error.

It would also be great if there were a way to verify the YARN version and detect incompatibilities, though I am not sure whether this is easily possible.

Gyula

Ufuk Celebi <u...@apache.org> wrote (on Mon, 14 Nov 2016, 9:42):

> Good to know that you solved this. :) Do you think there is something we
> can do to help users notice this situation faster?
>
> – Ufuk
>
> On 13 November 2016 at 00:23:21, Gyula Fóra (gyula.f...@gmail.com) wrote:
> > Hi,
> >
> > What happened is that I compiled Flink with the wrong Hadoop version...
> >
> > Sorry :)
> > Gyula
> >
> > Gyula Fóra wrote (on Sat, 12 Nov 2016, 13:11):
> >
> > > Hi,
> > >
> > > I am running into some strange issues on YARN with Flink 1.1.3 and 1.1.4.
> > > For some reason I started getting this error (see the logs below).
> > > The job manager starts and the application is in the ACCEPTED state, but it
> > > cannot seem to communicate with the scheduler (0.0.0.0:8030 seems strange).
> > >
> > > I didn't change anything on the YARN cluster, and this worked previously
> > > (but I just can't get it to work now). The yarn-site.xml contains
> > > the proper RM addresses.
> > >
> > > Does anybody have any ideas where to go from here?
> > >
> > > Cheers,
> > > Gyula
> > >
> > > JM log:
> > >
> > > 2016-11-12 11:56:06,894 DEBUG org.apache.hadoop.ipc.Client - The ping interval is 60000 ms.
> > > 2016-11-12 11:56:06,894 DEBUG org.apache.hadoop.ipc.Client - Connecting to /0.0.0.0:8030
> > > 2016-11-12 11:56:06,899 DEBUG org.apache.hadoop.ipc.Client - closing ipc connection to 0.0.0.0/0.0.0.0:8030: Connection refused
> > >
> > > java.net.ConnectException: Call From splat24.sto.midasplayer.com/172.25.86.166 to 0.0.0.0:8030 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
> > >     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> > >     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> > >     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> > >     at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> > >     at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
> > >     at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
> > >     at org.apache.hadoop.ipc.Client.call(Client.java:1410)
> > >     at org.apache.hadoop.ipc.Client.call(Client.java:1359)
> > >     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> > >     at com.sun.proxy.$Proxy8.registerApplicationMaster(Unknown Source)
> > >     at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:106)
> > >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> > >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > >     at java.lang.reflect.Method.invoke(Method.java:497)
> > >     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
> > >     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> > >     at com.sun.proxy.$Proxy9.registerApplicationMaster(Unknown Source)
> > >     at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.registerApplicationMaster(AMRMClientImpl.java:196)
> > >     at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl.registerApplicationMaster(AMRMClientAsyncImpl.java:138)
> > >     at org.apache.flink.yarn.YarnFlinkResourceManager.initialize(YarnFlinkResourceManager.java:259)
> > >     at org.apache.flink.runtime.clusterframework.FlinkResourceManager.preStart(FlinkResourceManager.java:185)
> > >     at akka.actor.Actor$class.aroundPreStart(Actor.scala:470)
> > >     at akka.actor.UntypedActor.aroundPreStart(UntypedActor.scala:97)
> > >     at akka.actor.ActorCell.create(ActorCell.scala:580)
> > >     at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:456)
> > >     at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
> > >
> > >
> > > Client:
> > >
> > > 2016-11-12 12:31:31,080 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
> > > 2016-11-12 12:31:31,080 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
> > > 2016-11-12 12:31:31,101 INFO org.apache.flink.yarn.YarnClusterDescriptor - Using values:
> > > 2016-11-12 12:31:31,101 INFO org.apache.flink.yarn.YarnClusterDescriptor - TaskManager count = 1
> > > 2016-11-12 12:31:31,101 INFO org.apache.flink.yarn.YarnClusterDescriptor - JobManager memory = 1024
> > > 2016-11-12 12:31:31,102 INFO org.apache.flink.yarn.YarnClusterDescriptor - TaskManager memory = 11000
> > > 2016-11-12 12:31:31,119 INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
> > > 2016-11-12 12:31:31,394 WARN org.apache.flink.yarn.YarnClusterDescriptor - The file system scheme is 'file'. This indicates that the specified Hadoop configuration path is wrong and the system is using the default Hadoop configuration values.The Flink YARN client needs to store its files in a distributed file system
> > > 2016-11-12 12:31:31,457 INFO org.apache.flink.yarn.Utils - Copying from file:/fjord/sites/flink-1.1.3/conf/log4j.properties to file:/fjord/splat/flink/yarn/.flink/application_1478896050022_0013/log4j.properties
> > > 2016-11-12 12:31:42,321 INFO org.apache.flink.yarn.Utils - Copying from file:/fjord/sites/flink-1.1.3/lib to file:/fjord/splat/flink/yarn/.flink/application_1478896050022_0013/lib
> > > 2016-11-12 12:32:18,457 INFO org.apache.flink.yarn.Utils - Copying from file:/fjord/sites/rbea/rbea-on-flink-1.0-SNAPSHOT.jar to file:/fjord/splat/flink/yarn/.flink/application_1478896050022_0013/rbea-on-flink-1.0-SNAPSHOT.jar
> > > 2016-11-12 12:32:39,725 INFO org.apache.flink.yarn.Utils - Copying from file:/fjord/sites/flink-1.1.3/lib/flink-dist_2.10-1.1.4.jar to file:/fjord/splat/flink/yarn/.flink/application_1478896050022_0013/flink-dist_2.10-1.1.4.jar
> > > 2016-11-12 12:32:58,154 INFO org.apache.flink.yarn.Utils - Copying from /fjord/sites/flink-1.1.3/conf/flink-conf.yaml to file:/fjord/splat/flink/yarn/.flink/application_1478896050022_0013/flink-conf.yaml
> > > 2016-11-12 12:33:02,218 INFO org.apache.flink.yarn.YarnClusterDescriptor - Submitting application master application_1478896050022_0013
> > > 2016-11-12 12:33:02,256 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1478896050022_0013
> > > 2016-11-12 12:33:02,257 INFO org.apache.flink.yarn.YarnClusterDescriptor - Waiting for the cluster to be allocated
> > > 2016-11-12 12:33:02,259 INFO org.apache.flink.yarn.YarnClusterDescriptor - Deploying cluster, current state ACCEPTED
> > > 2016-11-12 12:34:02,485 INFO org.apache.flink.yarn.YarnClusterDescriptor - Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster
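
On the question of noticing this situation faster: one symptom visible in both logs above is a connection attempt to 0.0.0.0 (ports 8030 and 8032), which is the address Hadoop falls back to when it uses its default ResourceManager settings. Below is a minimal sketch of a log scan for that symptom. It is only an illustration, not something Flink or Hadoop ships: the function name `find_default_rm_endpoints` and the regular expression are made up for this example, and a hit only hints that default values are in effect, not why (in this thread the underlying cause was a Flink build against the wrong Hadoop version).

```python
import re
from typing import Iterable, List, Tuple

# Matches "0.0.0.0:<port>" but not addresses that merely end in it,
# such as "10.0.0.0:8030" (hence the negative lookbehind).
WILDCARD_ENDPOINT = re.compile(r"(?<![\d.])0\.0\.0\.0:(\d+)")


def find_default_rm_endpoints(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Return (line_number, port) for every 0.0.0.0:<port> found in the log lines.

    Hypothetical helper: a hit suggests the client or application master is
    using Hadoop's default ResourceManager address instead of a configured one.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        for match in WILDCARD_ENDPOINT.finditer(line):
            hits.append((number, int(match.group(1))))
    return hits


# Usage with three lines taken from the logs quoted in this thread.
sample = [
    "2016-11-12 11:56:06,894 DEBUG org.apache.hadoop.ipc.Client - Connecting to /0.0.0.0:8030",
    "2016-11-12 12:31:31,119 INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032",
    "2016-11-12 12:31:31,101 INFO org.apache.flink.yarn.YarnClusterDescriptor - TaskManager count = 1",
]
for line_number, port in find_default_rm_endpoints(sample):
    print(f"line {line_number}: connection to default address 0.0.0.0:{port}")
```

Note that the 8032 line is already logged at INFO level on the client side, so a check like this would not depend on DEBUG logging being enabled.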