Re: submit job failed on Yarn HA

2019-03-06 Thread 孙森
Hi Gary: Yes, it’s the second case: the client host is different from the host where the session cluster was started. I tried using "flink run -yid", and it really works. Best! Sen > On March 6, 2019, at 3:19 PM, Gary Yao wrote: > > Hi Sen, > > I took a look at your CLI logs again, and saw that it

Re: submit job failed on Yarn HA

2019-03-05 Thread Gary Yao
Hi Sen, I took a look at your CLI logs again, and saw that it uses the "default" Flink namespace in ZooKeeper: 2019-02-28 11:18:05,255 INFO org.apache.flink.runtime.util.ZooKeeperUtils - Using '/flink/default' as Zookeeper namespace. However, since you are using YARN, the Fl

Re: submit job failed on Yarn HA

2019-03-05 Thread 孙森
Hi Gary: Thanks very much! I tried it the way you described, and it works. I hope the bug can be fixed as soon as possible. Best! Sen > On March 5, 2019, at 3:15 PM, Gary Yao wrote: > > Hi Sen, > > In that email I meant that you should disable the ZooKeeper configuration in > the CL

Re: submit job failed on Yarn HA

2019-03-04 Thread Gary Yao
Hi Sen, I don't see high-availability: zookeeper in your Flink configuration. However, this is mandatory for an HA setup. By default "none" is used, and the ZK configuration is ignored. The log also hints that you are using StandaloneLeaderElectionService instead of the ZooKeeper implementat
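For reference, a minimal sketch of the flink-conf.yaml entries this reply refers to. The storageDir value is the one quoted in the original post of this thread; the ZooKeeper quorum hosts are placeholders assumed for illustration and are not taken from the thread.

```yaml
# flink-conf.yaml (sketch, not the poster's actual file)
# Without this line the default "none" applies and the ZooKeeper settings are ignored.
high-availability: zookeeper
# Value quoted in the original post of this thread.
high-availability.storageDir: hdfs:///flink/ha/
# Hypothetical hosts; replace with the real ZooKeeper quorum.
high-availability.zookeeper.quorum: zk1:2181,zk2:2181,zk3:2181
```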

Re: submit job failed on Yarn HA

2019-03-04 Thread Gary Yao
Hi Sen, Are you using the default MemoryStateBackend [1]? As far as I know, it does not support JobManager failover. If you are already using FsStateBackend or RocksDBStateBackend, please send JM logs. Best, Gary [1] https://ci.apache.org/projects/flink/flink-docs-release-1.5/ops/state/state_bac

Re: submit job failed on Yarn HA

2019-03-04 Thread 孙森
Hi Gary: Yes, I enabled checkpoints in my program. > On March 4, 2019, at 3:03 AM, Gary Yao wrote: > > Hi Sen, > > Did you set a restart strategy [1]? If you enabled checkpoints [2], the > fixed-delay strategy will be used by default. > > Best, > Gary > > [1] > https://ci.apache.org/project

Re: submit job failed on Yarn HA

2019-02-28 Thread Gary Yao
Hi Sen, I took a look at the CLI code again, and found out that -m is ignored if high-availability: ZOOKEEPER is configured in your flink-conf.yaml. This does not seem right and should be at least documented [1]. Judging from the client logs that you provided, I think the problem is that the cli

Re: submit job failed on Yarn HA

2019-02-27 Thread 孙森
Hi Gary: I have tried version 1.5.6, and it shows the same error. org.apache.flink.client.program.ProgramInvocationException: Could not retrieve the execution result. at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:258)

Re: submit job failed on Yarn HA

2019-02-27 Thread Gary Yao
Hi, How did you determine "jmhost" and "port"? Actually you do not need to specify these manually. If the client is using the same configuration as your cluster, the client will look up the leading JM from ZooKeeper. If you have already tried omitting the "-m" parameter, you can check in the clie

submit job failed on Yarn HA

2019-02-26 Thread 孙森
Hi all: I run Flink (1.5.1 with Hadoop 2.7) on YARN and submit a job with “/usr/local/flink/bin/flink run -m jmhost:port my.jar”, but the submission fails. The HA configuration is: high-availability: zookeeper high-availability.storageDir: hdfs:///flink/ha/ high-availability