Re: Cannot submit jobs on a HA Standalone JobManager

2018-05-05 Thread Gary Yao
Hi Julio, I agree that the job submission should work in HA mode if you manually specify the JobManager. At a minimum, a proper error message should be shown. Feel free to open an issue in JIRA. You already stated that you can maintain multiple configuration directories as a workaround. It is po

Re: Cannot submit jobs on a HA Standalone JobManager

2018-05-03 Thread Julio Biason
Hey Gary (again), Yup, that worked. Now I can launch apps again. ... but that's not something actually good. I mean, I have my own test environment, which doesn't need HA -- after all, I don't need to worry about this, this is a framework job, not my pipeline job. Which means now I'll need to ei

Re: Cannot submit jobs on a HA Standalone JobManager

2018-05-03 Thread Julio Biason
Hey Gary, Yes, I was still running with the `-m` flag on my dev machine -- partially configured like prod, but without the HA stuff. I never thought it could be a problem, since even the web interface can redirect from the secondary back to primary. Currently I'm still running 1.4.0 (and I plan t

Re: Cannot submit jobs on a HA Standalone JobManager

2018-05-03 Thread Gary Yao
Hi Julio, Are you using the -m flag of "bin/flink run" by any chance? In HA mode, you cannot manually specify the JobManager address. The client determines the leader through ZooKeeper. If you did not configure the ZooKeeper quorum in the flink-conf.yaml on the machine from which you are submittin
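
For illustration, the client-side settings Gary refers to might look roughly like the fragment below. This is only a sketch, assuming ZooKeeper-based HA: the ZooKeeper hostnames and the storage path are placeholders that do not come from the thread, and the port range is the one Julio mentions in his earlier message.

```yaml
# flink-conf.yaml on the machine that runs "bin/flink run" (sketch, not the poster's actual file)

# Enable ZooKeeper-based high availability so the client looks up the leader itself.
high-availability: zookeeper

# Placeholder hosts: replace with the real ZooKeeper quorum of the cluster.
high-availability.zookeeper.quorum: zk-1.example.com:2181,zk-2.example.com:2181,zk-3.example.com:2181

# Placeholder path: where JobManager metadata is persisted for recovery.
high-availability.storageDir: hdfs:///flink/ha/

# Port range taken from the thread; in HA mode this is used instead of jobmanager.rpc.port.
high-availability.jobmanager.port: 50010-50015
```

With settings like these in place on the submitting machine, the job would be submitted without the -m flag, so that the client resolves the current leader through ZooKeeper.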

Re: Cannot submit jobs on a HA Standalone JobManager

2018-05-02 Thread Julio Biason
Hey guys and gals, So, after a bit more digging, I found out that once HA is enabled, `jobmanager.rpc.port` is also ignored (along with `jobmanager.rpc.address`, but I was expecting this). Because I set the `high-availability.jobmanager.port` to `50010-50015`, my RPC port also changed (the docs mad

Cannot submit jobs on a HA Standalone JobManager

2018-05-02 Thread Julio Biason
Hello all, I'm building a standalone cluster with HA JobManager. So far, everything seems to work, but when I try to `flink run` my job, it fails with the following error: Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Could not retrieve the leader gateway. So far,